summaryrefslogtreecommitdiffstats
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2017-11-13Merge branch 'perf-core-for-linus' of ↵Linus Torvalds68-917/+2240
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel: - kprobes updates: use better W^X patterns for code modifications, improve optprobes, remove jprobes. (Masami Hiramatsu, Kees Cook) - core fixes: event timekeeping (enabled/running times statistics) fixes, perf_event_read() locking fixes and cleanups, etc. (Peter Zijlstra) - Extend x86 Intel free-running PEBS support and support x86 user-register sampling in perf record and perf script. (Andi Kleen) Tooling: - Completely rework the way inline frames are handled. Instead of querying for the inline nodes on-demand in the individual tools, we now create proper callchain nodes for inlined frames. (Milian Wolff) - 'perf trace' updates (Arnaldo Carvalho de Melo) - Implement a way to print formatted output to per-event files in 'perf script' to facilitate generate flamegraphs, elliminating the need to write scripts to do that separation (yuzhoujian, Arnaldo Carvalho de Melo) - Update vendor events JSON metrics for Intel's Broadwell, Broadwell Server, Haswell, Haswell Server, IvyBridge, IvyTown, JakeTown, Sandy Bridge, Skylake, SkyLake Server - and Goldmont Plus V1 (Andi Kleen, Kan Liang) - Multithread the synthesizing of PERF_RECORD_ events for pre-existing threads in 'perf top', speeding up that phase, greatly improving the user experience in systems such as Intel's Knights Mill (Kan Liang) - Introduce the concept of weak groups in 'perf stat': try to set up a group, but if it's not schedulable fallback to not using a group. That gives us the best of both worlds: groups if they work, but still a usable fallback if they don't. E.g: (Andi Kleen) - perf sched timehist enhancements (David Ahern) - ... various other enhancements, updates, cleanups and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (139 commits) kprobes: Don't spam the build log with deprecation warnings arm/kprobes: Remove jprobe test case arm/kprobes: Fix kretprobe test to check correct counter perf srcline: Show correct function name for srcline of callchains perf srcline: Fix memory leak in addr2inlines() perf trace beauty kcmp: Beautify arguments perf trace beauty: Implement pid_fd beautifier tools include uapi: Grab a copy of linux/kcmp.h perf callchain: Fix double mapping al->addr for children without self period perf stat: Make --per-thread update shadow stats to show metrics perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats perf tools: Add perf_data_file__write function perf tools: Add struct perf_data_file perf tools: Rename struct perf_data_file to perf_data perf script: Print information about per-event-dump files perf trace beauty prctl: Generate 'option' string table from kernel headers tools include uapi: Grab a copy of linux/prctl.h perf script: Allow creating per-event dump files perf evsel: Restore evsel->priv as a tool private area perf script: Use event_format__fprintf() ...
2017-11-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main changes in this cycle are: - Another attempt at enabling cross-release lockdep dependency tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time with better performance and fewer false positives. (Byungchul Park) - Introduce lockdep_assert_irqs_enabled()/disabled() and convert open-coded equivalents to lockdep variants. (Frederic Weisbecker) - Add down_read_killable() and use it in the VFS's iterate_dir() method. (Kirill Tkhai) - Convert remaining uses of ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle driven. (Mark Rutland, Paul E. McKenney) - Get rid of lockless_dereference(), by strengthening Alpha atomics, strengthening READ_ONCE() with smp_read_barrier_depends() and thus being able to convert users of lockless_dereference() to READ_ONCE(). (Will Deacon) - Various micro-optimizations: - better PV qspinlocks (Waiman Long), - better x86 barriers (Michael S. Tsirkin) - better x86 refcounts (Kees Cook) - ... plus other fixes and enhancements. (Borislav Petkov, Juergen Gross, Miguel Bernal Marin)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE rcu: Use lockdep to assert IRQs are disabled/enabled netpoll: Use lockdep to assert IRQs are disabled/enabled timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled irq_work: Use lockdep to assert IRQs are disabled/enabled irq/timings: Use lockdep to assert IRQs are disabled/enabled perf/core: Use lockdep to assert IRQs are disabled/enabled x86: Use lockdep to assert IRQs are disabled/enabled smp/core: Use lockdep to assert IRQs are disabled/enabled timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled timers/nohz: Use lockdep to assert IRQs are disabled/enabled workqueue: Use lockdep to assert IRQs are disabled/enabled irq/softirqs: Use lockdep to assert IRQs are disabled/enabled locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled() locking/pvqspinlock: Implement hybrid PV queued/unfair locks locking/rwlocks: Fix comments x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion() workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes ...
2017-11-09perf tools: Fix eBPF event specification parsingJiri Olsa1-2/+2
Looks like I've reached the new level of stupidity, adding missing braces. Committer testing: Given the following eBPF C filter, that will add a record when it returns true, i.e. when the tv_nsec variable is > 2000ns, should be built and installed via sys_bpf(), but fails to do so before this patch: # cat filter.c #include <uapi/linux/bpf.h> #define SEC(NAME) __attribute__((section(NAME), used)) SEC("func=hrtimer_nanosleep rqtp->tv_nsec") int func(void *ctx, int err, long nsec) { return nsec > 1000; } char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; # # perf trace -e nanosleep,filter.c usleep 1 invalid or unsupported event: 'filter.c' Run 'perf list' for a list of valid events Usage: perf trace [<options>] [<command>] or: perf trace [<options>] -- <command> [<options>] or: perf trace record [<options>] [<command>] or: perf trace record [<options>] -- <command> [<options>] -e, --event <event> event/syscall selector. use 'perf list' to list available events # And works again after it is applied, the nothing is inserted when the co # perf trace -e *sleep,filter.c usleep 1 0.000 ( 0.066 ms): usleep/23994 nanosleep(rqtp: 0x7ffead94a0d0) = 0 # perf trace -e *sleep,filter.c usleep 2 0.000 ( 0.008 ms): usleep/24378 nanosleep(rqtp: 0x7fffa021ba50) ... 0.008 ( ): perf_bpf_probe:func:(ffffffffb410cb30) tv_nsec=2000) 0.000 ( 0.066 ms): usleep/24378 ... [continued]: nanosleep()) = 0 # The intent of 9445464bb831 is kept: # perf stat -e 'cpu/uops_executed.core,krava/' true event syntax error: '..cuted.core,krava/' \___ unknown term valid terms: cmask,pc,event,edge,in_tx,any,ldlat,inv,umask,in_tx_cp,offcore_rsp,config,config1,config2,name,period Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events # # perf stat -e 'cpu/uops_executed.core,period=1/' true Performance counter stats for 'true': 808,332 cpu/uops_executed.core,period=1/ 0.002997237 seconds time elapsed # Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 9445464bb831 ("perf tools: Unwind properly location after REJECT") Link: http://lkml.kernel.org/n/tip-diea0ihbwpxfw6938huv3whj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-09perf tools: Add "reject" option for parse-events.lJiri Olsa1-0/+1
Arnaldo reported broken builds in some distros using a newer flex release, 2.6.4, found in Alpine Linux 3.6 and Edge, with flex not spotting the REJECT macro: CC /tmp/build/perf/util/parse-events-flex.o util/parse-events.l: In function 'parse_events_lex': /tmp/build/perf/util/parse-events-flex.c:4734:16: error: \ 'reject_used_but_not_detected' undeclared (first use in this function) It's happening because we put the REJECT under another USER_REJECT macro in following commit: 9445464bb831 perf tools: Unwind properly location after REJECT Fortunately flex provides option for force it to use REJECT, adding it to parse-events.l. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <andi@firstfloor.org> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 9445464bb831 ("perf tools: Unwind properly location after REJECT") Link: http://lkml.kernel.org/n/tip-7kdont984mw12ijk7rji6b8p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar180-3/+195
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07Merge branch 'linus' into perf/core, to fix conflictsIngo Molnar180-3/+195
Conflicts: tools/perf/arch/arm/annotate/instructions.c tools/perf/arch/arm64/annotate/instructions.c tools/perf/arch/powerpc/annotate/instructions.c tools/perf/arch/s390/annotate/instructions.c tools/perf/arch/x86/tests/intel-cqm.c tools/perf/ui/tui/progress.c tools/perf/util/zlib.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-03Merge branch 'linus' into perf/urgent, to pick up dependent commitsIngo Molnar178-0/+178
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman178-0/+178
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01perf srcline: Show correct function name for srcline of callchainsNamhyung Kim1-40/+55
When libbfd is not used, it doesn't show proper function name and reuse the original symbol of the sample. That's because it passes the original sym to inline_list__append(). As `addr2line -f` returns function names as well, use that to create an inline_sym and pass it to inline_list__append(). For example, following data shows that inlined entries of main have same name (main). Before: $ perf report -g srcline -q | head 45.22% inlining libm-2.26.so [.] __hypot_finite | ---__hypot_finite ??:0 | |--44.15%--hypot ??:0 | main complex:589 | main complex:597 | main complex:654 | main complex:664 | main inlining.cpp:14 After: $ perf report -g srcline -q | head 45.22% inlining libm-2.26.so [.] __hypot_finite | ---__hypot_finite | |--44.15%--hypot | std::__complex_abs complex:589 (inlined) | std::abs<double> complex:597 (inlined) | std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined) | std::norm<double> complex:664 (inlined) | main inlining.cpp:14 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20171031020654.31163-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-01perf srcline: Fix memory leak in addr2inlines()Namhyung Kim1-5/+2
When libbfd is not used, addr2inlines() executes `addr2line -i` and process output line by line. But it resets filename to NULL in the loop so getline() allocates additional memory everytime instead of realloc. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20171031020654.31163-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-31perf callchain: Fix double mapping al->addr for children without self periodNamhyung Kim1-4/+1
Milian Wolff found a problem he described in [1] and that for him would get fixed: "Note how most of the large offset values are now gone. Most notably, we get proper srcline resolution for the random.h and complex headers." Then Namhyung found the root cause: "I looked into it and found a bug handling cumulative (children) entries. For children entries that have no self period, the al->addr (so he->ip) ends up having an doubly-mapped address. It seems to be there from the beginning but only affects entries that have no srclines - finding srcline itself is done using a different address but it will show the invalid address if no srcline was found. I think we should fix the commit c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry")." [1] https://lkml.kernel.org/r/20171018185350.14893-7-milian.wolff@kdab.com Reported-by: Milian Wolff <milian.wolff@kdab.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: kernel-team@lge.com Fixes: c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry") Link: https://lkml.kernel.org/r/20171020051533.GA2746@sejong Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30perf stat: Make --per-thread update shadow stats to show metricsJiri Olsa1-0/+2
We should support this because it would allow easily to collect metrics for different threads in applications. Original patch from posted by Jin Yao in here [1]. 1. Current output, for example: root@skl:/tmp# perf stat --per-thread -p 21623 ^C Performance counter stats for process id '21623': vmstat-21623 0.517479 task-clock (msec) # 0.000 CPUs utilized vmstat-21623 1 context-switches vmstat-21623 0 cpu-migrations vmstat-21623 0 page-faults vmstat-21623 461,306 cycles vmstat-21623 630,724 instructions vmstat-21623 136,265 branches vmstat-21623 2,520 branch-misses 1.444020756 seconds time elapsed root@skl:/tmp# perf stat --per-thread --metrics ipc -p 21623 ^C Performance counter stats for process id '21623': vmstat-21623 631,185 inst_retired.any vmstat-21623 605,893 cpu_clk_unhalted.thread 1.415679293 seconds time elapsed 2. With this patch, the result would be: root@skl:/tmp# perf stat --per-thread -p 21623 ^C Performance counter stats for process id '21623': vmstat-21623 0.533759 task-clock (msec) # 0.000 CPUs utilized vmstat-21623 1 context-switches # 0.002 M/sec vmstat-21623 0 cpu-migrations # 0.000 K/sec vmstat-21623 0 page-faults # 0.000 K/sec vmstat-21623 473,896 cycles # 0.888 GHz vmstat-21623 631,072 instructions # 1.33 insn per cycle vmstat-21623 136,307 branches # 255.372 M/sec vmstat-21623 2,524 branch-misses # 1.85% of all branches 1.544862861 seconds time elapsed root@skl:/tmp# perf stat --per-thread --metrics ipc -p 21623 ^C Performance counter stats for process id '21623': vmstat-21623 1,259,104 inst_retired.any # 1.2 IPC vmstat-21623 1,056,756 cpu_clk_unhalted.thread 2.040954502 seconds time elapsed [1] https://marc.info/?l=linux-kernel&m=150777054620511&w=2 Originally-from: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-tr8ntktxmy4qc5769ajg5u6c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30perf stat: Move the shadow stats scale computation in ↵Jiri Olsa3-28/+28
perf_stat__update_shadow_stats Move the shadow stats scale computation to the perf_stat__update_shadow_stats() function, so it's centralized and we don't forget to do it. It also saves few lines of code. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30perf tools: Add perf_data_file__write functionJiri Olsa2-1/+9
Adding perf_data_file__write function to provide single file write operation. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30perf tools: Add struct perf_data_fileJiri Olsa4-32/+36
Add struct perf_data_file to represent a single file within a perf_data struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org [ Fixup recent changes in 'perf script --per-event-dump' ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30perf tools: Rename struct perf_data_file to perf_dataJiri Olsa11-113/+113
Rename struct perf_data_file to perf_data, because we will add the possibility to have multiple files under perf.data, so the 'perf_data' name fits better. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org [ Fixup recent changes in 'perf script --per-event-dump' ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27perf tools: Unwind properly location after REJECTJiri Olsa1-2/+6
We have defined YY_USER_ACTION to keep trace of the column location during events parsing, but we need to clean it up when we call REJECT. When REJECT is called, the lexer shrinks the text and re-runs the matching, so we need to address it in resuming the previous location value to keep it correct for error display, like: Before: $ perf stat -e 'cpu/uops_executed.core,krava/' true event syntax error: '..38;5;9:mi=01;05;37;41:su=48;5;196;38;5;15:sg=48;5;1\ 1;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;\ 21;38;50 �' \___ unknown term After: $ ./perf stat -e 'cpu/uops_executed.core,krava/' true event syntax error: '..cuted.core,krava/' \___ unknown term Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Andi Kleen <ak@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-vug2hchlny30jfsfrumbym26@git.kernel.org Link: http://lkml.kernel.org/r/20171009140944.GD28623@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27perf evsel: Restore evsel->priv as a tool private areaArnaldo Carvalho de Melo2-8/+11
When we started using it for stats and did it not just in builtin-stat.c, but also for builtin-script.c, then it stopped being a tool private area, so introduce a new pointer for these stats and leave ->priv to its original purpose. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps") Link: http://lkml.kernel.org/n/tip-jtpzx3rjqo78snmmsdzwb2eb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf symbols: Fix memory corruption because of zero length symbolsRavi Bangoria1-1/+11
Perf top is often crashing at very random locations on powerpc. After investigating, I found the crash only happens when sample is of zero length symbol. Powerpc kernel has many such symbols which does not contain length details in vmlinux binary and thus start and end addresses of such symbols are same. Structure struct sym_hist { u64 nr_samples; u64 period; struct sym_hist_entry addr[0]; }; has last member 'addr[]' of size zero. 'addr[]' is an array of addresses that belongs to one symbol (function). If function consist of 100 instructions, 'addr' points to an array of 100 'struct sym_hist_entry' elements. For zero length symbol, it points to the *empty* array, i.e. no members in the array and thus offset 0 is also invalid for such array. static int __symbol__inc_addr_samples(...) { ... offset = addr - sym->start; h = annotation__histogram(notes, evidx); h->nr_samples++; h->addr[offset].nr_samples++; h->period += sample->period; h->addr[offset].period += sample->period; ... } Here, when 'addr' is same as 'sym->start', 'offset' becomes 0, which is valid for normal symbols but *invalid* for zero length symbols and thus updating h->addr[offset] causes memory corruption. Fix this by adding one dummy element for zero length symbols. Link: https://lkml.org/lkml/2016/10/10/148 Fixes: edee44be5919 ("perf annotate: Don't throw error for zero length symbols") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/1508854806-10542-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf util: Enable handling of inlined frames by defaultMilian Wolff1-0/+1
Now that we have caches in place to speed up the process of finding inlined frames and srcline information repeatedly, we can enable this useful option by default. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf report: Use srcline from callchain for hist entriesMilian Wolff4-0/+5
This also removes the symbol name from the srcline column, more on this below. This ensures we use the correct srcline, which could originate from a potentially inlined function. The hist entries used to query for the srcline based purely on the IP, which leads to wrong results for inlined entries. Before: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ .................................................................................................................................. # 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 44.58% 0.00% main+100 44.58% 0.00% std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% std::__complex_abs+100 44.58% 0.00% std::abs<double>+100 44.58% 0.00% std::norm<double>+100 36.01% 0.00% hypot+18446603487892193300 25.81% 0.00% main+41 25.81% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.75% 25.75% random.h:143 18.39% 0.00% main+57 18.39% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% random.tcc:3330 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ After: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ ........................................... # 94.30% 1.19% main.cpp:39 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 48.44% 1.70% random.h:1823 48.44% 0.00% random.h:1814 46.74% 2.53% random.h:185 44.68% 0.10% complex:589 44.68% 0.00% complex:597 44.68% 0.00% complex:654 44.68% 0.00% complex:664 40.61% 13.80% random.tcc:3330 36.01% 0.00% hypot+18446603487892193300 26.81% 0.00% random.h:151 26.81% 0.00% random.h:332 25.75% 25.75% random.h:143 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ Note that this change removes the symbol from the source:line hist column. If this information is desired, users should explicitly query for it if needed. I.e. run this command instead: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... ........................................... # 94.30% 1.19% [.] main main.cpp:39 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1814 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1823 46.74% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) random.h:185 44.68% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) complex:654 44.68% 0.00% [.] std::__complex_abs (inlined) complex:589 44.68% 0.00% [.] std::abs<double> (inlined) complex:597 44.68% 0.00% [.] std::norm<double> (inlined) complex:664 39.80% 13.59% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 36.01% 0.00% [.] hypot hypot+18446603487892193300 26.81% 0.00% [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) random.h:151 26.81% 0.00% [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) random.h:332 25.75% 0.00% [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) random.h:143 25.19% 25.19% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Compared to the old behavior, this reduces duplication in the output. Before we used to print the symbol name in the srcline column even when the sym column was explicitly requested. I.e. the output was: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... .................................................................................................................................. # 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 44.58% 0.00% [.] main main+100 44.58% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% [.] std::__complex_abs (inlined) std::__complex_abs+100 44.58% 0.00% [.] std::abs<double> (inlined) std::abs<double>+100 44.58% 0.00% [.] std::norm<double> (inlined) std::norm<double>+100 36.01% 0.00% [.] hypot hypot+18446603487892193300 25.81% 0.00% [.] main main+41 25.81% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.69% 25.69% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 18.39% 0.00% [.] main main+57 18.39% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf report: Cache srclines for callchain nodesMilian Wolff5-3/+90
On one hand this ensures that the memory is properly freed when the DSO gets freed. On the other hand this significantly speeds up the processing of the callchain nodes when lots of srclines are requested. For one of my data files e.g.: Before: Performance counter stats for 'perf report -s srcline -g srcline --stdio': 52496.495043 task-clock (msec) # 0.999 CPUs utilized 634 context-switches # 0.012 K/sec 2 cpu-migrations # 0.000 K/sec 191,561 page-faults # 0.004 M/sec 165,074,498,235 cycles # 3.144 GHz 334,170,832,408 instructions # 2.02 insn per cycle 90,220,029,745 branches # 1718.591 M/sec 654,525,177 branch-misses # 0.73% of all branches 52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks! After: Performance counter stats for 'perf report -s srcline -g srcline --stdio': 22606.323706 task-clock (msec) # 1.000 CPUs utilized 31 context-switches # 0.001 K/sec 0 cpu-migrations # 0.000 K/sec 185,471 page-faults # 0.008 M/sec 71,188,113,681 cycles # 3.149 GHz 133,204,943,083 instructions # 1.87 insn per cycle 34,886,384,979 branches # 1543.214 M/sec 278,214,495 branch-misses # 0.80% of all branches 22.609857253 seconds time elapsed Note that the difference is only this large when `--inline` is not passed. In such situations, we would use the inliner cache and thus do not run this code path that often. I think that this cache should actually be used in other places, too. When looking at the valgrind leak report for perf report, we see tons of srclines being leaked, most notably from calls to hist_entry__get_srcline. The problem is that get_srcline has many different formatting options (show_sym, show_addr, potentially even unwind_inlines when calling __get_srcline directly). As such, the srcline cannot easily be cached for all calls, or we'd have to add caches for all formatting combinations (6 so far). An alternative would be to remove the formatting options and handle that on a different level - i.e. print the sym/addr on demand wherever we actually output something. And the unwind_inlines could be moved into a separate function that does not return the srcline. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf report: Cache failed lookups of inlined framesMilian Wolff2-23/+8
When no inlined frames could be found for a given address, we did not store this information anywhere. That means we potentially do the costly inliner lookup repeatedly for cases where we know it can never succeed. This patch makes dso__parse_addr_inlines always return a valid inline_node. It will be empty when no inliners are found. This enables us to cache the empty list in the DSO, thereby improving the performance when many addresses fail to find the inliners. For my trivial example, the performance impact is already quite significant: Before: ~~~~~ Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs): 594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 53 context-switches # 0.089 K/sec ( +- 4.09% ) 0 cpu-migrations # 0.000 K/sec ( +-100.00% ) 5,687 page-faults # 0.010 M/sec ( +- 0.02% ) 2,300,918,213 cycles # 3.868 GHz ( +- 0.09% ) 4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% ) 939,177,205 branches # 1578.969 M/sec ( +- 0.00% ) 11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% ) 0.596246531 seconds time elapsed ( +- 0.07% ) ~~~~~ After: ~~~~~ Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs): 113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% ) 29 context-switches # 0.255 K/sec ( +- 54.25% ) 0 cpu-migrations # 0.000 K/sec 5,380 page-faults # 0.048 M/sec ( +- 0.01% ) 432,378,779 cycles # 3.823 GHz ( +- 0.75% ) 670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% ) 141,001,247 branches # 1246.570 M/sec ( +- 0.01% ) 2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% ) 0.114222393 seconds time elapsed ( +- 1.19% ) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf report: Properly handle branch count in match_chain()Milian Wolff1-62/+78
Some of the code paths I introduced before returned too early without running the code to handle a node's branch count. By refactoring match_chain to only have one exit point, this can be remedied. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns ↵Mark Rutland2-3/+3
to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24perf report: Compare symbol name for inlined frames when sortingMilian Wolff1-0/+3
Similar to the callstack frame matching, we also have to compare the symbol name when sorting hist entries. The reason is twofold: On one hand, multiple inlined functions will use the same symbol start/end values of the parent, non-inlined symbol. As such, all of these symbols often end up missing from top-level report, as they get merged with the non-inlined frame. On the other hand, multiple different functions may end up inlining the same function, and we need to aggregate these values properly. Before: ~~~~~ perf report --stdio --inline -g none # Children Self Command Shared Object Symbol # ........ ........ ............ ............. ................................... # 100.00% 39.69% cpp-inlining cpp-inlining [.] main 100.00% 0.00% cpp-inlining cpp-inlining [.] _start 100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main 97.03% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined) 59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot 55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite 0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs ~~~~~ After: ~~~~~ perf report --stdio --inline -g none # Children Self Command Shared Object Symbol # ........ ........ ............ ............. ................................................................................................................................... # 100.00% 39.69% cpp-inlining cpp-inlining [.] main 100.00% 0.00% cpp-inlining cpp-inlining [.] _start 100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main 62.57% 0.00% cpp-inlining cpp-inlining [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::__complex_abs (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::abs<double> (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined) 59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot 55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite 34.46% 0.00% cpp-inlining cpp-inlining [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) 32.39% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) 32.39% 0.00% cpp-inlining cpp-inlining [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) 0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Compare symbol name for inlined frames when matchingMilian Wolff1-0/+8
The fake symbols we create for inlined frames will represent different functions but can use the symbol start address. This leads to issues when different inline branches all lead to the same function. Before: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | --37.57%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) | --36.36%--std::abs<double> (inlined) std::__complex_abs (inlined) | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) ~~~~~ Note that this backtrace representation is completely bogus. Complex abs does not call the linear congruential engine! It is just a side-effect of a longer inlined stack being appended to a shorter, different inlined stack, both of which originate in the same function (main). This patch fixes the issue: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) | std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) | std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) | std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) | --1.99%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) std::abs<double> (inlined) std::__complex_abs (inlined) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> [ Fix up conflict with c1fbc0cf81f1 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf script: Mark inlined frames and do not print DSO for themMilian Wolff1-1/+4
Instead of showing the (repeated) DSO name of the non-inlined frame, we now show the "(inlined)" suffix instead. Before: 214f7 __hypot_finite (/usr/lib/libm-2.25.so) ace3 hypot (/usr/lib/libm-2.25.so) a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining) a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining) a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining) a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining) a4a main (/home/milian/projects/src/perf-tests/inlining) 20510 __libc_start_main (/usr/lib/libc-2.25.so) bd9 _start (/home/milian/projects/src/perf-tests/inlining) After: 214f7 __hypot_finite (/usr/lib/libm-2.25.so) ace3 hypot (/usr/lib/libm-2.25.so) a4a std::__complex_abs (inlined) a4a std::abs<double> (inlined) a4a std::_Norm_helper<true>::_S_do_it<double> (inlined) a4a std::norm<double> (inlined) a4a main (/home/milian/projects/src/perf-tests/inlining) 20510 __libc_start_main (/usr/lib/libc-2.25.so) bd9 _start (/home/milian/projects/src/perf-tests/inlining) Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Mark inlined frames in output by " (inlined)" suffixMilian Wolff2-3/+10
The original patch that introduced inline frame output in the various browsers used this suffix already. The new centralized approach that uses fake symbols for inlined frames was missing this approach so far. Instead of changing the symbol name itself, we only print the suffix where needed. This allows us to efficiently lookup the symbol for a given name without first having to append the suffix before the lookup. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf report: Fall-back to function name comparison for -g srclineMilian Wolff1-8/+12
When a callchain entry has no srcline available, we ended up comparing the instruction pointer. I consider this to be not too useful. Rather, I think we should group the entries by function name, which this patch adds. For people who want to split the data on the IP boundary, using `-g address` is the correct choice. Before: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--56.03%--hypot | | | | | |--8.45%--__hypot_finite | | | | | |--7.62%--__hypot_finite | | | | | |--2.29%--__hypot_finite | | | | | |--2.24%--__hypot_finite | | | | | |--2.06%--__hypot_finite | | | | | |--1.81%--__hypot_finite ... ~~~~~ After: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--60.29%--hypot | | | | | --56.03%--__hypot_finite | | | --0.85%--cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Create real callchain entries for inlined framesMilian Wolff5-0/+103
The inline_node structs are maintained by the new dso->inlines tree. This in turn keeps ownership of the fake symbols and srcline string representing an inline frame. This tree is sorted by address to allow quick lookups. All other entries of the symbol beside the function name are unused for inline frames. The advantage of this approach is that all existing users of the callchain API can now transparently display inlined frames without having to patch their code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Refactor inline_list to store srcline string directlyMilian Wolff2-16/+41
This is a preparation for the creation of real callchain entries for inlined frames. The rest of the perf code uses the srcline string. As such, using that also for the srcline API allows us to simplify some of the upcoming code. Most notably, it will allow us to cache the srcline for a given inline node and reuse it for different callchain entries. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Refactor inline_list to operate on symbolsMilian Wolff3-32/+69
This is a requirement to create real callchain entries for inlined frames. Since the list of inlines usually contains the target symbol too, i.e. the location where the frames get inlined to, we alias that symbol and reuse it as-is is. This ensures that other dependent functionality keeps working, most notably annotation of the target frames. For all other entries in the inline_list, a fake symbol is created. These are marked by new 'inlined' member which is set to true. Only those symbols are managed by the inline_list and get freed when the inline_list is deleted from within inline_node__delete. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Store srcline in callchain_cursor_nodeMilian Wolff3-26/+29
This is mostly a preparation to enable the creation of full callchain nodes for inline frames. Such frames will reference the IP of the non-inlined frame, but hold the symbol and srcline for an inlined location. As such, we won't be able to query the srcline on-demand based on the IP alone. Instead, we will leverage the functionality provided by this patch here, and store the srcline for the inlined nodes in the new srcline member of callchain_cursor_node. Note that this patch on its own leaks the srcline, as there is no free_callchain_cursor_node or similar. A future patch will add caching of the srcline and handle deletion properly. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf report: Remove code to handle inline frames from browsersMilian Wolff3-38/+0
The follow-up commits will make inline frames first-class citizens in the callchain, thereby obsoleting all of this special code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf namespaces: Add more appropriate set of headersArnaldo Carvalho de Melo1-2/+3
We don't need perf.h, that is a kitchen sink, all we need is perf_events.h for perf_ns_link_info, sys/types.h for pid_t and linux/types.h for u64, list_head. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Zhijian <lizhijian@cn.fujitsu.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f2uxyaj4s2hmntkrezpa6dqz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf tools: Introduce binary__fprintf()Arnaldo Carvalho de Melo3-33/+46
Out of print_binary() but receiving a fp pointer and expecting that the printer be a fprintf like function, i.e. receive a FILE pointer and return the number of characters printed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lkml.kernel.org/n/tip-6oqnxr6lmgqe6q6p3iugnscx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf annotate: Remove arch::cpuid_parse callbackJiri Olsa1-7/+3
There's no need for extra cpuid_parse arch callback, it can be handled directly in init callback. Adding the init function to x86 to cover the cpuid initialization. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171011150158.11895-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf mmap: Adopt push method from builtin-record.cArnaldo Carvalho de Melo2-0/+103
The previous prep patch was just to show exactly what changed in that function, now its time to move that method and things only it uses to the right place, mmap.[ch] Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-aaxywfgw3d44x6xlu8zm1avu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf mmap: Move perf_mmap and methods to separate mmap.[ch] filesArnaldo Carvalho de Melo6-323/+349
To better organize the sources, and we may end up even using it directly, without evlists and evsels. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-oiqrm7grflurnnzo2ovfnslg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-20Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar7-24/+71
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18perf xyarray: Fix wrong processing when closing evsel fdJin Yao1-2/+2
In current xyarray code, xyarray__max_x() returns max_y, and xyarray__max_y() returns max_x. It's confusing and for code logic it looks not correct. Error happens when closing evsel fd. Let's see this scenario: 1. Allocate an fd (pseudo-code) perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads) { evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int)); } xyarray__new(int xlen, int ylen, size_t entry_size) { size_t row_size = ylen * entry_size; struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size); xy->entry_size = entry_size; xy->row_size = row_size; xy->entries = xlen * ylen; xy->max_x = xlen; xy->max_y = ylen; ...... } So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4. 2. Use perf syscall and get the fd int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, struct thread_map *threads) { for (cpu = 0; cpu < cpus->nr; cpu++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu], group_fd, flags); FD(evsel, cpu, thread) = fd; } } static inline void *xyarray__entry(struct xyarray *xy, int x, int y) { return &xy->contents[x * xy->row_size + y * xy->entry_size]; } These codes don't have issues. The issue happens in the closing of fd. 3. Close fd. void perf_evsel__close_fd(struct perf_evsel *evsel) { int cpu, thread; for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } } Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y() returns max_x (ncpus), so above code is actually to be: for (cpu = 0; cpu < nthreads; cpu++) for (thread = 0; thread < ncpus; ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } It's not correct! This change is introduced by "475fb533fb7d" ("perf evsel: Fix buffer overflow while freeing events") This fix is to let xyarray__max_x() return max_x (ncpus) and let xyarry__max_y() return max_y (nthreads) Committer note: This was also fixed by Ravi Bangoria, who provided the same patch, noticing the problem with 'perf record': <quote Ravi> I see 'perf record -p <pid>' crashes with following log: *** Error in `./perf': free(): invalid next size (normal): 0x000000000298b340 *** ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5] /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c] ./perf(perf_evsel__close+0xb4)[0x4b7614] ./perf(perf_evlist__delete+0x100)[0x4ab180] ./perf(cmd_record+0x1d9)[0x43a5a9] ./perf[0x49aa2f] ./perf(main+0x631)[0x427841] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830] ./perf(_start+0x29)[0x427a59] </> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Fixes: d74be4767367 ("perf xyarray: Save max_x, max_y") Link: http://lkml.kernel.org/r/1508339478-26674-1-git-send-email-yao.jin@linux.intel.com Link: http://lkml.kernel.org/r/1508327446-15302-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-17perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACENamhyung Kim1-0/+2
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL pointer deref when he ran it on a data with namespace events. It was because the buildid_id__mark_dso_hit_ops lacks the namespace event handler and perf_too__fill_default() didn't set it. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x (gdb) where #0 0x0000000000000000 in ?? () #1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287 #2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340 #3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522 #4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899 #5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953 #6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83 #7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115 #8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296 #9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348 #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392 #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536 (gdb) Fix it by adding a stub event handler for namespace event. Committer testing: Further clarifying, plain using 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info: # perf record -a --namespaces sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ] # perf buildid-list | wc -l 38 # perf buildid-list | head -5 e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux 874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko 5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so # It is only when one asks for checking what of those entries actually had samples, i.e. when we use either -H or --with-hits, that we will process all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c neither explicitely set a perf_tool.namespaces() callback nor the default stub was set that we end up, when processing a PERF_RECORD_NAMESPACE record, causing a SEGFAULT: # perf buildid-list -H Segmentation fault (core dumped) ^C # Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Fixes: f3b3614a284d ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-13perf tools: Check wether the eBPF file exists in event parsingJiri Olsa1-2/+15
Adding the check wether the eBPF file exists, to consider it as eBPF input file. This way we can differentiate eBPF events from events that end up with same suffix as eBPF file. Before: $ perf stat -e 'cpu/uops_executed.core/' true bpf: builtin compilation failed: -95, try external compiler WARNING: unable to get correct kernel building directory. Hint: Set correct kbuild directory using 'kbuild-dir' option in [llvm] section of ~/.perfconfig or set it to "" to suppress kbuild detection. event syntax error: 'cpu/uops_executed.core/' \___ Failed to load cpu/uops_executed.c from source: 'version' section incorrect or lost After: $ perf stat -e 'cpu/uops_executed.core/' true Performance counter stats for 'true': 181,533 cpu/uops_executed.core/:u 0.002795447 seconds time elapsed If user makes type in the eBPF file, we prioritize the event syntax and show following warning: $ perf stat -e 'krava.c//' true event syntax error: 'krava.c//' \___ Cannot find PMU `krava.c'. Missing kernel support? Reported-and-Tested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-09perf pmu: Unbreak perf record for arm/arm64 with events with explicit PMUMark Rutland3-19/+47
Currently, perf record is broken on arm/arm64 systems when the PMU is specified explicitly as part of the event, e.g. $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true In such cases, perf record fails to open events unless perf_event_paranoid is set to -1, even if the PMU in question supports mode exclusion. Further, even when perf_event_paranoid is toggled, no samples are recorded. This is an unintended side effect of commit: e3ba76deef23064f ("perf tools: Force uncore events to system wide monitoring) ... which assumes that if a PMU has an associated cpu_map, it is an uncore PMU, and forces events for such PMUs to be system-wide. This is not true for arm/arm64 systems, which can have heterogeneous CPUs. To account for this, multiple CPU PMUs are exposed, each with a "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM PMUs do not have a "cpumask" file, and only have a "cpus" file. For the gory details as to why, see commit: 7e3fcffe95544010 ("perf pmu: Support alternative sysfs cpumask") Given all of this, we can instead identify uncore PMUs by explicitly checking for a "cpumask" file, and restore arm/arm64 PMU support back to a working state. This patch does so, adding a new perf_pmu::is_uncore field, and splitting the existing cpumask parsing so that it can be reused. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by Will Deacon <will.deacon@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: 4.12+ <stable@vger.kernel.org> Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide monitoring) Link: http://lkml.kernel.org/r/1507315102-5942-1-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-05perf callchain: Compare dsos (as well) for CCKEY_FUNCTIONRavi Bangoria1-1/+5
Two functions from different binaries can have same start address. Thus, comparing only start address in match_chain() leads to inconsistent callchains. Fix this by adding a check for dsos as well. Ex, https://www.spinics.net/lists/linux-perf-users/msg04067.html Reported-by: Alexander Pozdneev <pozdneyev@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/20171005091234.5874-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-03perf top: Add option to set the number of thread for event synthesizeKan Liang2-1/+5
Using UINT_MAX to indicate the default thread#, which is the max number of online CPU. Committer testing: # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9 # cat /tmp/output ? ( ? ): ... [continued]: clone()) = 26651 (perf) 0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf) 0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf) 0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf) 0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf) 0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf) 0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf) 0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf) 1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf) 246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf) # Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-03perf top: Implement multithreading for perf_event__synthesize_threadsKan Liang4-37/+143
The proc files which is sorted with alphabetical order are evenly assigned to several synthesize threads to be processed in parallel. For 'perf top', the threads number hard code to online CPU number. The following patch will introduce an option to set it. For other perf tools, the thread number is 1. Because the process function is not ready for multithreading, e.g. process_synthesized_event. This patch series only support event synthesize multithreading for 'perf top'. For other tools, it can be done separately later. With multithread applied, the total processing time can get up to 1.56x speedup on Knights Mill for 'perf top'. For specific single event processing, the processing time could increase because of the lock contention. So proc_map_timeout may need to be increased. Otherwise some proc maps will be truncated. Based on my test, increasing the proc_map_timeout has small impact on the total processing time. The total processing time still get 1.49x speedup on Knights Mill after increasing the proc_map_timeout. The patch itself doesn't increase the proc_map_timeout. Doesn't need to implement multithreading for per task monitoring, perf_event__synthesize_thread_map. It doesn't have performance issue. Committer testing: # getconf _NPROCESSORS_ONLN 4 # perf trace --no-inherit -e clone -o /tmp/output perf top # tail -4 /tmp/bla 0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf) 0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf) 0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf) 246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf) # Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-03perf tools: Lock to protect comm_str rb treeKan Liang1-1/+17
Add comm_str_lock to protect comm_str rb tree. The lock is only needed for multithreaded code, so using mutex wrappers provided by perf tool. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-03perf tools: Lock to protect namespaces and comm listKan Liang2-5/+51
Add two locks to protect namespaces_list and comm_list. The lock is only needed for multithreaded code, so using mutex wrappers provided by perf tool. Not all the comm_list/namespaces_list accessing are protected, e.g. thread__exec_comm. Because the multithread code for perf top event synthesizing does not touch them. They don't need a lock. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>