summaryrefslogtreecommitdiffstats
path: root/tools/perf/builtin-record.c
AgeCommit message (Collapse)AuthorFilesLines
2020-09-04perf tools: Consolidate close_control_option()'s into one functionAdrian Hunter1-11/+1
Consolidate control option fifo closing into one function. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200903122937.25691-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf record: Add 'snapshot' control commandAdrian Hunter1-7/+17
Add 'snapshot' control command to create an AUX area tracing snapshot the same as if sending SIGUSR2. The advantage of the FIFO is that access is governed by access to the FIFO. Example: $ mkfifo perf.control $ mkfifo perf.ack $ cat perf.ack & [1] 15235 $ sudo ~/bin/perf record --control fifo:perf.control,perf.ack -S -e intel_pt//u -- sleep 60 & [2] 15243 $ ps -e | grep perf 15244 pts/1 00:00:00 perf $ kill -USR2 15244 bash: kill: (15244) - Operation not permitted $ echo snapshot > perf.control ack $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200901093758.32293-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf tools: Add FIFO file names as alternative options to --controlAdrian Hunter1-8/+26
Enable the --control option to accept file names as an alternative to file descriptors. Example: $ mkfifo perf.control $ mkfifo perf.ack $ cat perf.ack & [1] 6808 $ perf record --control fifo:perf.control,perf.ack -- sleep 300 & [2] 6810 $ echo disable > perf.control $ Events disabled ack $ echo enable > perf.control $ Events enabled ack $ echo disable > perf.control $ Events disabled ack $ kill %2 [ perf record: Woken up 4 times to write data ] $ [ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ] [1]- Done cat perf.ack [2]+ Terminated perf record --control fifo:perf.control,perf.ack -- sleep 300 $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200902105707.11491-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf tools: Consolidate --control option parsing into one functionAdrian Hunter1-20/+2
Consolidate --control option parsing into one function, in preparation for adding FIFO file name options. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200901093758.32293-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-01perf record: Correct the help info of option "--no-bpf-event"Wei Li1-1/+1
The help info of option "--no-bpf-event" is wrongly described as "record bpf events", correct it. Committer testing: $ perf record -h bpf Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] --clang-opt <clang options> options passed to clang when compiling BPF scriptlets --clang-path <clang path> clang binary to use for compiling BPF scriptlets --no-bpf-event do not record bpf events $ Fixes: 71184c6ab7e6 ("perf record: Replace option --bpf-event with --no-bpf-event") Signed-off-by: Wei Li <liwei391@huawei.com> Acked-by: Song Liu <songliubraving@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200819031947.12115-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-07perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not setJin Yao1-2/+2
We received an error report that perf-record caused 'Segmentation fault' on a newly system (e.g. on the new installed ubuntu). (gdb) backtrace #0 __read_once_size (size=4, res=<synthetic pointer>, p=0x14) at /root/0-jinyao/acme/tools/include/linux/compiler.h:139 #1 atomic_read (v=0x14) at /root/0-jinyao/acme/tools/include/asm/../../arch/x86/include/asm/atomic.h:28 #2 refcount_read (r=0x14) at /root/0-jinyao/acme/tools/include/linux/refcount.h:65 #3 perf_mmap__read_init (map=map@entry=0x0) at mmap.c:177 #4 0x0000561ce5c0de39 in perf_evlist__poll_thread (arg=0x561ce68584d0) at util/sideband_evlist.c:62 #5 0x00007fad78491609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #6 0x00007fad7823c103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The root cause is, evlist__add_bpf_sb_event() just returns 0 if HAVE_LIBBPF_SUPPORT is not defined (inline function path). So it will not create a valid evsel for side-band event. But perf-record still creates BPF side band thread to process the side-band event, then the error happpens. We can reproduce this issue by removing the libelf-dev. e.g. 1. apt-get remove libelf-dev 2. perf record -a -- sleep 1 root@test:~# ./perf record -a -- sleep 1 perf: Segmentation fault Obtained 6 stack frames. ./perf(+0x28eee8) [0x5562d6ef6ee8] /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fbfdc65f210] ./perf(+0x342e74) [0x5562d6faae74] ./perf(+0x257e39) [0x5562d6ebfe39] /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fbfdc990609] /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbfdc73b103] Segmentation fault (core dumped) To fix this issue, 1. We either install the missing libraries to let HAVE_LIBBPF_SUPPORT be defined. e.g. apt-get install libelf-dev and install other related libraries. 2. Use this patch to skip the side-band event setup if HAVE_LIBBPF_SUPPORT is not set. Committer notes: The side band thread is not used just with BPF, it is also used with --switch-output-event, so narrow the ifdef to the BPF specific part. Fixes: 23cbb41c939a ("perf record: Move side band evlist setup to separate routine") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200805022937.29184-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf tools: Move clockid_res_ns under clock structJiri Olsa1-3/+3
Move the clockid_res_ns struct member to the clock struct, so we have the clock related stuff in one place. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Geneviève Bastien <gbastien@versatic.net> Cc: Ian Rogers <irogers@google.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/20200805093444.314999-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf header: Store clock references for -k/--clockid optionJiri Olsa1-0/+41
Add a new CLOCK_DATA feature that stores reference times when -k/--clockid option is specified. It contains the clock id and its reference time together with wall clock time taken at the 'same time', both values are in nanoseconds. The format of data is as below: struct { u32 version; /* version = 1 */ u32 clockid; u64 wall_clock_ns; u64 clockid_time_ns; }; This clock reference times will be used in following changes to display wall clock for perf events. It's available only for recording with clockid specified, because it's the only case where we can get reference time to wallclock time. It's can't do that with perf clock yet. Committer testing: $ perf record -h -k Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -k, --clockid <clockid> clockid to use for events, see clock_gettime() $ perf record -k monotonic sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ] $ perf report --header-only | grep clockid -A1 # event : name = cycles:u, , id = { 88815, 88816, 88817, 88818, 88819, 88820, 88821, 88822 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, ksymbol = 1, bpf_event = 1, clockid = 1 # CPU_TOPOLOGY info available, use -I to display -- # clockid frequency: 1000 MHz # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake # clockid: monotonic (1) # reference time: 2020-08-06 09:40:21.619290 = 1596717621.619290 (TOD) = 21931.077673635 (monotonic) $ Original-patch-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Geneviève Bastien <gbastien@versatic.net> Cc: Ian Rogers <irogers@google.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/20200805093444.314999-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf clockid: Move parse_clockid() to new clockid objectJiri Olsa1-97/+1
Move parse_clockid and all needed clcckid related stuff into clockid object. We are going to add clockid_name function in following change, so it's better it's placed in separated object and not in builtin-record.c. No functional change is intended. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Geneviève Bastien <gbastien@versatic.net> Cc: Ian Rogers <irogers@google.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/20200805093444.314999-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-04perf record: Introduce --control fd:ctl-fd[,ack-fd] optionsAlexey Budankov1-0/+37
Introduce --control fd:ctl-fd[,ack-fd] options to pass open file descriptors numbers from command line. Extend perf-record.txt file with --control fd:ctl-fd[,ack-fd] options description. Document possible usage model introduced by --control fd:ctl-fd[,ack-fd] options by providing example bash shell script. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/8dc01e1a-3a80-3f67-5385-4bc7112b0dd3@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-04perf record: Implement control commands handlingAlexey Budankov1-0/+16
Implement handling of 'enable' and 'disable' control commands coming from control file descriptor. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/f0fde590-1320-dca1-39ff-da3322704d3b@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-04perf record: Extend -D,--delay option with -1 valueAlexey Budankov1-4/+8
Extend -D,--delay option with -1 to start collection with events disabled to be enabled later by 'enable' command provided via control file descriptor. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/3e7d362c-7973-ee5d-e81e-c60ea22432c3@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-07-10perf tools: Add support for PERF_RECORD_TEXT_POKEAdrian Hunter1-0/+45
Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event is processed, then the kernel dso data cache is updated with the poked bytes. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-07-08Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo1-9/+9
To pick up fixes and move perf/core forward, minor conflict as perf_evlist__add_dummy() lost its 'perf_' prefix as it operates on a 'struct evlist', not on a 'struct perf_evlist', i.e. its tools/perf/ specific, it is not in libperf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-07-03perf record: Fix duplicated sideband events with Intel PT system wide tracingAdrian Hunter1-9/+9
Commit 0a892c1c9472 ("perf record: Add dummy event during system wide synthesis") reveals an issue with Intel PT system wide tracing. Specifically that Intel PT already adds a dummy tracking event, and it is not the first event. Adding another dummy tracking event causes duplicated sideband events. Fix by checking for an existing dummy tracking event first. Example showing duplicated switch events: Before: # perf record -a -e intel_pt//u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.895 MB perf.data ] # perf script --no-itrace --show-switch-events | head swapper 0 [007] 6390.516222: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11/11 swapper 0 [007] 6390.516222: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11/11 rcu_sched 11 [007] 6390.516223: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 rcu_sched 11 [007] 6390.516224: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 rcu_sched 11 [007] 6390.516227: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 rcu_sched 11 [007] 6390.516227: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [007] 6390.516228: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11/11 swapper 0 [007] 6390.516228: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11/11 swapper 0 [002] 6390.516415: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 5556/5559 swapper 0 [002] 6390.516416: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 5556/5559 After: # perf record -a -e intel_pt//u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.868 MB perf.data ] # perf script --no-itrace --show-switch-events | head swapper 0 [005] 6450.567013: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 7179/7181 perf 7181 [005] 6450.567014: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 perf 7181 [005] 6450.567028: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [005] 6450.567029: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 7179/7181 swapper 0 [005] 6450.571699: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11/11 rcu_sched 11 [005] 6450.571700: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 rcu_sched 11 [005] 6450.571702: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [005] 6450.571703: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11/11 swapper 0 [005] 6450.579703: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11/11 rcu_sched 11 [005] 6450.579704: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200629091955.17090-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-22perf evlist: Fix the class prefix for 'struct evlist' sample_id_all methodsArnaldo Carvalho de Melo1-1/+1
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-22perf evlist: Fix the class prefix for 'struct evlist' 'add' evsel methodsArnaldo Carvalho de Melo1-2/+2
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-29perf tools: Add optional support for libpfm4Stephane Eranian1-0/+6
This patch links perf with the libpfm4 library if it is available and LIBPFM4 is passed to the build. The libpfm4 library contains hardware event tables for all processors supported by perf_events. It is a helper library that helps convert from a symbolic event name to the event encoding required by the underlying kernel interface. This library is open-source and available from: http://perfmon2.sf.net. With this patch, it is possible to specify full hardware events by name. Hardware filters are also supported. Events must be specified via the --pfm-events and not -e option. Both options are active at the same time and it is possible to mix and match: $ perf stat --pfm-events inst_retired:any_p:c=1:i -e cycles .... One needs to explicitely ask for its inclusion by using the LIBPFM4 make command line option, ie its opt-in rather than opt-out of feature detection and build support. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiwei Sun <jiwei.sun@windriver.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200505182943.218248-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-28perf record: Respect --no-switch-eventsAdrian Hunter1-2/+3
Context switch events are added automatically by Intel PT and Coresight. Make it possible to suppress them. That is useful for tracing the scheduler without the disturbance that the switch event processing creates. Example: Prerequisites: $ which perf ~/bin/perf $ sudo setcap "cap_sys_rawio,cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_ipc_lock=ep" ~/bin/perf $ sudo chmod +r /proc/kcore Before: $ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.938 MB perf.data ] $ perf script -D | grep PERF_RECORD_SWITCH | wc -l 572 After: $ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001 Warning: Intel Processor Trace decoding will not be possible except for kernel tracing! [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.838 MB perf.data ] $ perf script -D | grep PERF_RECORD_SWITCH | wc -l 0 $ sudo chmod go-r /proc/kcore $ sudo setcap -r ~/bin/perf Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lore.kernel.org/lkml/20200528120859.21604-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-28perf record: Use an eventfd to wakeup when doneAnand K Mistry1-0/+39
The setting and checking of 'done' contains a rare race where the signal handler setting 'done' is run after checking to break the loop, but before waiting in evlist__poll(). In this case, the main loop won't wake up until either another signal is sent, or the perf data fd causes a wake up. The following simple script can trigger this condition (but you might need to run it for several hours): for ((i = 0; i >= 0; i++)) ; do echo "Loop $i" delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc) ./perf record -- sleep 30000000 >/dev/null& pid=$! sleep $delay kill -TERM $pid echo "PID $pid" wait $pid done At some point, the loop will stall. Adding logging, even though perf has received the SIGTERM and set 'done = 1', perf will remain sleeping until a second signal is sent. Committer notes: Make this dependent on HAVE_EVENTFD_SUPPORT, so that we continue building on older systems without the eventfd syscall. Signed-off-by: Anand K Mistry <amistry@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200513122012.v3.1.I4d7421c6bbb1f83ea58419082481082e19097841@changeid Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-28perf record: Add dummy event during system wide synthesisIan Rogers1-5/+14
During the processing of /proc during event synthesis new processes may start. Add a dummy event if /proc is to be processed, to capture mmaps for starting processes. This reuses the existing logic for initial-delay. v3 fixes the attr test of test-record-C0 v2 fixes the dummy event configuration and a branch stack issue. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200422173615.59436-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf evsel: Rename perf_evsel__group_idx() to evsel__group_idx()Arnaldo Carvalho de Melo1-2/+1
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf evsel: Rename perf_evsel__fallback() to evsel__fallback()Arnaldo Carvalho de Melo1-1/+1
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf evsel: Rename *perf_evsel__*name() to *evsel__*name()Arnaldo Carvalho de Melo1-1/+1
As they are 'struct evsel' methods or related routines, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf record: Move side band evlist setup to separate routineArnaldo Carvalho de Melo1-30/+41
It is quite big by now, move that code to a separate record__setup_sb_evlist() routine. Suggested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-9-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf record: Introduce --switch-output-eventArnaldo Carvalho de Melo1-4/+37
Now we can use it with --overwrite to have a flight recorder mode that gets snapshot requests from arbitrary events that are processed in the side band thread together with the PERF_RECORD_BPF_EVENT processing. Example: To collect scheduler events until a recvmmsg syscall happens, system wide: [root@five a]# rm -f perf.data.2020042717* [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042717585458 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042717590235 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042717590398 ] ^C[ perf record: Woken up 1 times to write data ] [ perf record: Dump perf.data.2020042717590511 ] [ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ] So in the above case we had 3 snapshots, the fourth was forced by control+C: [root@five a]# ls -la total 20440 drwxr-xr-x. 2 root root 4096 Apr 27 17:59 . dr-xr-x---. 12 root root 4096 Apr 27 17:46 .. -rw-------. 1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458 -rw-------. 1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235 -rw-------. 1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398 -rw-------. 1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511 [root@five a]# One can make this more precise by adding the switch output event to the main -e events list, as since this is done asynchronously, a few events after the signal event will appear in the snapshots, as can be seen with: [root@five a]# rm -f perf.data.20200427175* [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042718024203 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042718024301 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042718024484 ] ^C[ perf record: Woken up 1 times to write data ] [ perf record: Dump perf.data.2020042718024562 ] [ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ] [root@five a]# perf script -i perf.data.2020042718024203 | tail -15 PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586... swapper 0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=... NetworkManager 1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1... swapper 0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=... swapper 0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=... NetworkManager 1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1 kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894... swapper 0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=... NetworkManager 1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage... perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827... swapper 0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1... swapper 0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/... kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738... kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203... perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582... [root@five a]# Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-8-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf bpf: Decouple creating the evlist from adding the SB eventArnaldo Carvalho de Melo1-3/+14
Renaming bpf_event__add_sb_event() to evlist__add_sb_event() and requiring that the evlist be allocated beforehand. This will allow using the same side band thread and evlist to be used for multiple purposes in addition to react to PERF_RECORD_BPF_EVENT soon after they are generated. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf record: Move sb_evlist to 'struct record'Arnaldo Carvalho de Melo1-4/+4
Where state related to a 'perf record' session is grouped. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf tools: Move routines that probe for perf API features to separate fileArnaldo Carvalho de Melo1-0/+1
Trying to disentangle this a bit further, unfortunately it uses parse_events(), its interesting to have it separated anyway, so do it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23perf record: Add num-synthesize-threads optionStephane Eranian1-2/+32
To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way of handling the option. If not specified will default to 1 thread, i.e. default behavior before this option. On a desktop computer the processing of /proc/PID/task/PID/maps isn't slow enough to warrant parallel processing and the thread creation has some cost - hence the default of 1. On a loaded server with >100 cores it is possible to see synthesis times in the order of seconds and in this case having the option is desirable. As the processing is a synchronization point, it is legitimate to worry if Amdahl's law will apply to this patch. Profiling with this patch in place: https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/ shows: ... - 32.59% __perf_event__synthesize_threads - 32.54% __event__synthesize_thread + 22.13% perf_event__synthesize_mmap_events + 6.68% perf_event__get_comm_ids.constprop.0 + 1.49% process_synthesized_event + 1.29% __GI___readdir64 + 0.60% __opendir ... That is the processing is 1.49% of execution time and there is plenty to make parallel. This is shown in the benchmark in this patch: https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/ Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 127729.000 usec (+- 3372.880 usec) Average num. events: 21548.600 (+- 0.306) Average time per event 5.927 usec Number of synthesis threads: 2 Average synthesis took: 88863.500 usec (+- 385.168 usec) Average num. events: 21552.800 (+- 0.327) Average time per event 4.123 usec Number of synthesis threads: 3 Average synthesis took: 83257.400 usec (+- 348.617 usec) Average num. events: 21553.200 (+- 0.327) Average time per event 3.863 usec Number of synthesis threads: 4 Average synthesis took: 75093.000 usec (+- 422.978 usec) Average num. events: 21554.200 (+- 0.200) Average time per event 3.484 usec Number of synthesis threads: 5 Average synthesis took: 64896.600 usec (+- 353.348 usec) Average num. events: 21558.000 (+- 0.000) Average time per event 3.010 usec Number of synthesis threads: 6 Average synthesis took: 59210.200 usec (+- 342.890 usec) Average num. events: 21560.000 (+- 0.000) Average time per event 2.746 usec Number of synthesis threads: 7 Average synthesis took: 54093.900 usec (+- 306.247 usec) Average num. events: 21562.000 (+- 0.000) Average time per event 2.509 usec Number of synthesis threads: 8 Average synthesis took: 48938.700 usec (+- 341.732 usec) Average num. events: 21564.000 (+- 0.000) Average time per event 2.269 usec Where average time per synthesized event goes from 5.927 usec with 1 thread to 2.269 usec with 8. This isn't a linear speed up as not all of synthesize code has been made parallel. If the synthesis time was about 10 seconds then using 8 threads may bring this down to less than 4. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Jones <tonyj@suse.de> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf record: Add --all-cgroups optionNamhyung Kim1-0/+11
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that perf report can identify task/cgroup association later. [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ] [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 558 of event 'cycles' # Event count (approx.): 458017341 # # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 33.15% 4/0xeffffffb /sub 9615:looper0 32.83% 4/0xf00002f5 /sub/cgrp2 9620:looper2 32.79% 4/0xf00002f4 /sub/cgrp1 9619:looper1 0.35% 4/0xf00002f5 /sub/cgrp2 9618:cgtest 0.34% 4/0xf00002f4 /sub/cgrp1 9617:cgtest 0.32% 4/0xeffffffb / 9615:looper0 0.11% 4/0xeffffffb /sub 9617:cgtest 0.10% 4/0xeffffffb /sub 9618:cgtest # # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S') # [root@seventh ~]# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf record: Support synthesizing cgroup eventsNamhyung Kim1-0/+5
Synthesize cgroup events by iterating cgroup filesystem directories. The cgroup event only saves the portion of cgroup path after the mount point and the cgroup id (which actually is a file handle). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-7-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch, added missing __maybe_unused ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-01-06perf record: Adapt affinity to machines with #CPUs > 1KAlexey Budankov1-6/+22
Use struct mmap_cpu_mask type for the tool's thread and mmap data buffers to overcome current 1024 CPUs mask size limitation of cpu_set_t type. Currently glibc's cpu_set_t type has an internal mask size limit of 1024 CPUs. Moving to the 'struct mmap_cpu_mask' type allows overcoming that limit. The tools bitmap API is used to manipulate objects of 'struct mmap_cpu_mask' type. Committer notes: To print the 'nbits' struct member we must use %zd, since it is a size_t, this fixes the build in some toolchains/arches. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/96d7e2ff-ce8b-c1e0-d52c-aa59ea96f0ea@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-29perf stat: Use affinity for opening eventsAndi Kleen1-1/+1
Restructure the event opening in perf stat to cycle through the events by CPU after setting affinity to that CPU. This eliminates IPI overhead in the perf API. We have to loop through the CPU in the outter builtin-stat code instead of leaving that to low level functions. It has to change the weak group fallback strategy slightly. Since we cannot easily undo the opens for other CPUs move the weak group retry to a separate loop. Before with a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.75 4.050910 67 60046 110 perf_event_open After: 26.86 0.944396 16 58069 110 perf_event_open (the number changes slightly because the weak group retries work differently and the test case relies on weak groups) Committer notes: Added one of the hunks in a patch provided by Andi after I noticed that the "event times" 'perf test' entry was segfaulting. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-10-andi@firstfloor.org Link: http://lore.kernel.org/lkml/20191127232657.GL84886@tassilo.jf.intel.com # Fix Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-22perf record: Add support for AUX area samplingAdrian Hunter1-1/+20
Add a 'perf record' option '--aux-sample' to request AUX area sampling. AUX area sampling uses an overwriting buffer much like snapshot mode, so adjust the AUX buffer mmapping accordingly. To make it easy to queue samples for decoding, synthesize an ID index. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-18perf record: No need to process the synthesized MMAP events twiceArnaldo Carvalho de Melo1-2/+27
At the end of a 'perf record' session, by default, we'll process all samples and populate the threads, maps, etc so as to find out which of the DSOs got samples, to reduce the size of the build-id table we'll add to the perf.data headers. But we don't need to process the PERF_RECORD_MMAP events synthesized for the kernel modules, as we have those already via perf_session__create_kernel_maps(), so add mmap/mmap2 handlers that first look at event->header.misc to see if the event is for a user map, bailing out if not. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mofoxvcx2dryppcw3o689jdd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07perf record: Add support for limit perf output file sizeJiwei Sun1-1/+45
The patch adds a new option to limit the output file size, then based on it, we can create a wrapper of the perf command that uses the option to avoid exhausting the disk space by the unconscious user. In order to make the perf.data parsable, we just limit the sample data size, since the perf.data consists of many headers and sample data and other data, the actual size of the recorded file will bigger than the setting value. Testing it: # ./perf record -a -g --max-size=10M Couldn't synthesize bpf events. [ perf record: perf size limit reached (10249 KB), stopping session ] [ perf record: Woken up 32 times to write data ] [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ] # ls -lh perf.data -rw------- 1 root root 11M Oct 22 14:32 perf.data # ./perf record -a -g --max-size=10K [ perf record: perf size limit reached (10 KB), stopping session ] Couldn't synthesize bpf events. [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ] # ls -l perf.data -rw------- 1 root root 1626952 Oct 22 14:36 perf.data Committer notes: Fixed the build in multiple distros by using PRIu64 to print u64 struct members, fixing this: builtin-record.c: In function 'record__write': builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=] rec->bytes_written >> 10); ^ CC /tmp/build/pe Signed-off-by: Jiwei Sun <jiwei.sun@windriver.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Danter <richard.danter@windriver.com> Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-06perf record: Put a copy of kcore into the perf.data directoryAdrian Hunter1-0/+52
Add a new 'perf record' option '--kcore' which will put a copy of /proc/kcore, kallsyms and modules into a perf.data directory. Note, that without the --kcore option, output goes to a file as previously. The tools' -o and -i options work with either a file name or directory name. Example: $ sudo perf record --kcore uname $ sudo tree perf.data perf.data ├── kcore_dir │   ├── kallsyms │   ├── kcore │   └── modules └── data $ sudo perf script -v build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541 Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field. Using CPUID GenuineIntel-6-8E-A Using perf.data/kcore_dir/kcore for kernel data Using perf.data/kcore_dir/kallsyms for symbols perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux) perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux) perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux) uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux) uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux) uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux) Committer testing: # rm -rf perf.data* # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] # ls -l perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data # perf record --kcore uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] ls[root@quaco ~]# ls -lad perf.data* drwx------. 3 root root 4096 Oct 21 11:08 perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # perf evlist -v -i perf.data/data cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-06perf data: Support single perf.data file directoryAdrian Hunter1-1/+1
Support directory output that contains a regular perf.data file, named "data". By default the directory is named perf.data i.e. perf.data └── data Most of the infrastructure to support a directory is already there. This patch makes the changes needed to support the format above. Presently there is no 'perf record' option to output a directory. This is preparation for adding support for putting a copy of /proc/kcore in the directory. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191004083121.12182-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10libperf: Adopt perf_mmap__put() function from tools/perfJiri Olsa1-2/+2
Move perf_mmap__put() from tools/perf to libperf. Once perf_mmap__put() is moved, we need a way to call application related unmap code (AIO and aux related code for eprf), when the map goes away. Add the perf_mmap::unmap callback to do that. The unmap path from perf is: perf_mmap__put (libperf) perf_mmap__munmap (libperf) map->unmap_cb -> perf_mmap__unmap_cb (perf) mmap__munmap (perf) Committer notes: Add missing linux/kernel.h to tools/perf/lib/mmap.c to get the BUG_ON definition. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191007125344.14268-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10libperf: Adopt perf_mmap__get() function from tools/perfJiri Olsa1-1/+1
Move perf_mmap__get() from tools/perf to libperf in the internal header internal/mmap.h. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191007125344.14268-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10libperf: Adopt perf_mmap__mmap_len() function from tools/perfJiri Olsa1-2/+2
Move perf_mmap__mmap_len() from tools/perf wto libperf, it will be used in the following patches. And rename the existing perf's function to mmap__mmap_len(). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191007125344.14268-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Add perf_evlist__poll() functionJiri Olsa1-1/+1
Move perf_evlist__poll() from tools/perf to libperf, it will be used in the following patches. And rename the existing perf's function to evlist__poll(). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-39-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Add perf_evlist__add_pollfd() functionJiri Olsa1-1/+1
Move perf_evlist__add_pollfd() from tools/perf to libperf, it will be used in the following patches. Also rename perf's perf_evlist__add_pollfd()/perf_evlist__filter_pollfd() to evlist__add_pollfd()/evlist__filter_pollfd(). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-38-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Add perf_evlist__first()/last() functionsJiri Olsa1-2/+2
Add perf_evlist__first()/last() functions to libperf, as internal functions and rename perf's origins to evlist__first/last. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-29-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Move 'mmap_len' from 'struct evlist' to 'struct perf_evlist'Jiri Olsa1-1/+1
Moving 'mmap_len' from 'struct evlist' to 'struct perf_evlist' it will be used in following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-22-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Move 'nr_mmaps' from 'struct evlist' to 'struct perf_evlist'Jiri Olsa1-3/+3
Moving 'nr_mmaps' from 'struct evlist' to 'struct perf_evlist', it will be used in following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-21-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Add 'flush' to 'struct perf_mmap'Jiri Olsa1-5/+5
Move 'flush' from tools/perf's mmap to libperf's perf_mmap struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-19-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Add perf_mmap structJiri Olsa1-7/+7
Add the perf_mmap struct to libperf. The definition is added into: include/internal/mmap.h which is not to be included by users, but shared within perf and libperf. Committer notes: Remove unnecessary includes from tools/perf/lib/include/internal/mmap.h, those will be readded as they become necessary, later in the series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-11-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25perf evlist: Adopt backwards ring buffer state enumArnaldo Carvalho de Melo1-0/+1
As this isn't used at all in mmap.h but in evlist.h, so to cut down the header dependency tree, move it to where it is used. Also add mmap.h to the places using it but previously getting it indirectly via evlist.h. Add missing pthread.h to evlist.h, as it has a pthread_t struct member and was getting the header via mmap.h. Noticed while processing a Jiri's libperf batch touching mmap.h, where almost everything gets rebuilt because evlist.h is so popular, so cut down't this rebuild the world party. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/n/tip-he0uljeftl0xfveh3d6vtode@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>