summaryrefslogtreecommitdiffstats
path: root/tools/perf/builtin-stat.c
AgeCommit message (Collapse)AuthorFilesLines
2022-11-16perf thread_map: Reduce exposure of libperf internal APIIan Rogers1-0/+1
Remove unnecessary include of internal threadmap.h and refcount.h in thread_map.h. Switch to using public APIs when possible or including the internal header file in the C file. Fix a transitive dependency in openat-syscall.c broken by the clean up. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20221109184914.1357295-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-16perf stat: Clear screen only if output file is a ttyNamhyung Kim1-0/+8
The --interval-clear option makes perf stat to clear the terminal at each interval. But it doesn't need to clear the screen when it saves to a file. Make it fail when it's enabled with the output options. $ perf stat -I 1 --interval-clear -o myfile true --interval-clear does not work with output Usage: perf stat [<options>] [<command>] -o, --output <file> output file name --log-fd <n> log output to fd, instead of stderr --interval-clear clear screen in between new interval Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221114230227.1255976-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf stat: Use sig_atomic_t to avoid undefined behaviour in a signal handlerIan Rogers1-4/+4
Use sig_atomic_t for variables written/accessed in signal handlers. This is undefined behavior as per: https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf tools: Make quiet mode consistent between toolsJames Clark1-4/+4
Use the global quiet variable everywhere so that all tools hide warnings in quiet mode and update the documentation to reflect this. 'perf probe' claimed that errors are not printed in quiet mode but I don't see this so remove it from the docs. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Display percore events properlyNamhyung Kim1-16/+0
The recent change in the perf stat broke the percore event display. Note that the aggr counts are already processed so that the every sibling thread in the same core will get the per-core counter values. Check percore evsels and skip the sibling threads in the display. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-20-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Add perf_stat_process_shadow_stats()Namhyung Kim1-0/+1
This function updates the shadow stats using the aggregated counts uniformly since it uses the aggr_counts for the every aggr mode. It'd have duplicate shadow stats for each items for now since the display routines will update them once again. But that'd be fine as it shows the average values and it'd be gone eventually. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Add perf_stat_process_percore()Namhyung Kim1-0/+1
The perf_stat_process_percore() is to aggregate counts for an event per-core even if the aggr_mode is AGGR_NONE. This is enabled when user requested it on the command line. To handle that, it keeps the per-cpu counts at first. And then it aggregates the counts that have the same core id in the aggr->counts and updates the values for each cpu back. Later, per-core events will skip one of the CPUs unless percore-show-thread option is given. In that case, it can simply print all cpu stats with the updated (per-core) values. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Add perf_stat_merge_counters()Namhyung Kim1-0/+2
The perf_stat_merge_counters() is to aggregate the same events in different PMUs like in case of uncore or hybrid. The same logic is in the stat-display routines but I think it should be handled when it processes the event counters. As it works on the aggr_counters, it doesn't change the output yet. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-16-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Split process_counters() to share it with process_stat_round_event()Namhyung Kim1-9/+13
It'd do more processing with aggregation. Let's split the function so that it can be shared with by process_stat_round_event() too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Reset aggr counts for each intervalNamhyung Kim1-0/+3
The evsel->stats->aggr->count should be reset for interval processing since we want to use the values directly for display. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-14-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Allocate aggr counts for recorded dataNamhyung Kim1-5/+15
In the process_stat_config_event() it sets the aggr_mode that means the earlier evlist__alloc_stats() cannot allocate the aggr counts due to the missing aggr_mode. Do it after setting the aggr_map using evlist__alloc_aggr_stats(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Aggregate per-thread stats using evsel->stats->aggrNamhyung Kim1-0/+31
Per-thread aggregation doesn't use the CPU numbers but the logic should be the same. Initialize cpu_aggr_map separately for AGGR_THREAD and use thread map idx to aggregate counter values. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Aggregate events using evsel->stats->aggrNamhyung Kim1-3/+0
Add a logic to aggregate counter values to the new evsel->stats->aggr. This is not used yet so shadow stats are not updated. But later patch will convert the existing code to use it. With that, we don't need to handle AGGR_GLOBAL specially anymore. It can use the same logic with counts, prev_counts and aggr_counts. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Allocate evsel->stats->aggr properlyNamhyung Kim1-3/+3
The perf_stat_config.aggr_map should have a correct size of the aggregation map. Use it to allocate aggr_counts. Also AGGR_NONE with per-core events can be tricky because it doesn't aggreate basically but it needs to do so for per-core events only. So only per-core evsels will have stats->aggr data. Note that other caller of evlist__alloc_stat() might not have stat_config or aggr_map. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()Namhyung Kim1-2/+5
In case of no aggregation, it needs to keep the original (cpu) ordering in the aggr_map so that it can be in sync with the cpu map. This will make the code easier to handle AGGR_NONE similar to others. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Add cpu aggr id for no aggregation modeNamhyung Kim1-5/+43
Likewise, add an aggr_id for cpu for none aggregation mode. This is not used actually yet but later code will use to unify the aggregation code. No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Add aggr id for global modeNamhyung Kim1-2/+34
To make the code simpler, I'd like to use the same aggregation code for the global mode. We can simply add an id function to return cpu 0 and use print_aggr(). No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Kill unused per-thread runtime statsNamhyung Kim1-54/+0
Now it's using the global rt_stat, no need to use per-thread stats. Let get rid of them. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Clean redundant if in process_evlistShang XiaoJing1-2/+0
Since the first if statment is covered by the following one, clean up the first if statment. Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220922141438.22487-5-shangxiaojing@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf stat: Merge cases in process_evlistShang XiaoJing1-3/+1
As two cases in process_evlist has same behavior, make the first fall through to the second. Commiter notes: Added __fallthrough, the kernel has "fallthrough", we need to make tools/ use it. Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220922141438.22487-3-shangxiaojing@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf metrics: Wire up core_wideIan Rogers1-0/+14
Pass state necessary for core_wide into the expression parser. Add system_wide and user_requested_cpu_list to perf_stat_config to make it available at display time. evlist isn't used as the evlist__create_maps, that computes user_requested_cpus, needs the list of events which is generated by the metric. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220831174926.579643-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf stat: Delay metric parsingIan Rogers1-15/+37
Having metric parsing as part of argument processing causes issues as flags like metric-no-group may be specified later. It also denies the opportunity to optimize the events on SMT systems where fewer events may be possible if we know the target is system-wide. Move metric parsing to after command line option parsing. Because of how stat runs this moves the parsing after record/report which fail to work with metrics currently anyway. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220831174926.579643-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-02perf stat: Fix L2 Topdown metrics disappear for raw eventsZhengjun Xing1-2/+3
In perf/Documentation/perf-stat.txt, for "--td-level" the default "0" means the max level that the current hardware support. So we need initialize the stat_config.topdown_level to TOPDOWN_MAX_LEVEL when “--td-level=0” or no “--td-level” option. Otherwise, for the hardware with a max level is 2, the 2nd level metrics disappear for raw events in this case. The issue cannot be observed for the perf stat default or "--topdown" options. This commit fixes the raw events issue and removes the duplicated code for the perf stat default. Before: # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1 Performance counter stats for 'sleep 1': 1.03 msec cpu-clock # 0.001 CPUs utilized 1 context-switches # 966.216 /sec 0 cpu-migrations # 0.000 /sec 60 page-faults # 57.973 K/sec 1,132,112 instructions # 1.41 insn per cycle 803,872 cycles # 0.777 GHz 1,909,120 ref-cycles # 1.845 G/sec 236,634 branches # 228.640 M/sec 6,367 branch-misses # 2.69% of all branches 4,823,232 slots # 4.660 G/sec 1,210,536 topdown-retiring # 25.1% Retiring 699,841 topdown-bad-spec # 14.5% Bad Speculation 1,777,975 topdown-fe-bound # 36.9% Frontend Bound 1,134,878 topdown-be-bound # 23.5% Backend Bound 189,146 topdown-heavy-ops # 182.756 M/sec 662,012 topdown-br-mispredict # 639.647 M/sec 1,097,048 topdown-fetch-lat # 1.060 G/sec 416,121 topdown-mem-bound # 402.063 M/sec 1.002423690 seconds time elapsed 0.002494000 seconds user 0.000000000 seconds sys After: # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1 Performance counter stats for 'sleep 1': 1.13 msec cpu-clock # 0.001 CPUs utilized 1 context-switches # 882.128 /sec 0 cpu-migrations # 0.000 /sec 61 page-faults # 53.810 K/sec 1,137,612 instructions # 1.29 insn per cycle 881,477 cycles # 0.778 GHz 2,093,496 ref-cycles # 1.847 G/sec 236,356 branches # 208.496 M/sec 7,090 branch-misses # 3.00% of all branches 5,288,862 slots # 4.665 G/sec 1,223,697 topdown-retiring # 23.1% Retiring 767,403 topdown-bad-spec # 14.5% Bad Speculation 2,053,322 topdown-fe-bound # 38.8% Frontend Bound 1,244,438 topdown-be-bound # 23.5% Backend Bound 186,665 topdown-heavy-ops # 3.5% Heavy Operations # 19.6% Light Operations 725,922 topdown-br-mispredict # 13.7% Branch Mispredict # 0.8% Machine Clears 1,327,400 topdown-fetch-lat # 25.1% Fetch Latency # 13.7% Fetch Bandwidth 497,775 topdown-mem-bound # 9.4% Memory Bound # 14.1% Core Bound 1.002701530 seconds time elapsed 0.002744000 seconds user 0.000000000 seconds sys Fixes: 63e39aa6ae103451 ("perf stat: Support L2 Topdown events") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220826140057.3289401-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-27perf stat: Clear evsel->reset_group for each stat runIan Rogers1-0/+1
If a weak group is broken then the reset_group flag remains set for the next run. Having reset_group set means the counter isn't created and ultimately a segfault. A simple reproduction of this is: # perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W which will be added as a test in the next patch. Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-12perf stat: Remove duplicated include in builtin-stat.cYang Li1-1/+0
util/topdown.h is included twice in builtin-stat.c, remove one of them. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1818 Link: https://lore.kernel.org/r/20220804005213.71990-1-yang.lee@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-10perf stat: Add JSON output optionClaire Jensen1-0/+6
CSV output is tricky to format and column layout changes are susceptible to breaking parsers. New JSON-formatted output has variable names to identify fields that are consistent and informative, making the output parseable. CSV output example: 1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized 0,,context-switches:u,1204272,100.00,0.000,/sec 0,,cpu-migrations:u,1204272,100.00,0.000,/sec 70,,page-faults:u,1204272,100.00,58.126,K/sec JSON output example: {"counter-value" : "3805.723968", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running" : 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"} {"counter-value" : "6166.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 3805723045100.00, "pcnt-running" : 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"} {"counter-value" : "466.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running" : 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"} {"counter-value" : "208.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 3805726799100.00, "pcnt-running" : 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"} Also added documentation for JSON option. There is some tidy up of CSV code including a potential memory over run in the os.nfields set up. To facilitate this an AGGR_MAX value is added. Committer notes: Fixed up using PRIu64 to format u64 values, not %lu. Committer testing: ⬢[acme@toolbox perf]$ perf stat -j sleep 1 {"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"} {"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"} {"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"} {"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"} {"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"} {"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"} {"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"} {"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"} {"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"} {"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"} {"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"} ⬢[acme@toolbox perf]$ Signed-off-by: Claire Jensen <cjense@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alyssa Ross <hi@alyssa.is> Cc: Claire Jensen <clairej735@gmail.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220805200105.2020995-2-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-03perf stat: Refactor __run_perf_stat() common codeAdrián Herrera Arcila1-16/+9
This extracts common code from the branches of the forks if-then-else. enable_counters(), which was at the beginning of both branches of the conditional, is now unconditional; evlist__start_workload() is extracted to a different if, which enables making the common clocking code unconditional. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Adrián Herrera Arcila <adrian.herrera@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20220729161244.10522-1-adrian.herrera@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-29perf stat: Add topdown metrics in the default perf stat on the hybrid machineZhengjun Xing1-12/+2
Topdown metrics are missed in the default perf stat on the hybrid machine, add Topdown metrics in default perf stat for hybrid systems. Currently, we support the perf metrics Topdown for the p-core PMU in the perf stat default, the perf metrics Topdown support for e-core PMU will be implemented later separately. Refactor the code adds two x86 specific functions. Widen the size of the event name column by 7 chars, so that all metrics after the "#" become aligned again. The perf metrics topdown feature is supported on the cpu_core of ADL. The dedicated perf metrics counter and the fixed counter 3 are used for the topdown events. Adding the topdown metrics doesn't trigger multiplexing. Before: # ./perf stat -a true Performance counter stats for 'system wide': 53.70 msec cpu-clock # 25.736 CPUs utilized 80 context-switches # 1.490 K/sec 24 cpu-migrations # 446.951 /sec 52 page-faults # 968.394 /sec 2,788,555 cpu_core/cycles/ # 51.931 M/sec 851,129 cpu_atom/cycles/ # 15.851 M/sec 2,974,030 cpu_core/instructions/ # 55.385 M/sec 416,919 cpu_atom/instructions/ # 7.764 M/sec 586,136 cpu_core/branches/ # 10.916 M/sec 79,872 cpu_atom/branches/ # 1.487 M/sec 14,220 cpu_core/branch-misses/ # 264.819 K/sec 7,691 cpu_atom/branch-misses/ # 143.229 K/sec 0.002086438 seconds time elapsed After: # ./perf stat -a true Performance counter stats for 'system wide': 61.39 msec cpu-clock # 24.874 CPUs utilized 76 context-switches # 1.238 K/sec 24 cpu-migrations # 390.968 /sec 52 page-faults # 847.097 /sec 2,753,695 cpu_core/cycles/ # 44.859 M/sec 903,899 cpu_atom/cycles/ # 14.725 M/sec 2,927,529 cpu_core/instructions/ # 47.690 M/sec 428,498 cpu_atom/instructions/ # 6.980 M/sec 581,299 cpu_core/branches/ # 9.470 M/sec 83,409 cpu_atom/branches/ # 1.359 M/sec 13,641 cpu_core/branch-misses/ # 222.216 K/sec 8,008 cpu_atom/branch-misses/ # 130.453 K/sec 14,761,308 cpu_core/slots/ # 240.466 M/sec 3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring 1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation 5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound 4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound 646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations 1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears 3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth 1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound 0.002467839 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-29perf evlist: Always use arch_evlist__add_default_attrs()Kan Liang1-1/+5
Current perf stat uses the evlist__add_default_attrs() to add the generic default attrs, and uses arch_evlist__add_default_attrs() to add the Arch specific default attrs, e.g., Topdown for x86. It works well for the non-hybrid platforms. However, for a hybrid platform, the hard code generic default attrs don't work. Uses arch_evlist__add_default_attrs() to replace the evlist__add_default_attrs(). The arch_evlist__add_default_attrs() is modified to invoke the same __evlist__add_default_attrs() for the generic default attrs. No functional change. Add default_null_attrs[] to indicate the arch specific attrs. No functional change for the arch specific default attrs either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-4-zhengjun.xing@linux.intel.com Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-29perf stat: Revert "perf stat: Add default hybrid events"Kan Liang1-30/+0
This reverts commit Fixes: ac2dc29edd21f9ec ("perf stat: Add default hybrid events") Between this patch and the reverted patch, the commit 6c1912898ed21bef ("perf parse-events: Rename parse_events_error functions") and the commit 07eafd4e053a41d7 ("perf parse-event: Add init and exit to parse_event_error") clean up the parse_events_error_*() codes. The related change is also reverted. The reverted patch is hard to be extended to support new default events, e.g., Topdown events, and the existing "--detailed" option on a hybrid platform. A new solution will be proposed in the following patch to enable the perf stat default on a hybrid platform. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-2-zhengjun.xing@linux.intel.com Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26perf stat: Enable ignore_missing_threadGang Li1-0/+2
perf already support ignore_missing_thread for -p, but not yet applied to `perf stat -p <pid>`. This patch enables ignore_missing_thread for `perf stat -p <pid>`. Committer notes: And here is a refresher about the 'ignore_missing_thread' knob, from a previous patch using it: ca8000684ec4e66f ("perf evsel: Enable ignore_missing_thread for pid option") --- While monitoring a multithread process with pid option, perf sometimes may return sys_perf_event_open failure with 3(No such process) if any of the process's threads die before we open the event. However, we want perf continue monitoring the remaining threads and do not exit with error. --- Signed-off-by: Gang Li <ligang.bdlg@bytedance.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220622030037.15005-1-ligang.bdlg@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26perf stat: Add requires_cpu flag for uncoreAdrian Hunter1-4/+1
Uncore events require a CPU i.e. it cannot be -1. The evsel system_wide flag is intended for events that should be on every CPU, which does not make sense for uncore events because uncore events do not map one-to-one with CPUs. These 2 requirements are not exactly the same, so introduce a new flag 'requires_cpu' for the uncore case. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20220524075436.29144-13-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20perf stat: Always keep perf metrics topdown events in a groupKan Liang1-5/+2
If any member in a group has a different cpu mask than the other members, the current perf stat disables group. when the perf metrics topdown events are part of the group, the below <not supported> error will be triggered. $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1 WARNING: grouped events cpus do not match, disabling group: anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ } Performance counter stats for 'system wide': 141,465,174 slots <not supported> topdown-retiring 1,605,330,334 uncore_imc_free_running_0/dclk/ The perf metrics topdown events must always be grouped with a slots event as leader. Factor out evsel__remove_from_group() to only remove the regular events from the group. Remove evsel__must_be_in_group(), since no one use it anymore. With the patch, the topdown events aren't broken from the group for the splitting. $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1 WARNING: grouped events cpus do not match, disabling group: anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ } Performance counter stats for 'system wide': 346,110,588 slots 124,608,256 topdown-retiring 1,606,869,976 uncore_imc_free_running_0/dclk/ 1.003877592 seconds time elapsed Fixes: a9a1790247bdcf3b ("perf stat: Ensure group is defined on top of the same cpu mask") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220518143900.1493980-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-24perf stat: Support hybrid --topdown optionZhengjun Xing1-3/+18
Since for cpu_core or cpu_atom, they have different topdown events groups. For cpu_core, --topdown equals to: "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/, cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/, cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/, cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}" For cpu_atom, --topdown equals to: "{cpu_atom/topdown-retiring/,cpu_atom/topdown-bad-spec/, cpu_atom/topdown-fe-bound/,cpu_atom/topdown-be-bound/}" To simplify the implementation, on hybrid, --topdown is used together with --cputype. If without --cputype, it uses cpu_core topdown events by default. # ./perf stat --topdown -a sleep 1 WARNING: default to use cpu_core topdown events Performance counter stats for 'system wide': retiring bad speculation frontend bound backend bound heavy operations light operations branch mispredict machine clears fetch latency fetch bandwidth memory bound Core bound 4.1% 0.0% 5.1% 90.8% 2.3% 1.8% 0.0% 0.0% 4.2% 0.9% 9.9% 81.0% 1.002624229 seconds time elapsed # ./perf stat --topdown -a --cputype atom sleep 1 Performance counter stats for 'system wide': retiring bad speculation frontend bound backend bound 13.5% 0.1% 31.2% 55.2% 1.002366987 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220422065635.767648-3-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22perf stat: Merge event counts from all hybrid PMUsZhengjun Xing1-0/+2
For hybrid events, by default stat aggregates and reports the event counts per pmu. # ./perf stat -e cycles -a sleep 1 Performance counter stats for 'system wide': 14,066,877,268 cpu_core/cycles/ 6,814,443,147 cpu_atom/cycles/ 1.002760625 seconds time elapsed Sometimes, it's also useful to aggregate event counts from all PMUs. Create a new option '--hybrid-merge' to enable that behavior and report the counts without PMUs. # ./perf stat -e cycles -a --hybrid-merge sleep 1 Performance counter stats for 'system wide': 20,732,982,512 cycles 1.002776793 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20perf stat: Add user_time and system_time eventsFlorian Fischer1-8/+28
It bothered me that during benchmarking using 'perf stat' (to collect for example CPU cache events) I could not simultaneously retrieve the times spend in user or kernel mode in a machine readable format. When running 'perf stat' the output for humans contains the times reported by rusage and wait4. $ perf stat -e cache-misses:u -- true Performance counter stats for 'true': 4,206 cache-misses:u 0.001113619 seconds time elapsed 0.001175000 seconds user 0.000000000 seconds sys But 'perf stat's machine-readable format does not provide this information. $ perf stat -x, -e cache-misses:u -- true 4282,,cache-misses:u,492859,100.00,, I found no way to retrieve this information using the available events while using machine-readable output. This patch adds two new tool internal events 'user_time' and 'system_time', similarly to the already present 'duration_time' event. Both events use the already collected rusage information obtained by wait4 and tracked in the global ru_stats. Examples presenting cache-misses and rusage information in both human and machine-readable form: $ perf stat -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time . Performance counter stats for 'grep -q -r duration_time .': 67,422,542 ns duration_time:u 50,517,000 ns user_time:u 16,839,000 ns system_time:u 30,937 cache-misses:u 0.067422542 seconds time elapsed 0.050517000 seconds user 0.016839000 seconds sys $ perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time . 72134524,ns,duration_time:u,72134524,100.00,, 65225000,ns,user_time:u,65225000,100.00,, 6865000,ns,system_time:u,6865000,100.00,, 38705,,cache-misses:u,71189328,100.00,, Signed-off-by: Florian Fischer <florian.fischer@muhq.space> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220420102354.468173-3-florian.fischer@muhq.space Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20perf stat: Introduce stats for the user and system rusage timesFlorian Fischer1-1/+4
This is preparation for exporting rusage values as tool events. Add new global stats tracking the values obtained via rusage. For now only ru_utime and ru_stime are part of the tracked stats. Both are stored as nanoseconds to be consistent with 'duration_time', although the finest resolution the struct timeval data in rusage provides are microseconds. Signed-off-by: Florian Fischer <florian.fischer@muhq.space> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220420102354.468173-2-florian.fischer@muhq.space Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01perf evlist: Rename cpus to user_requested_cpusIan Rogers1-5/+5
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps of all evsels. For non-task targets, cpus is set to be cpus requested from the command line, defaulting to all online cpus if no cpus are specified. For an uncore event, all_cpus may be just CPU 0 or every online CPU. This causes all_cpus to have fewer values than the cpus variable which is confusing given the 'all' in the name. To try to make the behavior clearer, rename cpus to user_requested_cpus and add comments on the two struct variables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01perf stat: Avoid SEGV if core.cpus isn't setIan Rogers1-1/+4
Passing NULL to perf_cpu_map__max doesn't make sense as there is no valid max. Avoid this problem by null checking in perf_stat_init_aggr_mode. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328062414.1893550-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26perf tools: Enhance the matching of sub-commands abbreviationsWei Li1-2/+2
We support short command 'rec*' for 'record' and 'rep*' for 'report' in lots of sub-commands, but the matching is not quite strict currnetly. It may be puzzling sometime, like we mis-type a 'recport' to report but it will perform 'record' in fact without any message. To fix this, add a check to ensure that the short cmd is valid prefix of the real command. Committer testing: [root@quaco ~]# perf c2c re sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # perf c2c rec sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ] # perf c2c recport sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # perf c2c record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ] # perf c2c records sleep 1 Usage: perf c2c {record|report} -v, --verbose be more verbose (show counter open errors, etc) # Signed-off-by: Wei Li <liwei391@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rui Xiang <rui.xiang@huawei.com> Link: http://lore.kernel.org/lkml/20220325092032.2956161-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-24perf stat: Fix forked applications enablement of countersThomas Richter1-1/+1
I have run into the following issue: # perf stat -a -e new_pmu/INSTRUCTION_7/ -- mytest -c1 7 Performance counter stats for 'system wide': 0 new_pmu/INSTRUCTION_7/ 0.000366428 seconds time elapsed # The new PMU for s390 counts the execution of certain CPU instructions. The root cause is the extremely small run time of the mytest program. It just executes some assembly instructions and then exits. In above invocation the instruction is executed exactly one time (-c1 option). The PMU is expected to report this one time execution by a counter value of one, but fails to do so in some cases, not all. Debugging reveals the invocation of the child process is done *before* the counter events are installed and enabled. Tracing reveals that sometimes the child process starts and exits before the event is installed on all CPUs. The more CPUs the machine has, the more often this miscount happens. Fix this by reversing the start of the work load after the events have been installed on the specified CPUs. Now the comment also matches the code. Output after: # perf stat -a -e new_pmu/INSTRUCTION_7/ -- mytest -c1 7 Performance counter stats for 'system wide': 1 new_pmu/INSTRUCTION_7/ 0.000366428 seconds time elapsed # Now the correct result is reported rock solid all the time regardless how many CPUs are online. Reviewers notes: Jiri: Right, without -a the event has enable_on_exec so the race does not matter, but it's a problem for system wide with fork. Namhyung: Agreed. Also we may move the enable_counters() and the clock code out of the if block to be shared with the else block. Fixes: acf2892270dcc428 ("perf stat: Use perf_evlist__prepare/start_workload()") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20220317155346.577384-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Migrate to libperf cpumap apiIan Rogers1-3/+4
Switch from directly accessing the perf_cpu_map to using the appropriate libperf API when possible. Using the API simplifies the job of refactoring use of perf_cpu_map. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18perf stat: No need to setup affinities when starting a workloadArnaldo Carvalho de Melo1-7/+10
I.e. the simple: $ perf stat sleep 1 Uses a dummy CPU map and thus there is no need to setup/cleanup affinities to avoid IPIs, etc. With this we're down to a sched_getaffinity() call, in the libnuma initialization, that probably can be removed in a followup patch. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf cpumap: Give CPUs their own typeIan Rogers1-33/+34
A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf stat: Correct variable name for read counterIan Rogers1-12/+12
Switch from cpu to cpu_map_idx to reduce confusion. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-37-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf evsel: Pass cpu not cpu map index to synthesizeIan Rogers1-2/+3
evsel__write_stat_event() was incorrectly passing a cpu map index rather than a CPU to perf_event__synthesize_stat(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-36-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf evlist: Refactor evlist__for_each_cpu()Ian Rogers1-93/+86
Previously evlist__for_each_cpu() needed to iterate over the evlist in an inner loop and call "skip" routines. Refactor this so that the iteratr is smarter and the next function can update both the current CPU and evsel. By using a cpu map index, fix apparent off-by-1 in __run_perf_stat's call to perf_evsel__close_cpu(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-35-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf cpumap: Rename cpu_map__get_X_aggr_by_cpu functionsIan Rogers1-9/+9
The functions don't use a cpu_map so reduce them to being like constructors of aggr_cpu_id. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-20-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf cpumap: Refactor cpu_map__build_map()Ian Rogers1-87/+100
Turn it into a cpu_aggr_map__new(). Pass helper functions. Refactor builtin-stat calls to manually pass function pointers. Try to reduce some copy-paste code. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-19-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12perf cpumap: Rename empty functionsIan Rogers1-6/+6
Remove cpu_map from name as a cpu_map isn't used. Pass a const pointer rather than by value to avoid unnecessary copying. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>