From 3f5df3ac646e21a79a421ae4037c4ef0632bcaa9 Mon Sep 17 00:00:00 2001
From: Ian Rogers
Date: Tue, 30 Aug 2022 09:48:40 -0700
Subject: perf metric: Return early if no CPU PMU table exists

Previous behavior is to segfault if there is no CPU PMU table and a
metric is sought. To reproduce, compile with NO_JEVENTS=1 then request a
metric, for example, "perf stat -M IPC true".

Committer testing:

Before:

  $ make -k NO_JEVENTS=1 BUILD_BPF_SKEL=1 O=/tmp/build/perf-urgent -C tools/perf install-bin
  $ perf stat -M IPC true
  Segmentation fault (core dumped)
  $

After:

  $ perf stat -M IPC true

   Usage: perf stat [] []

      -M, --metrics
                            monitor specified metrics or metric groups (separated by ,)
  $

Fixes: 00facc760903be66 ("perf jevents: Switch build to use jevents.py")
Signed-off-by: Ian Rogers
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Caleb Biggers
Cc: Florian Fischer
Cc: Ian Rogers
Cc: Ingo Molnar
Cc: James Clark
Cc: Jiri Olsa
Cc: John Garry
Cc: Kan Liang
Cc: Kshipra Bopardikar
Cc: Mark Rutland
Cc: Miaoqian Lin
Cc: Namhyung Kim
Cc: Perry Taylor
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Richter
Cc: Xing Zhengjun
Link: https://lore.kernel.org/r/20220830164846.401143-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/metricgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 464475fd6b9a..c93bcaf6d55d 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1655,6 +1655,9 @@ int metricgroup__parse_groups(const struct option *opt,
 	struct evlist *perf_evlist = *(struct evlist **)opt->value;
 	const struct pmu_events_table *table = pmu_events_table__find();

+	if (!table)
+		return -EINVAL;
+
 	return parse_groups(perf_evlist, str, metric_no_group,
 			    metric_no_merge, NULL, metric_events, table);
 }
--
cgit v1.2.3

From 35503ce12a2c3d5d9a94e3cd85a06739b0120f79 Mon Sep 17 00:00:00 2001
From: Jiri Olsa
Date: Wed, 31 Aug 2022 14:40:41 +0200
Subject: perf script: Skip dummy event attr check

Hongtao Yu reported a problem when displaying uregs in perf script for
system wide perf.data:

  # perf script -F uregs | head -10
  Samples for 'dummy:HG' event do not have UREGS attribute set. Cannot print 'uregs' field.

The problem is the extra dummy event added for system wide, which does
not have a proper sample_type setup.

Skip the attr check completely for the dummy event, as suggested by
Namhyung, because it does not have any samples anyway.

Reported-by: Hongtao Yu
Suggested-by: Namhyung Kim
Signed-off-by: Jiri Olsa
Acked-by: Ian Rogers
Cc: Alexander Shishkin
Cc: Ingo Molnar
Cc: Mark Rutland
Cc: Peter Zijlstra
Link: https://lore.kernel.org/r/20220831124041.219925-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 13580a9c50b8..304d234d8e84 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -566,6 +566,8 @@ static struct evsel *find_first_output_type(struct evlist *evlist,
 	struct evsel *evsel;

 	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_dummy_event(evsel))
+			continue;
 		if (output_type(evsel->core.attr.type) == (int)type)
 			return evsel;
 	}
--
cgit v1.2.3

From f0c86a2bae4fd12bfa8bad4d43fb59fb498cdd14 Mon Sep 17 00:00:00 2001
From: Zhengjun Xing
Date: Fri, 26 Aug 2022 22:00:57 +0800
Subject: perf stat: Fix L2 Topdown metrics disappear for raw events
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In perf/Documentation/perf-stat.txt, for "--td-level" the default "0"
means the max level that the current hardware supports.

So we need to initialize stat_config.topdown_level to TOPDOWN_MAX_LEVEL
when "--td-level=0" is given or no "--td-level" option is given.
Otherwise, on hardware whose max level is 2, the 2nd level metrics
disappear for raw events in this case.
The issue cannot be observed for the perf stat default or "--topdown"
options. This commit fixes the raw events issue and removes the
duplicated code for the perf stat default.

Before:

 # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1

 Performance counter stats for 'sleep 1':

              1.03 msec cpu-clock                 #    0.001 CPUs utilized
                 1      context-switches          #  966.216 /sec
                 0      cpu-migrations            #    0.000 /sec
                60      page-faults               #   57.973 K/sec
         1,132,112      instructions              #    1.41  insn per cycle
           803,872      cycles                    #    0.777 GHz
         1,909,120      ref-cycles                #    1.845 G/sec
           236,634      branches                  #  228.640 M/sec
             6,367      branch-misses             #    2.69% of all branches
         4,823,232      slots                     #    4.660 G/sec
         1,210,536      topdown-retiring          #     25.1% Retiring
           699,841      topdown-bad-spec          #     14.5% Bad Speculation
         1,777,975      topdown-fe-bound          #     36.9% Frontend Bound
         1,134,878      topdown-be-bound          #     23.5% Backend Bound
           189,146      topdown-heavy-ops         #  182.756 M/sec
           662,012      topdown-br-mispredict     #  639.647 M/sec
         1,097,048      topdown-fetch-lat         #    1.060 G/sec
           416,121      topdown-mem-bound         #  402.063 M/sec

       1.002423690 seconds time elapsed

       0.002494000 seconds user
       0.000000000 seconds sys

After:

 # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1

 Performance counter stats for 'sleep 1':

              1.13 msec cpu-clock                 #    0.001 CPUs utilized
                 1      context-switches          #  882.128 /sec
                 0      cpu-migrations            #    0.000 /sec
                61      page-faults               #   53.810 K/sec
         1,137,612      instructions              #    1.29  insn per cycle
           881,477      cycles                    #    0.778 GHz
         2,093,496      ref-cycles                #    1.847 G/sec
           236,356      branches                  #  208.496 M/sec
             7,090      branch-misses             #    3.00% of all branches
         5,288,862      slots                     #    4.665 G/sec
         1,223,697      topdown-retiring          #     23.1% Retiring
           767,403      topdown-bad-spec          #     14.5% Bad Speculation
         2,053,322      topdown-fe-bound          #     38.8% Frontend Bound
         1,244,438      topdown-be-bound          #     23.5% Backend Bound
           186,665      topdown-heavy-ops         #      3.5% Heavy Operations       #     19.6% Light Operations
           725,922      topdown-br-mispredict     #     13.7% Branch Mispredict      #      0.8% Machine Clears
         1,327,400      topdown-fetch-lat         #     25.1% Fetch Latency          #     13.7% Fetch Bandwidth
           497,775      topdown-mem-bound         #      9.4% Memory Bound           #     14.1% Core Bound

       1.002701530 seconds time elapsed

       0.002744000 seconds user
       0.000000000 seconds sys

Fixes: 63e39aa6ae103451 ("perf stat: Support L2 Topdown events")
Reviewed-by: Kan Liang
Signed-off-by: Xing Zhengjun
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Ian Rogers
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: https://lore.kernel.org/r/20220826140057.3289401-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-stat.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 54cd29d07ca8..0b4a62e4ff67 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1932,6 +1932,9 @@ setup_metrics:
 		free(str);
 	}

+	if (!stat_config.topdown_level)
+		stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
+
 	if (!evsel_list->core.nr_entries) {
 		if (target__has_cpu(&target))
 			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
@@ -1948,8 +1951,6 @@ setup_metrics:
 		}
 		if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0)
 			return -1;
-
-		stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
 		/* Platform specific attrs */
 		if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
 			return -1;
--
cgit v1.2.3

From 72cd652b73dd77df6f26bd450e804ee29232669f Mon Sep 17 00:00:00 2001
From: Athira Rajeev
Date: Mon, 5 Sep 2022 19:49:28 +0530
Subject: perf affinity: Fix out of bound access to "sched_cpus" mask

The affinity code in the "affinity_set" function accesses an array named
"sched_cpus". The size of this array is allocated in the affinity_setup
function and is simply the value from get_cpu_set_size. The array holds
the cpumask value for each cpu. While setting the bit for each cpu, it
calls the "set_bit" function, which accesses an index in the sched_cpus
array.

If we pass a value to -C that is larger than the number of CPUs present
in the system, set_bit could access an array member outside the bounds
of the array. This is because currently there is no boundary check for
the CPU. This results in a segfault:

<<>>
 ./perf stat -C 12323431 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Segmentation fault (core dumped)
<<>>

Fix this by adding a boundary check for the array.

After the fix, from a powerpc system:

<<>>
 ./perf stat -C 12323431 ls 1>out
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS

 Performance counter stats for 'CPU(s) 12323431':

      msec cpu-clock
           context-switches
           cpu-migrations
           page-faults
           cycles
           instructions
           branches
           branch-misses

       0.001192373 seconds time elapsed
<<>>

Reported-by: Nageswara R Sastry
Signed-off-by: Athira Jajeev
Tested-by: Arnaldo Carvalho de Melo
Tested-by: Nageswara R Sastry
Cc: Jiri Olsa
Cc: Kajol Jain
Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220905141929.7171-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/affinity.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c
index 4d216c0dc425..4ee96b3c755b 100644
--- a/tools/perf/util/affinity.c
+++ b/tools/perf/util/affinity.c
@@ -49,8 +49,14 @@ void affinity__set(struct affinity *a, int cpu)
 {
 	int cpu_set_size = get_cpu_set_size();

-	if (cpu == -1)
+	/*
+	 * Return:
+	 * - if cpu is -1
+	 * - restrict out of bound access to sched_cpus
+	 */
+	if (cpu == -1 || ((cpu >= (cpu_set_size * 8))))
 		return;
+
 	a->changed = true;
 	set_bit(cpu, a->sched_cpus);
 	/*
--
cgit v1.2.3

From cbd7bfc7fd99acdde58ec2b0bce990158fba1654 Mon Sep 17 00:00:00 2001
From: Athira Rajeev
Date: Mon, 5 Sep 2022 19:49:29 +0530
Subject: tools/perf: Fix out of bound access to cpu mask array

The cpu mask init code in the "record__mmap_cpu_mask_init" function
accesses the "bits" array that is part of "struct mmap_cpu_mask". The
size of this array is the value from cpu__max_cpu().cpu. The array holds
the cpumask value for each cpu. While setting the bit for each cpu, it
calls the "set_bit" function, which accesses an index in the "bits"
array.

If we pass a value to -C that is greater than the number of CPUs present
in the system, set_bit could access an array member outside the bounds
of the array. This is because currently there is no boundary check for
the CPU. This results in a segfault:

<<>>
 ./perf record -C 12341234 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Segmentation fault (core dumped)
<<>>

Debugging with gdb points to the function flow below:

<<>>
set_bit
record__mmap_cpu_mask_init
record__init_thread_default_masks
record__init_thread_masks
cmd_record
<<>>

Fix this by adding a boundary check for the array.

After the patch:

<<>>
 ./perf record -C 12341234 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Failed to initialize parallel data streaming masks
<<>>

With this fix, if -C is given a non-existing CPU, perf record will fail
with:

<<>>
 ./perf record -C 50 ls
Failed to initialize parallel data streaming masks
<<>>

Reported-by: Nageswara R Sastry
Signed-off-by: Athira Jajeev
Tested-by: Arnaldo Carvalho de Melo
Tested-by: Nageswara R Sastry
Cc: Jiri Olsa
Cc: Kajol Jain
Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220905141929.7171-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-record.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4713f0f3a6cf..09b68d76bbdc 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3358,16 +3358,22 @@ static struct option __record_options[] = {

 struct option *record_options = __record_options;

-static void record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
+static int record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
 {
 	struct perf_cpu cpu;
 	int idx;

 	if (cpu_map__is_dummy(cpus))
-		return;
+		return 0;

-	perf_cpu_map__for_each_cpu(cpu, idx, cpus)
+	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
+		/* Return ENODEV is input cpu is greater than max cpu */
+		if ((unsigned long)cpu.cpu > mask->nbits)
+			return -ENODEV;
 		set_bit(cpu.cpu, mask->bits);
+	}
+
+	return 0;
 }

 static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
@@ -3379,7 +3385,9 @@ static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const cha
 		return -ENOMEM;

 	bitmap_zero(mask->bits, mask->nbits);
-	record__mmap_cpu_mask_init(mask, cpus);
+	if (record__mmap_cpu_mask_init(mask, cpus))
+		return -ENODEV;
+
 	perf_cpu_map__put(cpus);

 	return 0;
@@ -3461,7 +3469,12 @@ static int record__init_thread_masks_spec(struct record *rec, struct perf_cpu_ma
 		pr_err("Failed to allocate CPUs mask\n");
 		return ret;
 	}
-	record__mmap_cpu_mask_init(&cpus_mask, cpus);
+
+	ret = record__mmap_cpu_mask_init(&cpus_mask, cpus);
+	if (ret) {
+		pr_err("Failed to init cpu mask\n");
+		goto out_free_cpu_mask;
+	}

 	ret = record__thread_mask_alloc(&full_mask, cpu__max_cpu().cpu);
 	if (ret) {
@@ -3702,7 +3715,8 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
 	if (ret)
 		return ret;

-	record__mmap_cpu_mask_init(&rec->thread_masks->maps, cpus);
+	if (record__mmap_cpu_mask_init(&rec->thread_masks->maps, cpus))
+		return -ENODEV;

 	rec->nr_threads = 1;

--
cgit v1.2.3

From 6ea9da51a5c00c9d9309c4c9aa21cbe63c799c56 Mon Sep 17 00:00:00 2001
From: Zixuan Tan
Date: Fri, 26 Aug 2022 01:00:58 +0800
Subject: perf genelf: Switch deprecated openssl MD5_* functions to new EVP API

Switch to the flavored EVP API like in test-libcrypto.c, and remove the
bad gcc #pragma.

Inspired-by: 5b245985a6de5ac1 ("tools build: Switch to new openssl API for test-libcrypto")
Signed-off-by: Zixuan Tan
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/CABwm_eTnARC1GwMD-JF176k8WXU1Z0+H190mvXn61yr369qt6g@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/genelf.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/genelf.c b/tools/perf/util/genelf.c
index 953338b9e887..ed28a0dbcb7f 100644
--- a/tools/perf/util/genelf.c
+++ b/tools/perf/util/genelf.c
@@ -30,10 +30,6 @@

 #define BUILD_ID_URANDOM /* different uuid for each run */

-// FIXME, remove this and fix the deprecation warnings before its removed and
-// We'll break for good here...
-#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
-
 #ifdef HAVE_LIBCRYPTO_SUPPORT

 #define BUILD_ID_MD5
@@ -45,6 +41,7 @@
 #endif

 #ifdef BUILD_ID_MD5
+#include <openssl/evp.h>
 #include <openssl/md5.h>
 #endif
 #endif
@@ -142,15 +139,20 @@ gen_build_id(struct buildid_note *note,
 static void
 gen_build_id(struct buildid_note *note, unsigned long load_addr, const void *code, size_t csize)
 {
-	MD5_CTX context;
+	EVP_MD_CTX *mdctx;

 	if (sizeof(note->build_id) < 16)
 		errx(1, "build_id too small for MD5");

-	MD5_Init(&context);
-	MD5_Update(&context, &load_addr, sizeof(load_addr));
-	MD5_Update(&context, code, csize);
-	MD5_Final((unsigned char *)note->build_id, &context);
+	mdctx = EVP_MD_CTX_new();
+	if (!mdctx)
+		errx(2, "failed to create EVP_MD_CTX");
+
+	EVP_DigestInit_ex(mdctx, EVP_md5(), NULL);
+	EVP_DigestUpdate(mdctx, &load_addr, sizeof(load_addr));
+	EVP_DigestUpdate(mdctx, code, csize);
+	EVP_DigestFinal_ex(mdctx, (unsigned char *)note->build_id, NULL);
+	EVP_MD_CTX_free(mdctx);
 }

 #endif
--
cgit v1.2.3

From 4efa8e314351cb3e7f229ddd3571069edf38f99a Mon Sep 17 00:00:00 2001
From: Shang XiaoJing
Date: Tue, 6 Sep 2022 11:29:06 +0800
Subject: perf c2c: Prevent potential memory leak in c2c_he_zalloc()

Free allocated resources when zalloc() fails for members in c2c_he, to
prevent potential memory leak in c2c_he_zalloc().

Signed-off-by: Shang XiaoJing
Reviewed-by: Leo Yan
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20220906032906.21395-4-shangxiaojing@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-c2c.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 653e13b5037e..438fc222e213 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -146,15 +146,15 @@ static void *c2c_he_zalloc(size_t size)

 	c2c_he->cpuset = bitmap_zalloc(c2c.cpus_cnt);
 	if (!c2c_he->cpuset)
-		return NULL;
+		goto out_free;

 	c2c_he->nodeset = bitmap_zalloc(c2c.nodes_cnt);
 	if (!c2c_he->nodeset)
-		return NULL;
+		goto out_free;

 	c2c_he->node_stats = zalloc(c2c.nodes_cnt * sizeof(*c2c_he->node_stats));
 	if (!c2c_he->node_stats)
-		return NULL;
+		goto out_free;

 	init_stats(&c2c_he->cstats.lcl_hitm);
 	init_stats(&c2c_he->cstats.rmt_hitm);
@@ -163,6 +163,12 @@ static void *c2c_he_zalloc(size_t size)
 	init_stats(&c2c_he->cstats.load);

 	return &c2c_he->he;
+
+out_free:
+	free(c2c_he->nodeset);
+	free(c2c_he->cpuset);
+	free(c2c_he);
+	return NULL;
 }

 static void c2c_he_free(void *he)
--
cgit v1.2.3

From 7864d8f7c088aad988c44c631f1ceed9179cf2cf Mon Sep 17 00:00:00 2001
From: Adrian Hunter
Date: Mon, 5 Sep 2022 14:42:09 +0300
Subject: libperf evlist: Fix per-thread mmaps for multi-threaded targets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The offending commit removed mmap_per_thread(), which did not consider
the different set-output rules for per-thread mmaps i.e. in the
per-thread case set-output is used for file descriptors of the same
thread not the same cpu.

This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.

Reinstate mmap_per_thread() expanding it to cover also system-wide
per-cpu events i.e. to continue to allow the mixing of per-thread and
per-cpu mmaps.

Debug messages (with -vv) show the file descriptors that are opened with
sys_perf_event_open. New debug messages are added (needs -vvv) that show
also which file descriptors are mmapped and which are redirected with
set-output.

In the per-cpu case (cpu != -1) file descriptors for the same CPU are
set-output to the first file descriptor for that CPU.

In the per-thread case (cpu == -1) file descriptors for the same thread
are set-output to the first file descriptor for that thread.

Example (process 17489 has 2 threads):

 Before (but with new debug prints):

   $ perf record --no-bpf-event -vvv --per-thread -p 17489
   sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
   sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
   libperf: idx 0: mmapping fd 5
   libperf: idx 0: set output fd 6 -> 5
   failed to mmap with 22 (Invalid argument)

 After:

   $ perf record --no-bpf-event -vvv --per-thread -p 17489
   sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
   sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
   libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
   libperf: idx 0: mmapping fd 5
   libperf: idx 1: mmapping fd 6
   [ perf record: Woken up 2 times to write data ]
   [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]

Per-cpu example (process 20341 has 2 threads, same as above):

   $ perf record --no-bpf-event -vvv -p 20341
   sys_perf_event_open: pid 20341  cpu 0  group_fd -1  flags 0x8 = 5
   sys_perf_event_open: pid 20342  cpu 0  group_fd -1  flags 0x8 = 6
   sys_perf_event_open: pid 20341  cpu 1  group_fd -1  flags 0x8 = 7
   sys_perf_event_open: pid 20342  cpu 1  group_fd -1  flags 0x8 = 8
   sys_perf_event_open: pid 20341  cpu 2  group_fd -1  flags 0x8 = 9
   sys_perf_event_open: pid 20342  cpu 2  group_fd -1  flags 0x8 = 10
   sys_perf_event_open: pid 20341  cpu 3  group_fd -1  flags 0x8 = 11
   sys_perf_event_open: pid 20342  cpu 3  group_fd -1  flags 0x8 = 12
   sys_perf_event_open: pid 20341  cpu 4  group_fd -1  flags 0x8 = 13
   sys_perf_event_open: pid 20342  cpu 4  group_fd -1  flags 0x8 = 14
   sys_perf_event_open: pid 20341  cpu 5  group_fd -1  flags 0x8 = 15
   sys_perf_event_open: pid 20342  cpu 5  group_fd -1  flags 0x8 = 16
   sys_perf_event_open: pid 20341  cpu 6  group_fd -1  flags 0x8 = 17
   sys_perf_event_open: pid 20342  cpu 6  group_fd -1  flags 0x8 = 18
   sys_perf_event_open: pid 20341  cpu 7  group_fd -1  flags 0x8 = 19
   sys_perf_event_open: pid 20342  cpu 7  group_fd -1  flags 0x8 = 20
   libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
   libperf: idx 0: mmapping fd 5
   libperf: idx 0: set output fd 6 -> 5
   libperf: idx 1: mmapping fd 7
   libperf: idx 1: set output fd 8 -> 7
   libperf: idx 2: mmapping fd 9
   libperf: idx 2: set output fd 10 -> 9
   libperf: idx 3: mmapping fd 11
   libperf: idx 3: set output fd 12 -> 11
   libperf: idx 4: mmapping fd 13
   libperf: idx 4: set output fd 14 -> 13
   libperf: idx 5: mmapping fd 15
   libperf: idx 5: set output fd 16 -> 15
   libperf: idx 6: mmapping fd 17
   libperf: idx 6: set output fd 18 -> 17
   libperf: idx 7: mmapping fd 19
   libperf: idx 7: set output fd 20 -> 19
   [ perf record: Woken up 7 times to write data ]
   [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]

Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Reported-by: Tomáš Trnka
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
Signed-off-by: Adrian Hunter
Acked-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Ian Rogers
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/lib/perf/evlist.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index e6c98a6e3908..6b1bafe267a4 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -486,6 +486,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 			if (ops->idx)
 				ops->idx(evlist, evsel, mp, idx);

+			pr_debug("idx %d: mmapping fd %d\n", idx, *output);
 			if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
 				return -1;

@@ -494,6 +495,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 			if (!idx)
 				perf_evlist__set_mmap_first(evlist, map, overwrite);
 		} else {
+			pr_debug("idx %d: set output fd %d -> %d\n", idx, fd, *output);
 			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
 				return -1;

@@ -519,6 +521,48 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	return 0;
 }

+static int
+mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
+		struct perf_mmap_param *mp)
+{
+	int nr_threads = perf_thread_map__nr(evlist->threads);
+	int nr_cpus    = perf_cpu_map__nr(evlist->all_cpus);
+	int cpu, thread, idx = 0;
+	int nr_mmaps = 0;
+
+	pr_debug("%s: nr cpu values (may include -1) %d nr threads %d\n",
+		 __func__, nr_cpus, nr_threads);
+
+	/* per-thread mmaps */
+	for (thread = 0; thread < nr_threads; thread++, idx++) {
+		int output = -1;
+		int output_overwrite = -1;
+
+		if (mmap_per_evsel(evlist, ops, idx, mp, 0, thread, &output,
+				   &output_overwrite, &nr_mmaps))
+			goto out_unmap;
+	}
+
+	/* system-wide mmaps i.e. per-cpu */
+	for (cpu = 1; cpu < nr_cpus; cpu++, idx++) {
+		int output = -1;
+		int output_overwrite = -1;
+
+		if (mmap_per_evsel(evlist, ops, idx, mp, cpu, 0, &output,
+				   &output_overwrite, &nr_mmaps))
+			goto out_unmap;
+	}
+
+	if (nr_mmaps != evlist->nr_mmaps)
+		pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
+
+	return 0;
+
+out_unmap:
+	perf_evlist__munmap(evlist);
+	return -1;
+}
+
 static int
 mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	     struct perf_mmap_param *mp)
@@ -528,6 +572,8 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	int nr_mmaps = 0;
 	int cpu, thread;

+	pr_debug("%s: nr cpu values %d nr threads %d\n", __func__, nr_cpus, nr_threads);
+
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
 		int output = -1;
 		int output_overwrite = -1;
@@ -569,6 +615,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 			  struct perf_evlist_mmap_ops *ops,
 			  struct perf_mmap_param *mp)
 {
+	const struct perf_cpu_map *cpus = evlist->all_cpus;
 	struct perf_evsel *evsel;

 	if (!ops || !ops->get || !ops->mmap)
@@ -588,6 +635,9 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
 		return -ENOMEM;

+	if (perf_cpu_map__empty(cpus))
+		return mmap_per_thread(evlist, ops, mp);
+
 	return mmap_per_cpu(evlist, ops, mp);
 }

--
cgit v1.2.3

From 1706623e940347ad23fdf77910eca4905dc37f91 Mon Sep 17 00:00:00 2001
From: Adrian Hunter
Date: Mon, 5 Sep 2022 10:47:35 +0300
Subject: perf dlfilter dlfilter-show-cycles: Fix types for print format

Avoid compiler warning about format %llu that expects long long unsigned
int but argument has type __u64.

Reported-by: Arnaldo Carvalho de Melo
Fixes: c3afd6e50fce824f ("perf dlfilter: Add dlfilter-show-cycles")
Signed-off-by: Adrian Hunter
Cc: Adrian Hunter
Cc: Ian Rogers
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lore.kernel.org/r/20220905074735.4513-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/dlfilters/dlfilter-show-cycles.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/dlfilters/dlfilter-show-cycles.c b/tools/perf/dlfilters/dlfilter-show-cycles.c
index 9eccc97bff82..6d47298ebe9f 100644
--- a/tools/perf/dlfilters/dlfilter-show-cycles.c
+++ b/tools/perf/dlfilters/dlfilter-show-cycles.c
@@ -98,9 +98,9 @@ int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, vo
 static void print_vals(__u64 cycles, __u64 delta)
 {
 	if (delta)
-		printf("%10llu %10llu ", cycles, delta);
+		printf("%10llu %10llu ", (unsigned long long)cycles, (unsigned long long)delta);
 	else
-		printf("%10llu %10s ", cycles, "");
+		printf("%10llu %10s ", (unsigned long long)cycles, "");
 }

 int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
--
cgit v1.2.3

From 3705a6ef4068cbc9c2344561345d09630fd278ec Mon Sep 17 00:00:00 2001
From: Yang Jihong
Date: Thu, 8 Sep 2022 09:48:54 +0800
Subject: perf lock: Remove redundant word 'contention' in help message

Before:

  # perf lock -h

   Usage: perf lock [] {record|report|script|info|contention|contention}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -i, --input           input file name
      -v, --verbose         be more verbose (show symbol address, etc)
          --kallsyms        kallsyms pathname
          --vmlinux         vmlinux pathname

After:

  # perf lock -h

   Usage: perf lock [] {record|report|script|info|contention}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -i, --input           input file name
      -v, --verbose         be more verbose (show symbol address, etc)
          --kallsyms        kallsyms pathname
          --vmlinux         vmlinux pathname

Fixes: 528b9cab3b813a3b ("perf lock: Add 'contention' subcommand")
Signed-off-by: Yang Jihong
Acked-by: Namhyung Kim
Cc: Alexander Shishkin
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: https://lore.kernel.org/r/20220908014854.151203-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-lock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index dd11d3471baf..ea40ae52cd2c 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -1874,8 +1874,7 @@ int cmd_lock(int argc, const char **argv)
 		NULL
 	};
 	const char *const lock_subcommands[] = { "record", "report", "script",
-						 "info", "contention",
-						 "contention", NULL };
+						 "info", "contention", NULL };
 	const char *lock_usage[] = {
 		NULL,
 		NULL
--
cgit v1.2.3

From 82b2425fad2dd47204b3da589b679220f8aacc0e Mon Sep 17 00:00:00 2001
From: Zhengjun Xing
Date: Thu, 8 Sep 2022 15:00:30 +0800
Subject: perf script: Fix Cannot print 'iregs' field for hybrid systems

Commit b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid
systems to collect metadata records") adds a dummy event on hybrid
systems to fix the symbol "unknown" issue when the workload is created
in a P-core but runs on an E-core. The added dummy event causes "perf
script -F iregs" to fail. Dummy events do not have the "iregs" attribute
set, so the "iregs" attribute check in evsel__check_attr() fails.

The following commit [1] fixed a similar issue by skipping the attr
check for the dummy event because it does not have any samples anyway.
It works for the normal mode, but the issue still happens when running
the test in pipe mode. In pipe mode, perf script calls process_attr(),
which still checks the attr for the dummy event.
This commit fixed the issue by skipping the attr check for the dummy event in the API evsel__check_attr, Otherwise, we have to patch everywhere when evsel__check_attr() is called. Before: #./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5 Samples for 'dummy:HG' event do not have IREGS attribute set. Cannot print 'iregs' field. 0x120 [0x90]: failed to process type: 64 # After: # ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5 ABI:2 CX:0x55b8efa87000 DX:0x55b8efa7e000 DI:0xffffba5e625efbb0 R8:0xffff90e51f8ae100 ABI:2 CX:0x7f1dae1e4000 DX:0xd0 DI:0xffff90e18c675ac0 R8:0x71 ABI:2 CX:0xcc0 DX:0x1 DI:0xffff90e199880240 R8:0x0 ABI:2 CX:0xffff90e180dd7500 DX:0xffff90e180dd7500 DI:0xffff90e180043500 R8:0x1 ABI:2 CX:0x50 DX:0xffff90e18c583bd0 DI:0xffff90e1998803c0 R8:0x58 # [1]https://lore.kernel.org/lkml/20220831124041.219925-1-jolsa@kernel.org/ Fixes: b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid systems to collect metadata records") Suggested-by: Namhyung Kim Signed-off-by: Xing Zhengjun Acked-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Ian Rogers Cc: Ingo Molnar Cc: Kan Liang Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20220908070030.3455164-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-script.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 304d234d8e84..029b4330e59b 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -445,6 +445,9 @@ static int evsel__check_attr(struct evsel *evsel, struct perf_session *session) struct perf_event_attr *attr = &evsel->core.attr; bool allow_user_set; + if (evsel__is_dummy_event(evsel)) + return 0; + if (perf_header__has_feat(&session->header, HEADER_STAT)) return 0; -- cgit 
v1.2.3 From 0a9eaf616f29ca32068d2d8fe04eeef67505720d Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 8 Sep 2022 08:04:26 +0200 Subject: perf tools: Don't install data files with x permissions install(1), by default, installs with rwxr-xr-x permissions. Modify perf's Makefile to pass '-m 644' when installing: * Documentation/tips.txt * examples/bpf/* * perf-completion.sh * perf_dlfilter.h header * scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* * scripts/perl/*.pl * tests/attr/* * tests/attr.py * tests/shell/lib/*.sh * trace/strace/groups/* All those are supposed to be non-executable. Either they are not scripts at all, or they don't have shebang. Signed-off-by: Cc: Adrian Hunter Cc: Andi Kleen Cc: Ingo Molnar Cc: Jiri Olsa Cc: Kan Liang Cc: Leo Yan Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20220908060426.9619-1-jslaby@suse.cz Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Makefile.perf | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index e5921b347153..bd947885a639 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -954,11 +954,11 @@ ifndef NO_LIBBPF $(call QUIET_INSTALL, bpf-headers) \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'; \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf/linux'; \ - $(INSTALL) include/bpf/*.h -t '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'; \ - $(INSTALL) include/bpf/linux/*.h -t '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf/linux' + $(INSTALL) include/bpf/*.h -m 644 -t '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'; \ + $(INSTALL) include/bpf/linux/*.h -m 644 -t '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf/linux' $(call QUIET_INSTALL, bpf-examples) \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf'; \ - $(INSTALL) examples/bpf/*.c -t '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf' + $(INSTALL) examples/bpf/*.c -m 
644 -t '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf'
 endif
 	$(call QUIET_INSTALL, perf-archive) \
 		$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
@@ -967,13 +967,13 @@ endif
 ifndef NO_LIBAUDIT
 	$(call QUIET_INSTALL, strace/groups) \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'; \
-		$(INSTALL) trace/strace/groups/* -t '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'
+		$(INSTALL) trace/strace/groups/* -m 644 -t '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'
 endif
 ifndef NO_LIBPERL
 	$(call QUIET_INSTALL, perl-scripts) \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \
-		$(INSTALL) scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \
-		$(INSTALL) scripts/perl/*.pl -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl'; \
+		$(INSTALL) scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -m 644 -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \
+		$(INSTALL) scripts/perl/*.pl -m 644 -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl'; \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin'; \
 		$(INSTALL) scripts/perl/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin'
 endif
@@ -990,23 +990,23 @@ endif
 		$(INSTALL) $(DLFILTERS) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/dlfilters';
 	$(call QUIET_INSTALL, perf_completion-script) \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'; \
-		$(INSTALL) perf-completion.sh '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
+		$(INSTALL) perf-completion.sh -m 644 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
 	$(call QUIET_INSTALL, perf-tip) \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(tip_instdir_SQ)'; \
-		$(INSTALL) Documentation/tips.txt -t '$(DESTDIR_SQ)$(tip_instdir_SQ)'
+		$(INSTALL) Documentation/tips.txt -m 644 -t '$(DESTDIR_SQ)$(tip_instdir_SQ)'
 
 install-tests: all install-gtk
 	$(call QUIET_INSTALL, tests) \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'; \
-		$(INSTALL) tests/attr.py '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'; \
+		$(INSTALL) tests/attr.py -m 644 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'; \
 		$(INSTALL) tests/pe-file.exe* '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'; \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr'; \
-		$(INSTALL) tests/attr/* '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr'; \
+		$(INSTALL) tests/attr/* -m 644 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr'; \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell'; \
 		$(INSTALL) tests/shell/*.sh '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell'; \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/lib'; \
-		$(INSTALL) tests/shell/lib/*.sh '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/lib'; \
-		$(INSTALL) tests/shell/lib/*.py '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/lib'
+		$(INSTALL) tests/shell/lib/*.sh -m 644 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/lib'; \
+		$(INSTALL) tests/shell/lib/*.py -m 644 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/lib'
 
 install-bin: install-tools install-tests install-traceevent-plugins
--
cgit v1.2.3

From faf59ec8c3c3708c64ff76b50e6f757c6b4a1054 Mon Sep 17 00:00:00 2001
From: Adrian Hunter
Date: Wed, 7 Sep 2022 19:24:58 +0300
Subject: perf record: Fix synthesis failure warnings

Some calls to synthesis functions set err < 0 but only warn about the
failure and continue. However they do not set err back to zero, relying
on subsequent code to do that.

That changed with the introduction of option --synth. When --synth=no
subsequent functions that set err back to zero are not called.

Fix by setting err = 0 in those cases.

Example:

 Before:

    $ perf record --no-bpf-event --synth=all -o /tmp/huh uname
    Couldn't synthesize bpf events.
    Linux
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ]
    $ perf record --no-bpf-event --synth=no -o /tmp/huh uname
    Couldn't synthesize bpf events.

 After:

    $ perf record --no-bpf-event --synth=no -o /tmp/huh uname
    Couldn't synthesize bpf events.
    Linux
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ]

Fixes: 41b740b6e8a994e5 ("perf record: Add --synth option")
Signed-off-by: Adrian Hunter
Acked-by: Namhyung Kim
Cc: Ian Rogers
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lore.kernel.org/r/20220907162458.72817-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-record.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 09b68d76bbdc..f87ef43eb820 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1906,14 +1906,18 @@ static int record__synthesize(struct record *rec, bool tail)
 
 	err = perf_event__synthesize_bpf_events(session, process_synthesized_event,
 						machine, opts);
-	if (err < 0)
+	if (err < 0) {
 		pr_warning("Couldn't synthesize bpf events.\n");
+		err = 0;
+	}
 
 	if (rec->opts.synth & PERF_SYNTH_CGROUP) {
 		err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
 						     machine);
-		if (err < 0)
+		if (err < 0) {
 			pr_warning("Couldn't synthesize cgroup events.\n");
+			err = 0;
+		}
 	}
 
 	if (rec->opts.nr_threads_synthesize > 1) {
--
cgit v1.2.3