Merge tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo: - Introduce 'perf lock contention' subtool, using new lock contention tracepoints and using BPF for in kernel aggregation and then userspace processing using the perf tooling infrastructure for resolving symbols, target specification, etc. Since the new lock contention tracepoints don't provide lock names, get up to 8 stack traces and display the first non-lock function symbol name as a caller: $ perf lock report -F acquired,contended,avg_wait,wait_total Name acquired contended avg wait total wait update_blocked_a... 40 40 3.61 us 144.45 us kernfs_fop_open+... 5 5 3.64 us 18.18 us _nohz_idle_balance 3 3 2.65 us 7.95 us tick_do_update_j... 1 1 6.04 us 6.04 us ep_scan_ready_list 1 1 3.93 us 3.93 us Supports the usual 'perf record' + 'perf report' workflow as well as a BCC/bpftrace like mode where you start the tool and then press control+C to get results: $ sudo perf lock contention -b ^C contended total wait max wait avg wait type caller 42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20 23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a 6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30 3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c 1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115 1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148 2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b 1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06 2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f 1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c ... - Add new 'perf kwork' tool to trace time properties of kernel work (such as softirq, and workqueue), uses eBPF skeletons to collect info in kernel space, aggregating data that then gets processed by the userspace tool, e.g.: # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | ---------------------------------------------------------------------------------------------------- nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s | amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s | nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s | nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s | nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s | nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s | nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s | nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s | nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s | nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s | nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s | enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s | enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s | -------------------------------------------------------------------------------------------------------- See commit log messages for more examples with extra options to limit the events time window, etc. - Add support for new AMD IBS (Instruction Based Sampling) features: With the DataSrc extensions, the source of data can be decoded among: - Local L3 or other L1/L2 in CCX. - A peer cache in a near CCX. - Data returned from DRAM. - A peer cache in a far CCX. - DRAM address map with "long latency" bit set. - Data returned from MMIO/Config/PCI/APIC. - Extension Memory (S-Link, GenZ, etc - identified by the CS target and/or address map at DF's choice). - Peer Agent Memory. - Support hardware tracing with Intel PT on guest machines, combining the traces with the ones in the host machine. - Add a "-m" option to 'perf buildid-list' to show kernel and modules build-ids, to display all of the information needed to do external symbolization of kernel stack traces, such as those collected by bpf_get_stackid(). - Add arch TSC frequency information to perf.data file headers. - Handle changes in the binutils disassembler function signatures in perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer). - Fix building the perf perl binding with the newest gcc in distros such as fedora rawhide, where some new warnings were breaking the build as perf uses -Werror. - Add 'perf test' entry for branch stack sampling. - Add ARM SPE system wide 'perf test' entry. - Add user space counter reading tests to 'perf test'. - Build with python3 by default, if available. - Add python converter script for the vendor JSON event files. - Update vendor JSON files for most Intel cores. - Add vendor JSON File for Intel meteorlake. - Add Arm Cortex-A78C and X1C JSON vendor event files. - Add workaround to symbol address reading from ELF files without phdr, falling back to the previoous equation. - Convert legacy map definition to BTF-defined in the perf BPF script test. - Rework prologue generation code to stop using libbpf deprecated APIs. - Add default hybrid events for 'perf stat' on x86. - Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores). - Prefer sampled CPU when exporting JSON in 'perf data convert' - Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390. * tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits) perf stat: Refactor __run_perf_stat() common code perf lock: Print the number of lost entries for BPF perf lock: Add --map-nr-entries option perf lock: Introduce struct lock_contention perf scripting python: Do not build fail on deprecation warnings genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test perf parse-events: Break out tracepoint and printing perf parse-events: Don't #define YY_EXTRA_TYPE tools bpftool: Don't display disassembler-four-args feature test tools bpftool: Fix compilation error with new binutils tools bpf_jit_disasm: Don't display disassembler-four-args feature test tools bpf_jit_disasm: Fix compilation error with new binutils tools perf: Fix compilation error with new binutils tools include: add dis-asm-compat.h to handle version differences tools build: Don't display disassembler-four-args feature test tools build: Add feature test for init_disassemble_info API changes perf test: Add ARM SPE system wide test perf tools: Rework prologue generation code perf bpf: Convert legacy map definition to BTF-defined ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2022-08-06 09:36:08 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2022-08-06 09:36:08 -0700
commit: 48a577dc1b09c1d35f2b8b37e7fa9a7169d50f5d (patch)
tree: bdb0ea12938bce19b48fe304c37be24b55e67ebe /tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
parent: 033a94412b6065a21c2ede2f37867e747a84563f (diff)
parent: bb8bc52e75785af94b9ba079277547d50d018a52 (diff)
download: linux-48a577dc1b09c1d35f2b8b37e7fa9a7169d50f5d.tar.bz2
1 files changed, 94 insertions, 5 deletions
diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
index b0920f5b25ed..df4f3d714e6e 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
@@ -22,6 +22,7 @@
         "EventName": "ARITH.DIVIDER_ACTIVE",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x9"
     },
     {
@@ -34,6 +35,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles when divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations.",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x9"
     },
     {
@@ -45,6 +47,7 @@
         "EventName": "ARITH.FP_DIVIDER_ACTIVE",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -54,8 +57,8 @@
         "EventCode": "0xb0",
         "EventName": "ARITH.IDIV_ACTIVE",
         "PEBScounters": "0,1,2,3,4,5,6,7",
-        "PublicDescription": "ARITH.IDIV_ACTIVE",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -67,6 +70,7 @@
         "EventName": "ARITH.INT_DIVIDER_ACTIVE",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -78,6 +82,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of occurrences where a microcode assist is invoked by hardware Examples include AD (page Access Dirty), FP and AVX related assists.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x1f"
     },
     {
@@ -223,7 +228,7 @@
         "UMask": "0x10"
     },
     {
-        "BriefDescription": "number of branch instructions retired that were mispredicted and taken. Non PEBS",
+        "BriefDescription": "number of branch instructions retired that were mispredicted and taken.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3,4,5,6,7",
         "EventCode": "0xc5",
@@ -291,6 +296,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts core clocks when the thread is in the C0.1 light-weight slower wakeup time but more power saving optimized state.  This state can be entered via the TPAUSE or UMWAIT instructions.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -302,6 +308,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts core clocks when the thread is in the C0.2 light-weight faster wakeup time but less power saving optimized state.  This state can be entered via the TPAUSE or UMWAIT instructions.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x20"
     },
     {
@@ -313,6 +320,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts core clocks when the thread is in the C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions) or running the PAUSE instruction.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x70"
     },
     {
@@ -324,6 +332,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "This event distributes cycle counts between active hyperthreads, i.e., those in C0.  A hyperthread becomes inactive when it executes the HLT or MWAIT instructions.  If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -335,6 +344,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts Core crystal clock cycles when current thread is unhalted and the other thread is halted.",
         "SampleAfterValue": "25003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -345,6 +355,7 @@
         "EventName": "CPU_CLK_UNHALTED.PAUSE",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x40"
     },
     {
@@ -356,6 +367,7 @@
         "EventName": "CPU_CLK_UNHALTED.PAUSE_INST",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x40"
     },
     {
@@ -366,6 +378,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "This event distributes Core crystal clock cycle counts between active hyperthreads, i.e., those in C0 sleep-state. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If one thread is active in a core, all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -376,9 +389,22 @@
         "PEBScounters": "34",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x3"
     },
     {
+        "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "CollectPEBSRecord": "2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "EventCode": "0x3c",
+        "EventName": "CPU_CLK_UNHALTED.REF_TSC_P",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
+        "SampleAfterValue": "2000003",
+        "Speculative": "1",
+        "UMask": "0x1"
+    },
+    {
         "BriefDescription": "Core cycles when the thread is not in halt state",
         "CollectPEBSRecord": "2",
         "Counter": "Fixed counter 1",
@@ -386,6 +412,7 @@
         "PEBScounters": "33",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -396,7 +423,8 @@
         "EventName": "CPU_CLK_UNHALTED.THREAD_P",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "This is an architectural event that counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. For this reason, this event may have a changing ratio with regards to wall clock time.",
-        "SampleAfterValue": "2000003"
+        "SampleAfterValue": "2000003",
+        "Speculative": "1"
     },
     {
         "BriefDescription": "Cycles while L1 cache miss demand load is outstanding.",
@@ -407,6 +435,7 @@
         "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS",
         "PEBScounters": "0,1,2,3",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -418,6 +447,7 @@
         "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS",
         "PEBScounters": "0,1,2,3",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -429,6 +459,7 @@
         "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -440,6 +471,7 @@
         "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS",
         "PEBScounters": "0,1,2,3",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0xc"
     },
     {
@@ -451,6 +483,7 @@
         "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS",
         "PEBScounters": "0,1,2,3",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x5"
     },
     {
@@ -462,6 +495,7 @@
         "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -473,6 +507,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles during which a total of 1 uop was executed on all ports and Reservation Station (RS) was not empty.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -484,6 +519,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles during which a total of 2 uops were executed on all ports and Reservation Station (RS) was not empty.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -495,6 +531,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles total of 3 uops are executed on all ports and Reservation Station (RS) was not empty.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -506,6 +543,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles total of 4 uops are executed on all ports and Reservation Station (RS) was not empty.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -517,6 +555,7 @@
         "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x21"
     },
     {
@@ -529,6 +568,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles where the Store Buffer was full and no loads caused an execution stall.",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x40"
     },
     {
@@ -540,6 +580,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of cycles total of 0 uops executed on all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) was not full and there was no outstanding load.",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x80"
     },
     {
@@ -551,6 +592,7 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Number of decoders utilized in a cycle when the MITE (legacy decode pipeline) fetches instructions.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -586,12 +628,13 @@
         "UMask": "0x10"
     },
     {
-        "BriefDescription": "Number of all retired NOP instructions.",
+        "BriefDescription": "Retired NOP instructions.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3,4,5,6,7",
         "EventCode": "0xc0",
         "EventName": "INST_RETIRED.NOP",
         "PEBScounters": "1,2,3,4,5,6,7",
+        "PublicDescription": "Counts all retired NOP or ENDBR32/64 instructions",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
     },
@@ -625,6 +668,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.",
         "SampleAfterValue": "500009",
+        "Speculative": "1",
         "UMask": "0x80"
     },
     {
@@ -634,6 +678,7 @@
         "EventName": "INT_MISC.MBA_STALLS",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x20"
     },
     {
@@ -645,6 +690,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts core cycles when the Resource allocator was stalled due to recovery from an earlier branch misprediction or machine clear event.",
         "SampleAfterValue": "500009",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -657,6 +703,7 @@
         "MSRValue": "0x7",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "TakenAlone": "1",
         "UMask": "0x40"
     },
@@ -669,6 +716,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Estimated number of Top-down Microarchitecture Analysis slots that got dropped due to non front-end reasons",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -762,6 +810,7 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts the number of times a load got blocked due to false dependencies in MOB due to partial compare on address.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -773,6 +822,7 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x88"
     },
     {
@@ -784,6 +834,7 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts the number of times where store forwarding was prevented for a load operation. The most common case is a load blocked due to the address of memory access (partially) overlapping with a preceding uncompleted store. Note: See the table of not supported store forwards in the Optimization Guide.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x82"
     },
     {
@@ -795,6 +846,7 @@
         "PEBScounters": "0,1,2,3",
         "PublicDescription": "Counts all not software-prefetch load dispatches that hit the fill buffer (FB) allocated for the software prefetch. It can also be incremented by some lock instructions. So it should only be used with profiling so that the locks can be excluded by ASM (Assembly File) inspection of the nearby instructions.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -807,6 +859,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the cycles when at least one uop is delivered by the LSD (Loop-stream detector).",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -819,6 +872,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the cycles when optimal number of uops is delivered by the LSD (Loop-stream detector).",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -830,6 +884,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of uops delivered to the back-end by the LSD(Loop Stream Detector).",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -843,6 +898,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of machine clears (nukes) of any type.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -854,6 +910,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts self-modifying code (SMC) detected, which causes a machine clear.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -864,6 +921,7 @@
         "EventName": "MISC2_RETIRED.LFENCE",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "400009",
+        "Speculative": "1",
         "UMask": "0x20"
     },
     {
@@ -886,6 +944,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts allocation stall cycles caused by the store buffer (SB) being full. This counts cycles that the pipeline back-end blocked uop delivery from the front-end.",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -896,6 +955,7 @@
         "EventName": "RESOURCE_STALLS.SCOREBOARD",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "100003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -907,6 +967,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of slots in TMA method where no micro-operations were being issued from front-end to back-end of the machine due to lack of back-end resources.",
         "SampleAfterValue": "10000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -916,6 +977,7 @@
         "EventName": "TOPDOWN.BAD_SPEC_SLOTS",
         "PublicDescription": "Number of slots of TMA method that were wasted due to incorrect speculation. It covers all types of control-flow or data-related mis-speculations.",
         "SampleAfterValue": "10000003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -925,6 +987,7 @@
         "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS",
         "PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by (any type of) branch mispredictions. This event estimates number of specualtive operations that were issued but not retired as well as the out-of-order engine recovery past a branch misprediction.",
         "SampleAfterValue": "10000003",
+        "Speculative": "1",
         "UMask": "0x8"
     },
     {
@@ -935,6 +998,7 @@
         "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "10000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -945,6 +1009,7 @@
         "PEBScounters": "35",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -956,6 +1021,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.",
         "SampleAfterValue": "10000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -966,6 +1032,7 @@
         "EventName": "UOPS_DECODED.DEC0_UOPS",
         "PEBScounters": "0,1,2,3",
         "SampleAfterValue": "1000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -977,6 +1044,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution  port 0.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -988,6 +1056,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution  port 1.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -999,6 +1068,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution ports 2, 3 and 10",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x4"
     },
     {
@@ -1010,6 +1080,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution ports 4 and 9",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -1021,6 +1092,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution ports 5 and 11",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x20"
     },
     {
@@ -1032,6 +1104,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution  port 6.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x40"
     },
     {
@@ -1043,6 +1116,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Number of uops dispatch to execution  ports 7 and 8.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x80"
     },
     {
@@ -1054,6 +1128,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of uops executed from any thread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -1066,6 +1141,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles when at least 1 micro-op is executed from any thread on physical core.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -1078,6 +1154,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles when at least 2 micro-ops are executed from any thread on physical core.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -1090,6 +1167,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles when at least 3 micro-ops are executed from any thread on physical core.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -1102,6 +1180,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles when at least 4 micro-ops are executed from any thread on physical core.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x2"
     },
     {
@@ -1114,6 +1193,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles where at least 1 uop was executed per-thread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1126,6 +1206,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles where at least 2 uops were executed per-thread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1138,6 +1219,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles where at least 3 uops were executed per-thread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1150,6 +1232,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Cycles where at least 4 uops were executed per-thread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1163,6 +1246,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts cycles during which no uops were dispatched from the Reservation Station (RS) per thread.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1175,6 +1259,7 @@
         "Invert": "1",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1185,6 +1270,7 @@
         "EventName": "UOPS_EXECUTED.THREAD",
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1196,6 +1282,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of x87 uops executed.",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x10"
     },
     {
@@ -1207,6 +1294,7 @@
         "PEBScounters": "0,1,2,3,4,5,6,7",
         "PublicDescription": "Counts the number of uops that the Resource Allocation Table (RAT) issues to the Reservation Station (RS).",
         "SampleAfterValue": "2000003",
+        "Speculative": "1",
         "UMask": "0x1"
     },
     {
@@ -1222,12 +1310,13 @@
         "UMask": "0x2"
     },
     {
-        "BriefDescription": "UOPS_RETIRED.HEAVY",
+        "BriefDescription": "Retired uops except the last uop of each instruction.",
         "CollectPEBSRecord": "2",
         "Counter": "0,1,2,3,4,5,6,7",
         "EventCode": "0xc2",
         "EventName": "UOPS_RETIRED.HEAVY",
         "PEBScounters": "0,1,2,3,4,5,6,7",
+        "PublicDescription": "Counts the number of retired micro-operations (uops) except the last uop of each instruction. An instruction that is decoded into less than two uops does not contribute to the count.",
         "SampleAfterValue": "2000003",
         "UMask": "0x1"
     },
author	Linus Torvalds <torvalds@linux-foundation.org>	2022-08-06 09:36:08 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2022-08-06 09:36:08 -0700
commit	48a577dc1b09c1d35f2b8b37e7fa9a7169d50f5d (patch)
tree	bdb0ea12938bce19b48fe304c37be24b55e67ebe /tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json
parent	033a94412b6065a21c2ede2f37867e747a84563f (diff)
parent	bb8bc52e75785af94b9ba079277547d50d018a52 (diff)
download	linux-48a577dc1b09c1d35f2b8b37e7fa9a7169d50f5d.tar.bz2