summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/perf/core-book3s.c
AgeCommit message (Collapse)AuthorFilesLines
2022-10-10Merge tag 'perf-core-2022-10-07' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "PMU driver updates: - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature support for Zen 4 processors. - Extend the perf ABI to provide branch speculation information, if available, and use this on CPUs that have it (eg. LbrExtV2). - Improve Intel PEBS TSC timestamp handling & integration. - Add Intel Raptor Lake S CPU support. - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs by utilizing IBS tagged load/store samples. - Clean up & optimize various x86 PMU details. HW breakpoints: - Big rework to optimize the code for systems with hundreds of CPUs and thousands of breakpoints: - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem per-CPU rwsem that is read-locked during most of the key operations. - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and fetch_bp_busy_slots(). - Apply micro-optimizations & cleanups. - Misc cleanups & enhancements" * tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex perf: Fix pmu_filter_match() perf: Fix lockdep_assert_event_ctx() perf/x86/amd/lbr: Adjust LBR regardless of filtering perf/x86/utils: Fix uninitialized var in get_branch_type() perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR perf/x86/amd: Support PERF_SAMPLE_ADDR perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} perf/x86/amd: Support PERF_SAMPLE_DATA_SRC perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} perf/x86/uncore: Add new Raptor Lake S support perf/x86/cstate: Add new Raptor Lake S support perf/x86/msr: Add new Raptor Lake S support perf/x86: Add new Raptor Lake S support bpf: Check flags for branch stack in bpf_read_branch_records helper perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails perf: Use sample_flags for raw_data perf: Use sample_flags for addr ...
2022-09-28powerpc/perf: Fix branch_filter support for multiple filtersAthira Rajeev1-0/+17
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type ie branch filters are supported. The branch filters are requested via event attribute "branch_sample_type". Multiple branch filters can be passed in event attribute. eg: $ perf record -b -o- -B --branch-filter any,ind_call true None of the Power PMUs support having multiple branch filters at the same time. Branch filters for branch stack sampling is set via MMCRA IFM bits [32:33]. But currently when requesting for multiple filter types, the "perf record" command does not report any error. eg: $ perf record -b -o- -B --branch-filter any,save_type true $ perf record -b -o- -B --branch-filter any,ind_call true The "bhrb_filter_map" function in PMU driver code does the validity check for supported branch filters. But this check is done for single filter. Hence "perf record" will proceed here without reporting any error. Fix power_pmu_event_init() to return EOPNOTSUPP when multiple branch filters are requested in the event attr. After the fix: $ perf record --branch-filter any,ind_call -- ls Error: cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel<disgoel@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Tweak comment and change log wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921145255.20972-1-atrajeev@linux.vnet.ibm.com
2022-09-06perf: Use sample_flags for data_srcKan Liang1-1/+3
Use the new sample_flags to indicate whether the data_src field is filled by the PMU driver. Remove the data_src field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-6-kan.liang@linux.intel.com
2022-09-06perf: Use sample_flags for weightKan Liang1-2/+3
Use the new sample_flags to indicate whether the weight field is filled by the PMU driver. Remove the weight field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-5-kan.liang@linux.intel.com
2022-09-06perf: Use sample_flags for branch stackKan Liang1-0/+1
Use the new sample_flags to indicate whether the branch stack is filled by the PMU driver. Remove the br_stack from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com
2022-07-18powerpc/perf: Add support for caps under sysfs in powerpcAthira Rajeev1-0/+31
Add caps support under "/sys/bus/event_source/devices/<pmu>/" for powerpc. This directory can be used to expose some of the specific features that powerpc PMU supports to the user. Example: pmu_name. The name of PMU registered will depend on platform, say power9 or power10 or it could be Generic Compat PMU. Currently the only way to know which is the registered PMU is from the dmesg logs. But clearing the dmesg will make it difficult to know exact PMU backend used. And even extracting from dmesg will be complicated, as we need to parse the dmesg logs and add filters for pmu name. Whereas by exposing it via caps will make it easy as we just need to directly read it from the sysfs. Add a caps directory to /sys/bus/event_source/devices/cpu/ for power8, power9, power10 and generic compat PMU in respective PMU driver code. Update the pmu_name file under caps folder in core-book3s using "attr_update". The information exposed currently: - pmu_name : Underlying PMU name from the driver Example result with power9 pmu: # ls /sys/bus/event_source/devices/cpu/caps pmu_name # cat /sys/bus/event_source/devices/cpu/caps/pmu_name POWER9 Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220520084630.15181-1-atrajeev@linux.vnet.ibm.com
2022-06-28powerpc/perf: Optimize clearing the pending PMI and remove WARN_ON for PMI ↵Athira Rajeev1-20/+15
check in power_pmu_disable commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") added a new function "pmi_irq_pending" in hw_irq.h. This function is to check if there is a PMI marked as pending in Paca (PACA_IRQ_PMI).This is used in power_pmu_disable in a WARN_ON. The intention here is to provide a warning if there is PMI pending, but no counter is found overflown. During some of the perf runs, below warning is hit: WARNING: CPU: 36 PID: 0 at arch/powerpc/perf/core-book3s.c:1332 power_pmu_disable+0x25c/0x2c0 Modules linked in: ----- NIP [c000000000141c3c] power_pmu_disable+0x25c/0x2c0 LR [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0 Call Trace: [c000000baffcfb90] [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0 (unreliable) [c000000baffcfc10] [c0000000003e2f8c] perf_pmu_disable+0x4c/0x60 [c000000baffcfc30] [c0000000003e3344] group_sched_out.part.124+0x44/0x100 [c000000baffcfc80] [c0000000003e353c] __perf_event_disable+0x13c/0x240 [c000000baffcfcd0] [c0000000003dd334] event_function+0xc4/0x140 [c000000baffcfd20] [c0000000003d855c] remote_function+0x7c/0xa0 [c000000baffcfd50] [c00000000026c394] flush_smp_call_function_queue+0xd4/0x300 [c000000baffcfde0] [c000000000065b24] smp_ipi_demux_relaxed+0xa4/0x100 [c000000baffcfe20] [c0000000000cb2b0] xive_muxed_ipi_action+0x20/0x40 [c000000baffcfe40] [c000000000207c3c] __handle_irq_event_percpu+0x8c/0x250 [c000000baffcfee0] [c000000000207e2c] handle_irq_event_percpu+0x2c/0xa0 [c000000baffcff10] [c000000000210a04] handle_percpu_irq+0x84/0xc0 [c000000baffcff40] [c000000000205f14] generic_handle_irq+0x54/0x80 [c000000baffcff60] [c000000000015740] __do_irq+0x90/0x1d0 [c000000baffcff90] [c000000000016990] __do_IRQ+0xc0/0x140 [c0000009732f3940] [c000000bafceaca8] 0xc000000bafceaca8 [c0000009732f39d0] [c000000000016b78] do_IRQ+0x168/0x1c0 [c0000009732f3a00] [c0000000000090c8] hardware_interrupt_common_virt+0x218/0x220 This means that there is no PMC overflown among the active events in the PMU, but there is a PMU pending in Paca. The function "any_pmc_overflown" checks the PMCs on active events in cpuhw->n_events. Code snippet: <<>> if (any_pmc_overflown(cpuhw)) clear_pmi_irq_pending(); else WARN_ON(pmi_irq_pending()); <<>> Here the PMC overflown is not from active event. Example: When we do perf record, default cycles and instructions will be running on PMC6 and PMC5 respectively. It could happen that overflowed event is currently not active and pending PMI is for the inactive event. Debug logs from trace_printk: <<>> any_pmc_overflown: idx is 5: pmc value is 0xd9a power_pmu_disable: PMC1: 0x0, PMC2: 0x0, PMC3: 0x0, PMC4: 0x0, PMC5: 0xd9a, PMC6: 0x80002011 <<>> Here active PMC (from idx) is PMC5 , but overflown PMC is PMC6(0x80002011). When we handle PMI interrupt for such cases, if the PMC overflown is from inactive event, it will be ignored. Reference commit: commit bc09c219b2e6 ("powerpc/perf: Fix finding overflowed PMC in interrupt") Patch addresses two changes: 1) Fix 1 : Removal of warning ( WARN_ON(pmi_irq_pending()); ) We were printing warning if no PMC is found overflown among active PMU events, but PMI pending in PACA. But this could happen in cases where PMC overflown is not in active PMC. An inactive event could have caused the overflow. Hence the warning is not needed. To know pending PMI is from an inactive event, we need to loop through all PMC's which will cause more SPR reads via mfspr and increase in context switch. Also in existing function: perf_event_interrupt, already we ignore PMI's overflown when it is from an inactive PMC. 2) Fix 2: optimization in clearing pending PMI. Currently we check for any active PMC overflown before clearing PMI pending in Paca. This is causing additional SPR read also. From point 1, we know that if PMI pending in Paca from inactive cases, that is going to be ignored during replay. Hence if there is pending PMI in Paca, just clear it irrespective of PMC overflown or not. In summary, remove the any_pmc_overflown check entirely in power_pmu_disable. ie If there is a pending PMI in Paca, clear it, since we are in pmu_disable. There could be cases where PMI is pending because of inactive PMC ( which later when replayed also will get ignored ), so WARN_ON could give false warning. Hence removing it. Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220522142256.24699-1-atrajeev@linux.vnet.ibm.com
2022-05-05powerpc: fix typos in commentsJulia Lawall1-3/+3
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2022-01-24powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if ↵Athira Rajeev1-3/+14
PMI is pending Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel triggered below warning: [ 172.851380] ------------[ cut here ]------------ [ 172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280 [ 172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse [ 172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted 5.16.0-rc5-03218-g798527287598 #2 [ 172.851451] NIP: c00000000013d600 LR: c00000000013d5a4 CTR: c00000000013b180 [ 172.851458] REGS: c000000017687860 TRAP: 0700 Not tainted (5.16.0-rc5-03218-g798527287598) [ 172.851465] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004884 XER: 20040000 [ 172.851482] CFAR: c00000000013d5b4 IRQMASK: 1 [ 172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 0000000000000004 [ 172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 00000008b7ed0000 [ 172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 c000000000d58e68 [ 172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 0000000000000000 [ 172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 c0000000025396d0 [ 172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 0000000000000003 [ 172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 c00000000a3bb600 [ 172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 c0000008ba0020d8 [ 172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280 [ 172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280 [ 172.851565] Call Trace: [ 172.851568] [c000000017687b00] [c00000000013d5a4] power_pmu_disable+0x214/0x280 (unreliable) [ 172.851579] [c000000017687b40] [c0000000003403ac] perf_pmu_disable+0x4c/0x60 [ 172.851588] [c000000017687b60] [c0000000003445e4] __perf_event_task_sched_out+0x1d4/0x660 [ 172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0 [ 172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140 [ 172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40 [ 172.851615] [c000000017687db0] [c0000000000334dc] system_call_exception+0x18c/0x380 [ 172.851622] [c000000017687e10] [c00000000000c74c] system_call_common+0xec/0x268 The warning indicates that MSR_EE being set(interrupt enabled) when there was an overflown PMC detected. This could happen in power_pmu_disable since it runs under interrupt soft disable condition ( local_irq_save ) and not with interrupts hard disabled. commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") intended to clear PMI pending bit in Paca when disabling the PMU. It could happen that PMC gets overflown while code is in power_pmu_disable callback function. Hence add a check to see if PMI pending bit is set in Paca before clearing it via clear_pmi_pending. Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220122033429.25395-1-atrajeev@linux.vnet.ibm.com
2022-01-17powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64Athira Rajeev1-30/+28
power_pmu_wants_prompt_pmi() is used to decide if PMIs should be taken promptly. This is valid only for ppc64 and is used only if CONFIG_PPC_BOOK3S_64=y. Hence include the function under config check for PPC64. Fixes warning for 32-bit compilation: arch/powerpc/perf/core-book3s.c:2455:6: warning: no previous prototype for 'power_pmu_wants_prompt_pmi' 2455 | bool power_pmu_wants_prompt_pmi(void) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 5a7745b96f43 ("powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move inside existing CONFIG_PPC64 ifdef block] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220114031355.87480-1-atrajeev@linux.vnet.ibm.com
2021-12-23powerpc/perf: Add __init attribute to eligible functionsNick Child1-1/+1
Some functions defined in 'arch/powerpc/perf' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-5-nick.child@ibm.com
2021-12-16powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants ↵Nicholas Piggin1-0/+31
PMIs to be soft-NMI Interrupt code enables MSR[EE] in some irq handlers while keeping local irqs disabled via soft-mask, allowing PMI interrupts to be taken as soft-NMI to improve profiling of irq handlers. When perf is not enabled, there is no point to doing this, it's additional overhead. So provide a function that can say if PMIs should be taken promptly if possible. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
2021-11-30powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an ↵Athira Rajeev1-1/+57
overflown PMC Running perf fuzzer showed below in dmesg logs: "Can't find PMC that caused IRQ" This means a PMU exception happened, but none of the PMC's (Performance Monitor Counter) were found to be overflown. There are some corner cases that clears the PMCs after PMI gets masked. In such cases, the perf interrupt handler will not find the active PMC values that had caused the overflow and thus leads to this message while replaying. Case 1: PMU Interrupt happens during replay of other interrupts and counter values gets cleared by PMU callbacks before replay: During replay of interrupts like timer, __do_irq() and doorbell exception, we conditionally enable interrupts via may_hard_irq_enable(). This could potentially create a window to generate a PMI. Since irq soft mask is set to ALL_DISABLED, the PMI will get masked here. We could get IPIs run before perf interrupt is replayed and the PMU events could be deleted or stopped. This will change the PMU SPR values and resets the counters. Snippet of ftrace log showing PMU callbacks invoked in __do_irq(): <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq <<>> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable <<>> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts Case 2: PMI's masked during local_* operations, example local_add(). If the local_add() operation happens within a local_irq_save(), replay of PMI will be during local_irq_restore(). Similar to case 1, this could also create a window before replay where PMU events gets deleted or stopped. Fix it by updating the PMU callback function power_pmu_disable() to check for pending perf interrupt. If there is an overflown PMC and pending perf interrupt indicated in paca, clear the PMI bit in paca to drop that sample. Clearing of PMI bit is done in power_pmu_disable() since disable is invoked before any event gets deleted/stopped. With this fix, if there are more than one event running in the PMU, there is a chance that we clear the PMI bit for the event which is not getting deleted/stopped. The other events may still remain active. Hence to make sure we don't drop valid sample in such cases, another check is added in power_pmu_enable. This checks if there is an overflown PMC found among the active events and if so enable back the PMI bit. Two new helper functions are introduced to clear/set the PMI, ie clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function pmi_irq_pending() is introduced to give a warning if there is pending PMI bit in paca, but no PMC is overflown. Also there are corner cases which result in performance monitor interrupts being triggered during power_pmu_disable(). This happens since PMXE bit is not cleared along with disabling of other MMCR0 bits in the pmu_disable. Such PMI's could leave the PMU running and could trigger PMI again which will set MMCR0 PMAO bit. This could lead to spurious interrupts in some corner cases. Example, a timer after power_pmu_del() which will re-enable interrupts and triggers a PMI again since PMAO bit is still set. But fails to find valid overflow since PMC was cleared in power_pmu_del(). Fix that by disabling PMXE along with disabling of other MMCR0 bits in power_pmu_disable(). We can't just replay PMI any time. Hence this approach is preferred rather than replaying PMI before resetting overflown PMC. Patch also documents core-book3s on a race condition which can trigger these PMC messages during idle path in PowerNV. Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them") Reported-by: Nageswara R Sastry <nasastry@in.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make pmi_irq_pending() return bool, reflow/reword some comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-11-24powerpc/64s: Implement PMU override command line optionNicholas Piggin1-0/+35
It can be useful in simulators (with very constrained environments) to allow some PMCs to run from boot so they can be sampled directly by a test harness, rather than having to run perf. A previous change freezes counters at boot by default, so provide a boot time option to un-freeze (plus a bit more flexibility). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-13-npiggin@gmail.com
2021-08-25powerpc/perf: Fix the check for SIAR valueKajol Jain1-7/+1
Incase of random sampling, there can be scenarios where Sample Instruction Address Register(SIAR) may not latch to the sampled instruction and could result in the value of 0. In these scenarios it is preferred to return regs->nip. These corner cases are seen in the previous generation (p9) also. Patch adds the check for SIAR value along with regs_use_siar and siar_valid checks so that the function will return regs->nip incase SIAR is zero. Patch drops the code under PPMU_P10_DD1 flag check which handles SIAR 0 case only for Power10 DD1. Fixes: 2ca13a4cc56c9 ("powerpc/perf: Use regs->nip when SIAR is zero") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818171556.36912-3-kjain@linux.ibm.com
2021-08-25powerpc/perf: Drop the case of returning 0 as instruction pointerKajol Jain1-2/+0
Drop the case of returning 0 as instruction pointer since kernel never executes at 0 and userspace almost never does either. Fixes: e6878835ac47 ("powerpc/perf: Sample only if SIAR-Valid bit is set in P7+") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818171556.36912-2-kjain@linux.ibm.com
2021-08-25powerpc/perf: Use stack siar instead of mfsprKajol Jain1-1/+1
Minor optimization in the 'perf_instruction_pointer' function code by making use of stack siar instead of mfspr. Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818171556.36912-1-kjain@linux.ibm.com
2021-08-04powerpc/64s/perf: Always use SIAR for kernel interruptsNicholas Piggin1-0/+9
If an interrupt is taken in kernel mode, always use SIAR for it rather than looking at regs_sipr. This prevents samples piling up around interrupt enable (hard enable or interrupt replay via soft enable) in PMUs / modes where the PR sample indication is not in synch with SIAR. This results in better sampling of interrupt entry and exit in particular. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
2021-07-02Merge tag 'powerpc-5.14-1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ...
2021-06-18powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not setAthira Rajeev1-1/+1
On systems without any specific PMU driver support registered, running perf record causes Oops. The relevant portion from call trace: BUG: Kernel NULL pointer dereference on read at 0x00000040 Faulting instruction address: 0xc0021f0c Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K PREEMPT CMPCPRO SAF3000 DIE NOTIFICATION CPU: 0 PID: 442 Comm: null_syscall Not tainted 5.13.0-rc6-s3k-dev-01645-g7649ee3d2957 #5164 NIP: c0021f0c LR: c00e8ad8 CTR: c00d8a5c NIP perf_instruction_pointer+0x10/0x60 LR perf_prepare_sample+0x344/0x674 Call Trace: perf_prepare_sample+0x7c/0x674 (unreliable) perf_event_output_forward+0x3c/0x94 __perf_event_overflow+0x74/0x14c perf_swevent_hrtimer+0xf8/0x170 __hrtimer_run_queues.constprop.0+0x160/0x318 hrtimer_interrupt+0x148/0x3b0 timer_interrupt+0xc4/0x22c Decrementer_virt+0xb8/0xbc During perf record session, perf_instruction_pointer() is called to capture the sample IP. This function in core-book3s accesses ppmu->flags. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix this crash by checking if ppmu is set. Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1623952506-1431-1-git-send-email-atrajeev@linux.vnet.ibm.com
2021-06-17powerpc: Don't use 'struct ppc_inst' to reference instruction locationChristophe Leroy1-2/+2
'struct ppc_inst' is an internal representation of an instruction, but in-memory instructions are and will remain a table of 'u32' forever. Replace all 'struct ppc_inst *' used for locating an instruction in memory by 'u32 *'. This removes a lot of undue casts to 'struct ppc_inst *'. It also helps locating ab-use of 'struct ppc_inst' dereference. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix ppc_inst_next(), use u32 instead of unsigned int] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
2021-04-20powerpc/perf: Expose processor pipeline stage cycles using ↵Athira Rajeev1-2/+2
PERF_SAMPLE_WEIGHT_STRUCT Performance Monitoring Unit (PMU) registers in powerpc provides information on cycles elapsed between different stages in the pipeline. This can be used for application tuning. On ISA v3.1 platform, this information is exposed by sampling registers. Patch adds kernel support to capture two of the cycle counters as part of perf sample using the sample type: PERF_SAMPLE_WEIGHT_STRUCT. The power PMU function 'get_mem_weight' currently uses 64 bit weight field of perf_sample_data to capture memory latency. But following the introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain 64-bit or 32-bit value depending on the architexture support for PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the pipeline stage cycles info. Hence update the ppmu functions to work for 64-bit and 32-bit weight values. If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field. if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem latency is stored in the low 32bits of perf_sample_weight structure. Also for CPU_FTR_ARCH_31, capture the two cycle counter information in two 16 bit fields of perf_sample_weight structure. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-04-17powerpc/traps: Enhance readability for trap typesXiongwei Song1-2/+3
Define macros to list ppc interrupt types in interttupt.h, replace the reference of the trap hex values with these macros. Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
2021-04-14powerpc/perf: Infrastructure to support checking of attr.config*Madhavan Srinivasan1-0/+11
Introduce code to support the checking of attr.config* for values which are reserved for a given platform. Performance Monitoring Unit (PMU) configuration registers have fields that are reserved and some specific values for bit fields are reserved. For ex., MMCRA[61:62] is Random Sampling Mode (SM) and value of 0b11 for this field is reserved. Writing non-zero or invalid values in these fields will have unknown behaviours. Patch adds a generic call-back function "check_attr_config" in "struct power_pmu", to be called in event_init to check for attr.config* values for a given platform. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210408074504.248211-1-maddy@linux.ibm.com
2021-03-02powerpc/perf: Fix handling of privilege level checks in perf interrupt contextAthira Rajeev1-2/+2
Running "perf mem record" in powerpc platforms with selinux enabled resulted in soft lockup's. Below call-trace was seen in the logs: CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2 NIP: c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000 REGS: c000007fffab7d60 TRAP: 0100 Not tainted (5.11.0-rc7+) ... NIP _raw_spin_lock_irqsave+0x94/0x120 LR _raw_spin_lock_irqsave+0x90/0x120 Call Trace: 0xc00000000fd47260 (unreliable) skb_queue_tail+0x3c/0x90 audit_log_end+0x6c/0x180 common_lsm_audit+0xb0/0xe0 slow_avc_audit+0xa4/0x110 avc_has_perm+0x1c4/0x260 selinux_perf_event_open+0x74/0xd0 security_perf_event_open+0x68/0xc0 record_and_restart+0x6e8/0x7f0 perf_event_interrupt+0x22c/0x560 performance_monitor_exception0x4c/0x60 performance_monitor_common_virt+0x1c8/0x1d0 interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120 NIP: c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0 REGS: c00000000fd47860 TRAP: 0f00 Not tainted (5.11.0-rc7+) ... NIP _raw_spin_lock_irqsave+0x38/0x120 LR skb_queue_tail+0x3c/0x90 interrupt: f00 0x38 (unreliable) 0xc00000000aae6200 audit_log_end+0x6c/0x180 audit_log_exit+0x344/0xf80 __audit_syscall_exit+0x2c0/0x320 do_syscall_trace_leave+0x148/0x200 syscall_exit_prepare+0x324/0x390 system_call_common+0xfc/0x27c The above trace shows that while the CPU was handling a performance monitor exception, there was a call to security_perf_event_open() function. In powerpc core-book3s, this function is called from perf_allow_kernel() check during recording of data address in the sample via perf_get_data_addr(). Commit da97e18458fb ("perf_event: Add support for LSM and SELinux checks") introduced security enhancements to perf. As part of this commit, the new security hook for perf_event_open() was added in all places where perf paranoid check was previously used. In powerpc core-book3s code, originally had paranoid checks in perf_get_data_addr() and power_pmu_bhrb_read(). So perf_paranoid_kernel() checks were replaced with perf_allow_kernel() in these PMU helper functions as well. The intention of paranoid checks in core-book3s was to verify privilege access before capturing some of the sample data. Along with paranoid checks, perf_allow_kernel() also does a security_perf_event_open(). Since these functions are accessed while recording a sample, we end up calling selinux_perf_event_open() in PMI context. Some of the security functions use spinlock like sidtab_sid2str_put(). If a perf interrupt hits under a spin lock and if we end up in calling selinux hook functions in PMI handler, this could cause a dead lock. Since the purpose of this security hook is to control access to perf_event_open(), it is not right to call this in interrupt context. The paranoid checks in powerpc core-book3s were done at interrupt time which is also not correct. Reference commits: Commit cd1231d7035f ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()") Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") We only allow creation of events that have already passed the privilege checks in perf_event_open(). So these paranoid checks are not needed at event time. As a fix, patch uses 'event->attr.exclude_kernel' check to prevent exposing kernel address for userspace only sampling. Fixes: cd1231d7035f ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()") Cc: stable@vger.kernel.org # v4.17+ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
2021-02-22Merge tag 'powerpc-5.12-1' of ↵Linus Torvalds1-49/+47
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. - Conversion of the 32-bit syscall handling into C. - A series from Nick to streamline our TLB flushing when using the Radix MMU. - Switch to using queued spinlocks by default for 64-bit server CPUs. - A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. - Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun. * tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits) powerpc/perf: Adds support for programming of Thresholding in P10 powerpc/pci: Remove unimplemented prototypes powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user() powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size() powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user() powerpc/64: Fix stack trace not displaying final frame powerpc/time: Remove get_tbl() powerpc/time: Avoid using get_tbl() spi: mpc52xx: Avoid using get_tbl() powerpc/syscall: Avoid storing 'current' in another pointer powerpc/32: Handle bookE debugging in C in syscall entry/exit powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/32: Remove the counter in global_dbcr0 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: Change condition to check MSR_RI powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Use is_compat_task() powerpc/syscall: Make interrupt.c buildable on PPC32 ...
2021-02-11powerpc/perf: Adds support for programming of Thresholding in P10Kajol Jain1-7/+8
Thresholding, a performance monitoring unit feature, can be used to identify marked instructions which take more than expected cycles between start event and end event. Threshold compare (thresh_cmp) bits are programmed in MMCRA register. In Power9, thresh_cmp bits were part of the event code. But in case of P10, thresh_cmp are not part of event code due to inclusion of MMCR3 bits. Patch here adds an option to use attr.config1 variable to be used to pass thresh_cmp value to be programmed in MMCRA register. A new ppmu flag called PPMU_HAS_ATTR_CONFIG1 has been added and this flag is used to notify the use of attr.config1 variable. Patch has extended the parameter list of 'compute_mmcr', to include power_pmu's 'flags' element and parameter list of get_constraint to include attr.config1 value. It also extend parameter list of power_check_constraints inorder to pass perf_event list. As stated by commit ef0e3b650f8d ("powerpc/perf: Fix Threshold Event Counter Multiplier width for P10"), constraint bits for thresh_cmp is also needed to be increased to 11 bits, which is handled as part of this patch. We added bit number 53 as part of constraint bits of thresh_cmp for power10 to make it an 11 bit field. Updated layout for p10: /* * Layout of constraint bits: * * 60 56 52 48 44 40 36 32 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | * [ fab_match ] [ thresh_cmp ] [ thresh_ctl ] [ ] * | | * [ thresh_cmp bits for p10] thresh_sel -* * * 28 24 20 16 12 8 4 0 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | * [ ] | [ ] | [ sample ] [ ] [6] [5] [4] [3] [2] [1] * | | | | | * BHRB IFM -* | | |*radix_scope | Count of events for each PMC. * EBB -* | | p1, p2, p3, p4, p5, p6. * L1 I/D qualifier -* | * nc - number of counters -* * * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints * we want the low bit of each field to be added to any existing value. * * Everything else is a value field. */ Result: command#: cat /sys/devices/cpu/format/thresh_cmp config1:0-17 ex. usage: command#: perf record -I --weight -d -e cpu/event=0x67340101EC,thresh_cmp=500/ ./ebizzy -S 2 -t 1 -s 4096 1826636 records/s real 2.00 s user 2.00 s sys 0.00 s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (61 samples) ] Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210209095234.837356-1-kjain@linux.ibm.com
2021-02-09powerpc/perf: Record counter overflow always if SAMPLE_IP is unsetAthira Rajeev1-4/+15
While sampling for marked events, currently we record the sample only if the SIAR valid bit of Sampled Instruction Event Register (SIER) is set. SIAR_VALID bit is used for fetching the instruction address from Sampled Instruction Address Register(SIAR). But there are some usecases, where the user is interested only in the PMU stats at each counter overflow and the exact IP of the overflow event is not required. Dropping SIAR invalid samples will fail to record some of the counter overflows in such cases. Example of such usecase is dumping the PMU stats (event counts) after some regular amount of instructions/events from the userspace (ex: via ptrace). Here counter overflow is indicated to userspace via signal handler, and captured by monitoring and enabling I/O signaling on the event file descriptor. In these cases, we expect to get sample/overflow indication after each specified sample_period. Perf event attribute will not have PERF_SAMPLE_IP set in the sample_type if exact IP of the overflow event is not requested. So while profiling if SAMPLE_IP is not set, just record the counter overflow irrespective of SIAR_VALID check. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Reflow comment and if formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
2021-02-09powerpc/perf: Expose Performance Monitor Counter SPR's as part of extended regsAthira Rajeev1-0/+11
Currently Monitor Mode Control Registers and Sampling registers are part of extended regs. Patch adds support to include Performance Monitor Counter Registers (PMC1 to PMC6 ) as part of extended registers. PMCs are saved in the perf interrupt handler as part of per-cpu array 'pmcs' in struct cpu_hw_events. While capturing the register values for extended regs, fetch these saved PMC values. Simplified the PERF_REG_PMU_MASK_300/31 definition to include PMU SPRs MMCR0 to PMC6. Exclude the unsupported SPRs (MMCR3, SIER2, SIER3) from extended mask value for CPU_FTR_ARCH_300 in the new definition. PERF_REG_EXTENDED_MAX is used to check if any index beyond the extended registers is requested in the sample. Have one PERF_REG_EXTENDED_MAX for CPU_FTR_ARCH_300/CPU_FTR_ARCH_31 since perf_reg_validate function already checks the extended mask for the presence of any unsupported register. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-3-git-send-email-atrajeev@linux.vnet.ibm.com
2021-02-09powerpc/perf: Include PMCs as part of per-cpu cpuhw_events structAthira Rajeev1-6/+12
To support capturing of PMC's as part of extended registers, the value of SPR's PMC1 to PMC6 has to be saved in the starting of PMI interrupt handler. This is needed since we are resetting the overflown PMC before creating sample and hence directly reading SPRN_PMCx in 'perf_reg_value' will be capturing the modified value. To solve this, add a per-cpu array as part of structure cpu_hw_events and use this array to capture PMC values in the perf interrupt handler. Patch also re-factor's the interrupt handler code to use this per-cpu array instead of current local array. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-02-09powerpc/perf: move perf irq/nmi handling details into traps.cNicholas Piggin1-33/+2
This is required in order to allow more significant differences between NMI type interrupt handlers and regular asynchronous handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-20-npiggin@gmail.com
2021-02-01perf/core: Add PERF_SAMPLE_WEIGHT_STRUCTKan Liang1-1/+1
Current PERF_SAMPLE_WEIGHT sample type is very useful to expresses the cost of an action represented by the sample. This allows the profiler to scale the samples to be more informative to the programmer. It could also help to locate a hotspot, e.g., when profiling by memory latencies, the expensive load appear higher up in the histograms. But current PERF_SAMPLE_WEIGHT sample type is solely determined by one factor. This could be a problem, if users want two or more factors to contribute to the weight. For example, Golden Cove core PMU can provide both the instruction latency and the cache Latency information as factors for the memory profiling. For current X86 platforms, although meminfo::latency is defined as a u64, only the lower 32 bits include the valid data in practice (No memory access could last than 4G cycles). The higher 32 bits can be used to store new factors. Add a new sample type, PERF_SAMPLE_WEIGHT_STRUCT, to indicate the new sample weight structure. It shares the same space as the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously. Currently, only X86 and PowerPC use the PERF_SAMPLE_WEIGHT sample type. - For PowerPC, there is nothing changed for the PERF_SAMPLE_WEIGHT sample type. There is no effect for the new PERF_SAMPLE_WEIGHT_STRUCT sample type. PowerPC can re-struct the weight field similarly later. - For X86, the same value will be dumped for the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type for now. The following patches will apply the new factors for the PERF_SAMPLE_WEIGHT_STRUCT sample type. The field in the union perf_sample_weight should be shared among different architectures. A generic name is required, but it's hard to abstract a name that applies to all architectures. For example, on X86, the fields are to store all kinds of latency. While on PowerPC, it stores MMCRA[TECX/TECM], which should not be latency. So a general name prefix 'var$NUM' is used here. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-2-git-send-email-kan.liang@linux.intel.com
2020-12-17Merge tag 'powerpc-5.11-1' of ↵Linus Torvalds1-6/+59
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu. * tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits) powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug powerpc: Add config fragment for disabling -Werror powerpc/configs: Add ppc64le_allnoconfig target powerpc/powernv: Rate limit opal-elog read failure message powerpc/pseries/memhotplug: Quieten some DLPAR operations powerpc/ps3: use dma_mapping_error() powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 powerpc/perf: Fix Threshold Event Counter Multiplier width for P10 powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp KVM: PPC: fix comparison to bool warning KVM: PPC: Book3S: Assign boolean values to a bool variable powerpc: Inline setup_kup() powerpc/64s: Mark the kuap/kuep functions non __init KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering powerpc/xive: Improve error reporting of OPAL calls powerpc/xive: Simplify xive_do_source_eoi() powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG ...
2020-12-10powerpc/perf: Exclude kernel samples while counting events in user space.Athira Rajeev1-0/+10
Perf event attritube supports exclude_kernel flag to avoid sampling/profiling in supervisor state (kernel). Based on this event attr flag, Monitor Mode Control Register bit is set to freeze on supervisor state. But sometimes (due to hardware limitation), Sampled Instruction Address Register (SIAR) locks on to kernel address even when freeze on supervisor is set. Patch here adds a check to drop those samples. Cc: stable@vger.kernel.org Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-04powerpc/perf: MMCR0 control for PMU registers under PMCC=00Athira Rajeev1-0/+4
PowerISA v3.1 introduces new control bit (PMCCEXT) for restricting access to group B PMU registers in problem state when MMCR0 PMCC=0b00. In problem state and when MMCR0 PMCC=0b00, setting the Monitor Mode Control Register bit 54 (MMCR0 PMCCEXT), will restrict read permission on Group B Performance Monitor Registers (SIER, SIAR, SDAR and MMCR1). When this bit is set to zero, group B registers will be readable. In other platforms (like power9), the older behaviour is retained where group B PMU SPRs are readable. Patch adds support for MMCR0 PMCCEXT bit in power10 by enabling this bit during boot and during the PMU event enable/disable callback functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606409684-1589-8-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-04powerpc/perf: Invoke per-CPU variable access with disabled interruptsAthira Rajeev1-4/+6
The power_pmu_event_init() callback access per-cpu variable (cpu_hw_events) to check for event constraints and Branch Stack (BHRB). Current usage is to disable preemption when accessing the per-cpu variable, but this does not prevent timer callback from interrupting event_init. Fix this by using 'local_irq_save/restore' to make sure the code path is invoked with disabled interrupts. This change is tested in mambo simulator to ensure that, if a timer interrupt comes in during the per-cpu access in event_init, it will be soft masked and replayed later. For testing purpose, introduced a udelay() in power_pmu_event_init() to make sure a timer interrupt arrives while in per-cpu variable access code between local_irq_save/resore. As expected the timer interrupt was replayed later during local_irq_restore called from power_pmu_event_init. This was confirmed by adding breakpoint in mambo and checking the backtrace when timer_interrupt was hit. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606814880-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-04powerpc/perf: Fix crash with is_sier_available when pmu is not setAthira Rajeev1-0/+3
On systems without any specific PMU driver support registered, running 'perf record' with —intr-regs will crash ( perf record -I <workload> ). The relevant portion from crash logs and Call Trace: Unable to handle kernel paging request for data at address 0x00000068 Faulting instruction address: 0xc00000000013eb18 Oops: Kernel access of bad area, sig: 11 [#1] CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1 NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80 REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le) NIP [c00000000013eb18] is_sier_available+0x18/0x30 LR [c000000000139f2c] perf_reg_value+0x6c/0xb0 Call Trace: [c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable) [c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0 [c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0 [c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0 [c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0 [c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480 [c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520 [c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0 [c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120 When perf record session is started with "-I" option, capturing registers on each sample calls is_sier_available() to check for the SIER (Sample Instruction Event Register) availability in the platform. This function in core-book3s accesses 'ppmu->flags'. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix the crash by returning false in is_sier_available() if ppmu is not set. Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-11-26Merge remote-tracking branch 'origin/master' into perf/corePeter Zijlstra1-5/+19
Further perf/core patches will depend on: d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic page table folding") which is already in Linus' tree.
2020-11-19powerpc/perf: Use regs->nip when SIAR is zeroMadhavan Srinivasan1-4/+17
In power10 DD1, there is an issue where the SIAR (Sampled Instruction Address Register) is not latching to the sampled address during random sampling. This results in value of 0s in the SIAR. Add a check to use regs->nip when SIAR is zero. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
2020-11-19powerpc/perf: Use the address from SIAR register to set cpumode flagsAthira Rajeev1-0/+14
While setting the processor mode for any sample, perf_get_misc_flags() expects the privilege level to differentiate the userspace and kernel address. On power10 DD1, there is an issue that causes MSR_HV MSR_PR bits of Sampled Instruction Event Register (SIER) not to be set for marked events. Hence add a check to use the address in SIAR (Sampled Instruction Address Register) to identify the privilege level. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
2020-11-19powerpc/perf: Drop the check for SIAR_VALIDAthira Rajeev1-1/+8
In power10 DD1, there is an issue that causes the SIAR_VALID bit of the SIER (Sampled Instruction Event Register) to not be set. But the SIAR_VALID bit is used for fetching the instruction address from the SIAR (Sampled Instruction Address Register), and marked events are sampled only if the SIAR_VALID bit is set. So drop the check for SIAR_VALID and return true always incase of power10 DD1. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
2020-10-29powerpc/perf: Support PERF_SAMPLE_DATA_PAGE_SIZEKan Liang1-2/+4
The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual address. Update the data->addr if the sample type is set. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201001135749.2804-4-kan.liang@linux.intel.com
2020-08-27powerpc/perf: Fix crashes with generic_compat_pmu & BHRBAlexey Kardashevskiy1-5/+14
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is only defined in raw CPUs' power_pmu structs. The "architected" CPUs use generic_compat_pmu, which does not have this callback, and crashes occur if a user tries to enable branch stack for an event. This add a NULL pointer check for bhrb_filter_map() which behaves as if the callback returned an error. This does not add the same check for config_bhrb() as the only caller checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0. Fixes: be80e758d0c2 ("powerpc/perf: Add generic compat mode pmu driver") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru
2020-08-20powerpc/perf: Fix soft lockups due to missed interrupt accountingAthira Rajeev1-0/+4
Performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s which invokes perf_event_overflow() to record the sample information. Apart from creating sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt(). Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using siar_valid() check) and hence the interrupt check. But it is possible that we do sampling for some events that are not generating valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to soft lockup. Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event. ie if SIAR is invalid, just do interrupt check and don't record the sample information. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-17powerpc/perf: Add support for outputting extended regs in perf intr_regsAnju T Sudhakar1-0/+1
Add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. Patch adds extended regs support for power9 platform by exposing MMCR0, MMCR1 and MMCR2 registers. REG_RESERVED mask needs update to include extended regs. PERF_REG_EXTENDED_MASK, contains mask value of the supported registers, is defined at runtime in the kernel based on platform since the supported registers may differ from one processor version to another and hence the MASK value. With the patch: available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2 PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0 ... intr regs: mask 0xffffffffffff ABI 64-bit .... r0 0xc00000000012b77c .... r1 0xc000003fe5e03930 .... r2 0xc000000001b0e000 .... r3 0xc000003fdcddf800 .... r4 0xc000003fc7880000 .... r5 0x9c422724be .... r6 0xc000003fe5e03908 .... r7 0xffffff63bddc8706 .... r8 0x9e4 .... r9 0x0 .... r10 0x1 .... r11 0x0 .... r12 0xc0000000001299c0 .... r13 0xc000003ffffc4800 .... r14 0x0 .... r15 0x7fffdd8b8b00 .... r16 0x0 .... r17 0x7fffdd8be6b8 .... r18 0x7e7076607730 .... r19 0x2f .... r20 0xc00000001fc26c68 .... r21 0xc0002041e4227e00 .... r22 0xc00000002018fb60 .... r23 0x1 .... r24 0xc000003ffec4d900 .... r25 0x80000000 .... r26 0x0 .... r27 0x1 .... r28 0x1 .... r29 0xc000000001be1260 .... r30 0x6008010 .... r31 0xc000003ffebb7218 .... nip 0xc00000000012b910 .... msr 0x9000000000009033 .... orig_r3 0xc00000000012b86c .... ctr 0xc0000000001299c0 .... link 0xc00000000012b77c .... xer 0x0 .... ccr 0x28002222 .... softe 0x1 .... trap 0xf00 .... dar 0x0 .... dsisr 0x80000000000 .... sier 0x0 .... mmcra 0x80000000000 .... mmcr0 0x82008090 .... mmcr1 0x1e000000 .... mmcr2 0x0 ... thread: perf:4784 Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <nasastry@in.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596794701-23530-2-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-07Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds1-33/+75
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-07-27powerpc/64s/hash: Fix hash_preload running with interrupts enabledNicholas Piggin1-0/+6
Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
2020-07-22powerpc/perf: BHRB control to disable BHRB logic when not usedAthira Rajeev1-4/+16
PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB). BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This field controls whether BHRB entries are written when BHRB recording is enabled by other bits. This patch implements support for this BHRB disable bit. By setting 0b1 to this bit will disable the BHRB and by setting 0b0 to this bit will have BHRB enabled. This addresses backward compatibility (for older OS), since this bit will be cleared and hardware will be writing to BHRB by default. This patch addresses changes to set MMCRA (BHRBRD) at boot for power10 (there by the core will run faster) and enable this feature only on runtime ie, on explicit need from user. Also save/restore MMCRA in the restore path of state-loss idle state to make sure we keep BHRB disabled if it was not enabled on request at runtime. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-12-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Ignore the BHRB kernel address filtering for P10Athira Rajeev1-1/+4
Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") added a check in bhrb_read() to filter the kernel address from BHRB buffer. This patch modified it to avoid that check for PowerISA v3.1 based processors, since PowerISA v3.1 allows only MSR[PR]=1 address to be written to BHRB buffer. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-10-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: power10 Performance Monitoring supportAthira Rajeev1-0/+2
Base enablement patch to register performance monitoring hardware support for power10. Patch introduce the raw event encoding format, defines the supported list of events, config fields for the event attributes and their corresponding bit values which are exported via sysfs. Patch also enhances the support function in isa207_common.c to include power10 pmu hardware. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-9-git-send-email-atrajeev@linux.vnet.ibm.com