summaryrefslogtreecommitdiffstats
path: root/drivers/perf/arm_pmu.c
AgeCommit message (Collapse)AuthorFilesLines
2022-12-12Merge tag 'perf-core-2022-12-12' of ↵Linus Torvalds1-9/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Thoroughly rewrite the data structures that implement perf task context handling, with the goal of fixing various quirks and unfeatures both in already merged, and in upcoming proposed code. The old data structure is the per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' In this new design this is replaced with a single task context and a single CPU context, plus intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' [ See commit bd2756811766 for more details. ] This rewrite was developed by Peter Zijlstra and Ravi Bangoria. - Optimize perf_tp_event() - Update the Intel uncore PMU driver, extending it with UPI topology discovery on various hardware models. - Misc fixes & cleanups * tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box() perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map() perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox() perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology() perf/x86/intel/uncore: Make set_mapping() procedure void perf/x86/intel/uncore: Update sysfs-devices-mapping file perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server perf/x86/intel/uncore: Get UPI NodeID and GroupID perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D perf/x86/intel/uncore: Clear attr_update properly perf/x86/intel/uncore: Introduce UPI topology type perf/x86/intel/uncore: Generalize IIO topology support perf/core: Don't allow grouping events from different hw pmus perf/amd/ibs: Make IBS a core pmu perf: Fix function pointer case perf/x86/amd: Remove the repeated declaration perf: Fix possible memleak in pmu_dev_alloc() ...
2022-12-06Merge branch 'for-next/perf' into for-next/coreWill Deacon1-3/+0
* for-next/perf: (21 commits) arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() drivers/perf: hisi: Add TLP filter support Documentation: perf: Indent filter options list of hisi-pcie-pmu docs: perf: Fix PMU instance name of hisi-pcie-pmu drivers/perf: hisi: Fix some event id for hisi-pcie-pmu arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI perf/amlogic: Remove unused header inclusions of <linux/version.h> perf/amlogic: Fix build error for x86_64 allmodconfig dt-binding: perf: Add Amlogic DDR PMU docs/perf: Add documentation for the Amlogic G12 DDR PMU perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver MAINTAINERS: Update HiSilicon PMU maintainers perf: arm_cspmu: Fix module cyclic dependency perf: arm_cspmu: Fix build failure on x86_64 perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute perf: arm_cspmu: Add support for ARM CoreSight PMU driver perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init() perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init() drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init() ...
2022-12-02arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()Anshuman Khandual1-3/+0
__hw_perf_event_init() already calls armpmu->map_event() callback, and also returns its error code including -ENOENT, along with a debug callout. Hence an additional armpmu->map_event() check for -ENOENT is redundant. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20221202015611.338499-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-07arm_pmu: rework ACPI probingMark Rutland1-14/+3
The current ACPI PMU probing logic tries to associate PMUs with CPUs when the CPU is first brought online, in order to handle late hotplug, though PMUs are only registered during early boot, and so for late hotplugged CPUs this can only associate the CPU with an existing PMU. We tried to be clever and the have the arm_pmu_acpi_cpu_starting() callback allocate a struct arm_pmu when no matching instance is found, in order to avoid duplication of logic. However, as above this doesn't do anything useful for late hotplugged CPUs, and this requires us to allocate memory in an atomic context, which is especially problematic for PREEMPT_RT, as reported by Valentin and Pierre. This patch reworks the probing to detect PMUs for all online CPUs in the arm_pmu_acpi_probe() function, which is more aligned with how DT probing works. The arm_pmu_acpi_cpu_starting() callback only tries to associate CPUs with an existing arm_pmu instance, avoiding the problem of allocating in atomic context. Note that as we didn't previously register PMUs for late-hotplugged CPUs, this change doesn't result in a loss of existing functionality, though we will now warn when we cannot associate a CPU with a PMU. This change allows us to pull the hotplug callback registration into the arm_pmu_acpi_probe() function, as we no longer need the callbacks to be invoked shortly after probing the boot CPUs, and can register it without invoking the calls. For the moment the arm_pmu_acpi_init() initcall remains to register the SPE PMU, though in future this should probably be moved elsewhere (e.g. the arm64 ACPI init code), since this doesn't need to be tied to the regular CPU PMU code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/20210810134127.1394269-2-valentin.schneider@arm.com/ Reported-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/ Cc: Pierre Gondois <pierre.gondois@arm.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Will Deacon <will@kernel.org> Reviewed-and-tested-by: Pierre Gondois <pierre.gondois@arm.com> Link: https://lore.kernel.org/r/20220930111844.1522365-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-10-27perf: Rewrite core context handlingPeter Zijlstra1-9/+7
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events. Much thanks to Ravi for picking this up and pushing it towards completion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
2022-09-22perf: arm64: Add SVE vector granule register to user regsJames Clark1-1/+1
Dwarf based unwinding in a function that pushes SVE registers onto the stack requires the unwinder to know the length of the SVE register to calculate the stack offsets correctly. This was added to the Arm specific Dwarf spec as the VG pseudo register[1]. Add the vector length at position 46 if it's requested by userspace and SVE is supported. If it's not supported then fail to open the event. The vector length must be on each sample because it can be changed at runtime via a prctl or ptrace call. Also by adding it as a register rather than a separate attribute, minimal changes will be required in an unwinder that already indexes into the register list. [1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20220901132658.1024635-2-james.clark@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-13arm_pmu: Validate single/group leader eventsRob Herring1-6/+4
In the case where there is only a cycle counter available (i.e. PMCR_EL0.N is 0) and an event other than CPU cycles is opened, the open should fail as the event can never possibly be scheduled. However, the event validation when an event is opened is skipped when the group leader is opened. Fix this by always validating the group leader events. Reported-by: Al Grant <al.grant@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220408203330.4014015-1-robh@kernel.org Cc: <stable@vger.kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2022-03-08Merge branch 'for-next/perf-m1' into for-next/perfWill Deacon1-0/+2
Support for the CPU PMUs on the Apple M1. * for-next/perf-m1: drivers/perf: Add Apple icestorm/firestorm CPU PMU driver drivers/perf: arm_pmu: Handle 47 bit counters irqchip/apple-aic: Move PMU-specific registers to their own include file arm64: dts: apple: Add t8303 PMU nodes arm64: dts: apple: Add t8103 PMU interrupt affinities irqchip/apple-aic: Wire PMU interrupts irqchip/apple-aic: Parse FIQ affinities from device-tree dt-bindings: apple,aic: Add affinity description for per-cpu pseudo-interrupts dt-bindings: apple,aic: Add CPU PMU per-cpu pseudo-interrupts dt-bindings: arm-pmu: Document Apple PMU compatible strings
2022-03-08drivers/perf: arm_pmu: Handle 47 bit countersMarc Zyngier1-0/+2
The current ARM PMU framework can only deal with 32 or 64bit counters. Teach it about a 47bit flavour. Yes, this is odd. Reviewed-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15perf: replace bitmap_weight with bitmap_empty where appropriateYury Norov1-2/+2
In some places, drivers/perf code calls bitmap_weight() to check if any bit of a given bitmap is set. It's better to use bitmap_empty() in that case because bitmap_empty() stops traversing the bitmap as soon as it finds first set bit, while bitmap_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220210224933.379149-13-yury.norov@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2021-09-20KVM: arm64: Fix PMU probe orderingMarc Zyngier1-0/+2
Russell reported that since 5.13, KVM's probing of the PMU has started to fail on his HW. As it turns out, there is an implicit ordering dependency between the architectural PMU probing code and and KVM's own probing. If, due to probe ordering reasons, KVM probes before the PMU driver, it will fail to detect the PMU and prevent it from being advertised to guests as well as the VMM. Obviously, this is one probing too many, and we should be able to deal with any ordering. Add a callback from the PMU code into KVM to advertise the registration of a host CPU PMU, allowing for any probing order. Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers") Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@shell.armlinux.org.uk Cc: stable@vger.kernel.org
2021-06-02arm_pmu: move to use request_irq by IRQF_NO_AUTOEN flagTian Tao1-3/+1
request_irq() after setting IRQ_NOAUTOEN as below irq_set_status_flags(irq, IRQ_NOAUTOEN); request_irq(dev, irq...); can be replaced by request_irq() with IRQF_NO_AUTOEN flag. this patch is made base on "add IRQF_NO_AUTOEN for request_irq" which is being merged: https://lore.kernel.org/patchwork/patch/1388765/ Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/1622595642-61678-2-git-send-email-tiantao6@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01perf: arm_pmu: use DEVICE_ATTR_RO macroYueHaibing1-3/+3
Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR, which makes the code a bit shorter and easier to read. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20210528014130.7708-1-yuehaibing@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25drivers/perf: arm_pmu: Fix some coding style issuesJunhao He1-4/+2
Fix some coding style issues reported by checkpatch.pl, including following types: ERROR: spaces required around that '=' (ctx:VxW) WARNING: Possible unnecessary 'out of memory' message Signed-off-by: Junhao He <hejunhao2@hisilicon.com> Signed-off-by: Jay Fang <f.fangjian@huawei.com> Link: https://lore.kernel.org/r/1620736054-58412-3-git-send-email-f.fangjian@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2021-04-22arm64: Get rid of oprofile leftoversMarc Zyngier1-30/+0
perf_pmu_name() and perf_num_counters() are now unused. Drop them. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210414134409.1266357-3-maz@kernel.org
2021-01-20perf: Constify static struct attribute_groupRikard Falkeborn1-1/+1
The only usage is to put their addresses in an array of pointers to const struct attribute group. Make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Link: https://lore.kernel.org/r/20210117212847.21319-5-rikard.falkeborn@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2021-01-13Revert "arm64: Enable perf events based hard lockup detector"Will Deacon1-5/+0
This reverts commit 367c820ef08082e68df8a3bc12e62393af21e4b5. lockup_detector_init() makes heavy use of per-cpu variables and must be called with preemption disabled. Usually, it's handled early during boot in kernel_init_freeable(), before SMP has been initialised. Since we do not know whether or not our PMU interrupt can be signalled as an NMI until considerably later in the boot process, the Arm PMU driver attempts to re-initialise the lockup detector off the back of a device_initcall(). Unfortunately, this is called from preemptible context and results in the following splat: | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 | caller is debug_smp_processor_id+0x20/0x2c | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x3c0 | show_stack+0x20/0x6c | dump_stack+0x2f0/0x42c | check_preemption_disabled+0x1cc/0x1dc | debug_smp_processor_id+0x20/0x2c | hardlockup_detector_event_create+0x34/0x18c | hardlockup_detector_perf_init+0x2c/0x134 | watchdog_nmi_probe+0x18/0x24 | lockup_detector_init+0x44/0xa8 | armv8_pmu_driver_init+0x54/0x78 | do_one_initcall+0x184/0x43c | kernel_init_freeable+0x368/0x380 | kernel_init+0x1c/0x1cc | ret_from_fork+0x10/0x30 Rather than bodge this with raw_smp_processor_id() or randomly disabling preemption, simply revert the culprit for now until we figure out how to do this properly. Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-25arm64: Enable perf events based hard lockup detectorSumit Garg1-0/+5
With the recent feature added to enable perf events to use pseudo NMIs as interrupts on platforms which support GICv3 or later, its now been possible to enable hard lockup detector (or NMI watchdog) on arm64 platforms. So enable corresponding support. One thing to note here is that normally lockup detector is initialized just after the early initcalls but PMU on arm64 comes up much later as device_initcall(). So we need to re-initialize lockup detection once PMU has been initialized. Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Acked-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/1602060704-10921-1-git-send-email-sumit.garg@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm_pmu: arm64: Use NMIs for PMUJulien Thierry1-8/+63
Add required PMU interrupt operations for NMIs. Request interrupt lines as NMIs when possible, otherwise fall back to normal interrupts. NMIs are only supported on the arm64 architecture with a GICv3 irqchip. [Alexandru E.: Added that NMIs only work on arm64 + GICv3, print message when PMU is using NMIs] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-8-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm_pmu: Introduce pmu_irq_opsJulien Thierry1-16/+74
Currently the PMU interrupt can either be a normal irq or a percpu irq. Supporting NMI will introduce two cases for each existing one. It becomes a mess of 'if's when managing the interrupt. Define sets of callbacks for operations commonly done on the interrupt. The appropriate set of callbacks is selected at interrupt request time and simplifies interrupt enabling/disabling and freeing. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-7-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2019-07-29drivers/perf: arm_pmu: Fix failure path in PM notifierWill Deacon1-1/+1
Handling of the CPU_PM_ENTER_FAILED transition in the Arm PMU PM notifier code incorrectly skips restoration of the counters. Fix the logic so that CPU_PM_ENTER_FAILED follows the same path as CPU_PM_EXIT. Cc: <stable@vger.kernel.org> Fixes: da4e4f18afe0f372 ("drivers/perf: arm_pmu: implement CPU_PM notifier") Reported-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner1-0/+1
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-21perf/core, arch/arm: Use PERF_PMU_CAP_NO_EXCLUDE conditionallyAndrew Murray1-10/+5
The ARM PMU driver can be used to represent a variety of ARM based PMUs. Some of these PMUs do not provide support for context exclusion, where this is the case we advertise the PERF_PMU_CAP_NO_EXCLUDE capability to ensure that perf prevents us from handling events where any exclusion flags are set. Where an ARM PMU driver has the set_event_filter function implemented, we rely on it to perform exclusion checks. At present some of these functions do not test for all of the available exclude flags. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@armlinux.org.uk> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: robin.murphy@arm.com Cc: suzuki.poulose@arm.com Link: https://lkml.kernel.org/r/1547128414-50693-6-git-send-email-andrew.murray@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-12arm64: perf: Reject stand-alone CHAIN events for PMUv3Will Deacon1-1/+7
It doesn't make sense for a perf event to be configured as a CHAIN event in isolation, so extend the arm_pmu structure with a ->filter_match() function to allow the backend PMU implementation to reject CHAIN events early. Cc: <stable@vger.kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10arm64: perf: Add support for chaining event countersSuzuki K Poulose1-7/+2
Add support for 64bit event by using chained event counters and 64bit cycle counters. PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively forming a 64-bit counter. The low/even counter is programmed to count the event of interest, and the high/odd counter is programmed to count the CHAIN event, taken when the low/even counter overflows. For CPU cycles, when 64bit mode is requested, the cycle counter is used in 64bit mode. If the cycle counter is not available, falls back to chaining. Cc: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10arm_pmu: Tidy up clear_event_idx call backsSuzuki K Poulose1-4/+3
The armpmu uses get_event_idx callback to allocate an event counter for a given event, which marks the selected counter as "used". Now, when we delete the counter, the arm_pmu goes ahead and clears the "used" bit and then invokes the "clear_event_idx" call back, which kind of splits the job between the core code and the backend. To keep things tidy, mandate the implementation of clear_event_idx() and add it for exisiting backends. This will be useful for adding the chained event support, where we leave the event idx maintenance to the backend. Also, when an event is removed from the PMU, reset the hw.idx to indicate that a counter is not allocated for this event, to help the backends do better checks. This will be also used for the chain counter support. Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10arm_pmu: Add support for 64bit event countersSuzuki K Poulose1-6/+10
Each PMU has a set of 32bit event counters. But in some special cases, the events could be counted using counters which are effectively 64bit wide. e.g, Arm V8 PMUv3 has a 64 bit cycle counter which can count only the CPU cycles. Also, the PMU can chain the event counters to effectively count as a 64bit counter. Add support for tracking the events that uses 64bit counters. This only affects the periods set for each counter in the core driver. Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10arm_pmu: Clean up maximum period handlingSuzuki K Poulose1-4/+12
Each PMU defines their max_period of the counter as the maximum value that can be counted. Since all the PMU backends support 32bit counters by default, let us remove the redundant field. No functional changes. Cc: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-05-21arm_pmu: simplify arm_pmu::handle_irqMark Rutland1-1/+1
The arm_pmu::handle_irq() callback has the same prototype as a generic IRQ handler, taking the IRQ number and a void pointer argument which it must convert to an arm_pmu pointer. This means that all arm_pmu::handle_irq() take an IRQ number they never use, and all must explicitly cast the void pointer to an arm_pmu pointer. Instead, let's change arm_pmu::handle_irq to take an arm_pmu pointer, allowing these casts to be removed. The redundant IRQ number parameter is also removed. Suggested-by: Hoeun Ryu <hoeun.ryu@lge.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-19Merge tag 'v4.16-rc6' into perf/core, to pick up fixesIngo Molnar1-1/+1
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-16perf: Fix sibling iterationPeter Zijlstra1-1/+1
Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
2018-03-12perf/core: Remove perf_event::group_entryPeter Zijlstra1-1/+1
Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-28arm_pmu: Use disable_irq_nosync when disabling SPI in CPU teardown hookWill Deacon1-1/+1
Commit 6de3f79112cc ("arm_pmu: explicitly enable/disable SPIs at hotplug") moved all of the arm_pmu IRQ enable/disable calls to the CPU hotplug hooks, regardless of whether they are implemented as PPIs or SPIs. This can lead to us sleeping from atomic context due to disable_irq blocking: | BUG: sleeping function called from invalid context at kernel/irq/manage.c:112 | in_atomic(): 1, irqs_disabled(): 128, pid: 15, name: migration/1 | no locks held by migration/1/15. | irq event stamp: 192 | hardirqs last enabled at (191): [<00000000803c2507>] | _raw_spin_unlock_irq+0x2c/0x4c | hardirqs last disabled at (192): [<000000007f57ad28>] multi_cpu_stop+0x9c/0x140 | softirqs last enabled at (0): [<0000000004ee1b58>] | copy_process.isra.77.part.78+0x43c/0x1504 | softirqs last disabled at (0): [< (null)>] (null) | CPU: 1 PID: 15 Comm: migration/1 Not tainted 4.16.0-rc3-salvator-x #1651 | Hardware name: Renesas Salvator-X board based on r8a7796 (DT) | Call trace: | dump_backtrace+0x0/0x140 | show_stack+0x14/0x1c | dump_stack+0xb4/0xf0 | ___might_sleep+0x1fc/0x218 | __might_sleep+0x70/0x80 | synchronize_irq+0x40/0xa8 | disable_irq+0x20/0x2c | arm_perf_teardown_cpu+0x80/0xac Since the interrupt is always CPU-affine and this code is running with interrupts disabled, we can just use disable_irq_nosync as we know there isn't a concurrent invocation of the handler to worry about. Fixes: 6de3f79112cc ("arm_pmu: explicitly enable/disable SPIs at hotplug") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-20arm_pmu: acpi: request IRQs up-frontMark Rutland1-20/+2
We can't request IRQs in atomic context, so for ACPI systems we'll have to request them up-front, and later associate them with CPUs. This patch reorganises the arm_pmu code to do so. As we no longer have the arm_pmu structure at probe time, a number of prototypes need to be adjusted, requiring changes to the common arm_pmu code and arm_pmu platform code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20arm_pmu: note IRQs and PMUs per-cpuMark Rutland1-17/+52
To support ACPI systems, we need to request IRQs before we know the associated PMU, and thus we need some percpu variable that the IRQ handler can find the PMU from. As we're going to request IRQs without the PMU, we can't rely on the arm_pmu::active_irqs mask, and similarly need to track requested IRQs with a percpu variable. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [will: made armpmu_count_irq_users static] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20arm_pmu: explicitly enable/disable SPIs at hotplugMark Rutland1-5/+10
To support ACPI systems, we need to request IRQs before CPUs are hotplugged, and thus we need to request IRQs before we know their associated PMU. This is problematic if a PMU IRQ is pending out of reset, as it may be taken before we know the PMU, and thus the IRQ handler won't be able to handle it, leaving it screaming. To avoid such problems, lets request all IRQs in a disabled state, and explicitly enable/disable them at hotplug time, when we're sure the PMU has been probed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20arm_pmu: acpi: check for mismatched PPIsMark Rutland1-13/+4
The arm_pmu platform code explicitly checks for mismatched PPIs at probe time, while the ACPI code leaves this to the core code. Future refactoring will make this difficult for the core code to check, so let's have the ACPI code check this explicitly. As before, upon a failure we'll continue on without an interrupt. Ho hum. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20arm_pmu: add armpmu_alloc_atomic()Mark Rutland1-3/+14
In ACPI systems, we don't know the makeup of CPUs until we hotplug them on, and thus have to allocate the PMU datastructures at hotplug time. Thus, we must use GFP_ATOMIC allocations. Let's add an armpmu_alloc_atomic() that we can use in this case. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20arm_pmu: fold platform helpers into platform codeMark Rutland1-21/+0
The armpmu_{request,free}_irqs() helpers are only used by arm_pmu_platform.c, so let's fold them in and make them static. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20arm_pmu: kill arm_pmu_platdataMark Rutland1-23/+4
Now that we have no platforms passing platform data to the arm_pmu code, we can get rid of the platdata and associated hooks, paving the way for rework of our IRQ handling. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-24arm/arm64: pmu: Distinguish percpu irq and percpu_devid irqJulien Thierry1-5/+5
arm_pmu interrupts are maked as PERCPU even when these are not local physical interrupts to a single CPU. When using non-local interrupts, interrupts marked as PERCPU will not get freed not disabled properly by the PMU driver. Check if interrupts are local to a single CPU with PERCPU_DEVID since this is what the PMU driver really needs to know. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-08arm64: perf: Allow standard PMUv3 events to be extended by the CPU typeWill Deacon1-0/+6
Rather than continue adding CPU-specific event maps, instead look up by default in the PMUv3 event map and only fallback to the CPU-specific maps if either the event isn't described by PMUv3, or it is described but the PMCEID registers say that it is unsupported by the current CPU. Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-27drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPUWill Deacon1-14/+27
Since the PMU register interface is banked per CPU, CPU PMU interrrupts cannot be handled by a CPU other than the one with the PMU asserting the interrupt. This means that migrating PMU SPIs, as we do during a CPU hotplug operation doesn't make any sense and can lead to the IRQ being disabled entirely if we route a spurious IRQ to the new affinity target. This has been observed in practice on AMD Seattle, where CPUs on the non-boot cluster appear to take a spurious PMU IRQ when coming online, which is routed to CPU0 where it cannot be handled. This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their affinity prior to requesting them, ensuring that they cannot be migrated during hotplug events. This interacts badly with the DB8500 erratum workaround that ping-pongs the interrupt affinity from the handler, so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags to be overridden in the platdata. Fixes: 3cf7ee98b848 ("drivers/perf: arm_pmu: move irq request/free into probe") Cc: Mark Rutland <mark.rutland@arm.com> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: add ACPI frameworkMark Rutland1-2/+2
This patch adds framework code to handle parsing PMU data out of the MADT, sanity checking this, and managing the association of CPUs (and their interrupts) with appropriate logical PMUs. For the time being, we expect that only one PMU driver (PMUv3) will make use of this, and we simply pass in a single probe function. This is based on an earlier patch from Jeremy Linton. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: split out platform device probe logicMark Rutland1-222/+4
Now that we've split the pdev and DT probing logic from the runtime management, let's move the former into its own file. We gain a few lines due to the copyright header and includes, but this should keep the logic clearly separated, and paves the way for adding ACPI support in a similar fashion. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> [will: rename nr_irqs to avoid conflict with global variable] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: move irq request/free into probeMark Rutland1-5/+6
Currently we request (and potentially free) all IRQs for a given PMU in cpu_pmu_init(). This works for platform/DT probing today, but it doesn't fit ACPI well as we don't have all our affinity data up-front. In preparation for ACPI support, fold the IRQ request/free into arm_pmu_device_probe(), which will remain specific to platform/DT probing. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: split cpu-local irq request/freeMark Rutland1-36/+52
Currently we have functions to request/free all IRQs for a given PMU. While this works today, this won't work for ACPI, where we don't know the full set of IRQs up front, and need to request them separately. To enable supporting ACPI, this patch splits out the cpu-local request/free into new functions, allowing us to request/free individual IRQs. As this makes it possible/necessary to request a PPI once per cpu, an additional check is added to detect mismatched PPIs. This shouldn't matter for the DT / platform case, as we check this when parsing. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: rename irq request/free functionsMark Rutland1-10/+10
For historical reasons, portions of the arm_pmu code use a cpu_pmu_ prefix rather than an armpmu_ prefix. While a minor annoyance, this hasn't been a problem thusfar. However, to enable ACPI support, we'll need to expose a few things in header files, and we should aim to keep those consistently namespaced. In preparation for exporting our IRQ request/free functions, rename these to have an armpmu_ prefix. For consistency, the 'cpu_pmu' parameter is also renamed to 'armpmu'. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: handle no platform_deviceMark Rutland1-3/+9
In armpmu_dispatch_irq() we look at arm_pmu::plat_device to acquire platdata, so that we can defer to platform-specific IRQ handling, required on some 32-bit parts. With the advent of ACPI we won't always have a platform_device, and so we must avoid trying to dereference fields from it. This patch fixes up armpmu_dispatch_irq() to avoid doing so, introducing a new armpmu_get_platdata() helper. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()Mark Rutland1-2/+3
The ARM PMU framework code always uses armpmu_dispatch_irq as its common IRQ handler. Passing this down from cpu_pmu_init() is somewhat pointless, and gets in the way of refactoring. This patch makes cpu_pmu_request_irqs() always use armpmu_dispatch_irq as the handler when requesting IRQs, and removes the handler parameter from its prototype. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>