summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2011-12-06jump_label: Provide jump_label_key initializersPeter Zijlstra2-4/+11
Provide two initializers for jump_label_key that initialize it enabled or disabled. Also modify all jump_label code to allow for jump_labels to be initialized enabled. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/n/tip-p40e3yj21b68y03z1yv825e7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06jump_label, x86: Fix section mismatchPeter Zijlstra2-2/+2
WARNING: arch/x86/kernel/built-in.o(.text+0x4c71): Section mismatch in reference from the function arch_jump_label_transform_static() to the function .init.text:text_poke_early() The function arch_jump_label_transform_static() references the function __init text_poke_early(). This is often because arch_jump_label_transform_static lacks a __init annotation or the annotation of text_poke_early is wrong. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/n/tip-9lefe89mrvurrwpqw5h8xm8z@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06Merge branch 'tip/perf/core' of ↵Ingo Molnar3-28/+79
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
2011-12-06perf, core: Rate limit perf_sched_events jump_label patchingGleb Natapov4-10/+68
jump_lable patching is very expensive operation that involves pausing all cpus. The patching of perf_sched_events jump_label is easily controllable from userspace by unprivileged user. When te user runs a loop like this: "while true; do perf stat -e cycles true; done" ... the performance of my test application that just increments a counter for one second drops by 4%. This is on a 16 cpu box with my test application using only one of them. An impact on a real server doing real work will be worse. Performance of KVM PMU drops nearly 50% due to jump_lable for "perf record" since KVM PMU implementation creates and destroys perf event frequently. This patch introduces a way to rate limit jump_label patching and uses it to fix the above problem. I believe that as jump_label use will spread the problem will become more common and thus solving it in a generic code is appropriate. Also fixing it in the perf code would result in moving jump_label accounting logic to perf code with all the ifdefs in case of JUMP_LABEL=n kernel. With this patch all details are nicely hidden inside jump_label code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111127155909.GO2557@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf: Fix enable_on_exec for sibling eventsPeter Zijlstra1-7/+1
Deng-Cheng Zhu reported that sibling events that were created disabled with enable_on_exec would never get enabled. Iterate all events instead of the group lists. Reported-by: Deng-Cheng Zhu <dczhu@mips.com> Tested-by: Deng-Cheng Zhu <dczhu@mips.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322048382.14799.41.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf: Remove superfluous argumentsPeter Zijlstra1-5/+4
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-yv4o74vh90suyghccgykbnry@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Prefer fixed-purpose counters when schedulingPeter Zijlstra1-5/+14
This avoids a scheduling failure for cases like: cycles, cycles, instructions, instructions (on Core2) Which would end up being programmed like: PMC0, PMC1, FP-instructions, fail Because all events will have the same weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Fix event scheduler for constraints with overlapping countersRobert Richter3-5/+72
The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu: counter mask weight amd_f15_PMC30 0x09 2 <--- overlapping counters amd_f15_PMC20 0x07 3 amd_f15_PMC53 0x38 3 The scheduler does not find then an existing solution. Here is an example: event code counter failure possible solution 0x02E PMC[3,0] 0 3 0x043 PMC[2:0] 1 0 0x045 PMC[2:0] 2 1 0x046 PMC[2:0] FAIL 2 The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may fail to schedule the events then. To solve this, we now save the scheduler state of events with overlapping counter counstraints. If we fail to schedule the events we rollback to those states and try to use another free counter. Constraints with overlapping counters are marked with a new introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken as the rescheduling algorithm is O(n!) which will increase scheduling cycles for an over-commited system dramatically. The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm. V2: * Renamed redo -> overlap. * Reimplementation using perf scheduling helper functions. V3: * Added WARN_ON_ONCE() if out of save states. * Changed function interface of perf_sched_restore_state() to use bool as return value. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Implement event scheduler helper functionsRobert Richter2-55/+140
This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. With the helper functions the scheduler is controlled. There are functions to initialize, traverse the event list, find unused counters etc. The scheduler keeps its own state. V3: * Added macro for_each_set_bit_cont(). * Changed functions interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value. * Added some comments to make code better understandable. V4: * Fix broken event assignment if weight of the first event is not wmin (perf_sched_init()). Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf: Avoid a useless pmu_disable() in the perf-tickPeter Zijlstra2-16/+33
Gleb writes: > Currently pmu is disabled and re-enabled on each timer interrupt even > when no rotation or frequency adjustment is needed. On Intel CPU this > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal > it does not cause significant slowdown, but when running perf in a virtual > machine it leads to 20% slowdown on my machine. Cure this by keeping a perf_event_context::nr_freq counter that counts the number of active events that require frequency adjustments and use this in a similar fashion to the already existing nr_events != nr_active test in perf_rotate_context(). By being able to exclude both rotation and frequency adjustments a-priory for the common case we can avoid the otherwise superfluous PMU disable. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06Merge branch 'perf/urgent' into perf/coreIngo Molnar245-2463/+2668
Merge reason: Add these cherry-picked commits so that future changes on perf/core don't conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05ftrace: Fix hash record accounting bugSteven Rostedt1-1/+3
If the set_ftrace_filter is cleared by writing just whitespace to it, then the filter hash refcounts will be decremented but not updated. This causes two bugs: 1) No functions will be enabled for tracing when they all should be 2) If the users clears the set_ftrace_filter twice, it will crash ftrace: ------------[ cut here ]------------ WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7() Modules linked in: Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32 Call Trace: [<ffffffff81051828>] warn_slowpath_common+0x83/0x9b [<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c [<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7 [<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f [<ffffffff8111bdfe>] ? kfree+0xe5/0x115 [<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151 [<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f [<ffffffff8112e49a>] fput+0xfd/0x1c2 [<ffffffff8112b54c>] filp_close+0x6d/0x78 [<ffffffff8113a92d>] sys_dup3+0x197/0x1c1 [<ffffffff8113a9a6>] sys_dup2+0x4f/0x54 [<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b ---[ end trace 77a3a7ee73794a02 ]--- Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05perf: Fix parsing of __print_flags() in TP_printk()Steven Rostedt1-0/+2
A update is made to the sched:sched_switch event that adds some logic to the first parameter of the __print_flags() that shows the state of tasks. This change cause perf to fail parsing the flags. A simple fix is needed to have the parser be able to process ops within the argument. Cc: stable@vger.kernel.org Reported-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05jump_label: jump_label_inc may return before the code is patchedGleb Natapov1-1/+2
If cpu A calls jump_label_inc() just after atomic_add_return() is called by cpu B, atomic_inc_not_zero() will return value greater then zero and jump_label_inc() will return to a caller before jump_label_update() finishes its job on cpu B. Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com Cc: stable@vger.kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05ftrace: Remove force undef config value left for testingSteven Rostedt1-1/+0
A forced undef of a config value was used for testing and was accidently left in during the final commit. This causes x86 to run slower than needed while running function tracing as well as causes the function graph selftest to fail when DYNMAIC_FTRACE is not set. This is because the code in MCOUNT expects the ftrace code to be processed with the config value set that happened to be forced not set. The forced config option was left in by: commit 6331c28c962561aee59e5a493b7556a4bb585957 ftrace: Fix dynamic selftest failure on some archs Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian Cc: stable@vger.kernel.org Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05tracing: Restore system filter behaviorLi Zefan2-1/+8
Though not all events have field 'prev_pid', it was allowed to do this: # echo 'prev_pid == 100' > events/sched/filter but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap entire filter of events) broke it without any reason. Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05tracing: fix event_subsystem ref countingIlya Dryomov1-1/+0
Fix a bug introduced by e9dbfae5, which prevents event_subsystem from ever being released. Ref_count was added to keep track of subsystem users, not for counting events. Subsystem is created with ref_count = 1, so there is no need to increment it for every event, we have nr_events for that. Fix this by touching ref_count only when we actually have a new user - subsystem_open(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05x86/tools: Add decoded instruction dump modeMasami Hiramatsu1-3/+4
Add instruction dump mode to insn_sanity tool for checking decoder really decoded instructions. This mode is enabled when passing double -v (-vv) to insn_sanity. It is useful for who wants to check whether the decoder can decode some instructions correctly. e.g. $ echo 0f 73 10 11 | ./insn_sanity -y -vv -i - Instruction = { .prefixes = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .rex_prefix = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .vex_prefix = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .opcode = { .value = 29455, bytes[] = {f, 73, 0, 0}, .got = 1, .nbytes = 2}, .modrm = { .value = 16, bytes[] = {10, 0, 0, 0}, .got = 1, .nbytes = 1}, .sib = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .displacement = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .immediate1 = { .value = 17, bytes[] = {11, 0, 0, 0}, .got = 1, .nbytes = 1}, .immediate2 = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 0, .nbytes = 0}, .attr = 44800, .opnd_bytes = 4, .addr_bytes = 8, .length = 4, .x86_64 = 1, .kaddr = 0x7fff0f7d9430} Success: decoded and checked 1 given instructions with 0 errors (seed:0x0) Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120603.15475.91192.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Update instruction decoder to support new AVX formatsMasami Hiramatsu2-282/+345
Since new Intel software developers manual introduces new format for AVX instruction set (including AVX2), it is important to update x86-opcode-map.txt to fit those changes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120557.15475.13236.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/tools: Fix insn_sanity message outputsMasami Hiramatsu1-2/+2
Fix x86 instruction decoder test to dump all error messages to stderr and others to stdout. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120550.15475.70149.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/tools: Fix instruction decoder message outputMasami Hiramatsu1-4/+3
Fix instruction decoder test (insn_sanity), so that it doesn't show both info and error messages twice on same instruction. (In that case, show only error message) Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120545.15475.7928.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Fix instruction decoder to handle grouped AVX instructionsMasami Hiramatsu2-2/+11
For reducing memory usage of attribute table, x86 instruction decoder puts "Group" attribute only on "no-last-prefix" attribute table (same as vex_p == 0 case). Thus, the decoder should look no-last-prefix table first, and then only if it is not a group, move on to "with-last-prefix" table (vex_p != 0). However, current implementation, inat_get_avx_attribute() looks with-last-prefix directly. So, when decoding a grouped AVX instruction, the decoder fails to find correct group because there is no "Group" attribute on the table. This ends up with the mis-decoding of instructions, as Ingo reported in http://thread.gmane.org/gmane.linux.kernel/1214103 This patch fixes it to check no-last-prefix table first even if that is an AVX instruction, and get an attribute from "with last-prefix" table only if that is not a group. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120539.15475.91428.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/tools: Fix Makefile to build all test toolsMasami Hiramatsu1-2/+1
Fix arch/x86/tools/Makefile to compile both test tools correctly. This bug leads build error. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120533.15475.62047.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Merge branch 'tip/perf/urgent' of ↵Ingo Molnar1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent
2011-12-05Merge branch 'perf/urgent' of git://github.com/acmel/linux into perf/urgentIngo Molnar4-12/+14
2011-12-05perf: Fix loss of notification with multi-eventPeter Zijlstra4-2/+91
When you do: $ perf record -e cycles,cycles,cycles noploop 10 You expect about 10,000 samples for each event, i.e., 10s at 1000samples/sec. However, this is not what's happening. You get much fewer samples, maybe 3700 samples/event: $ perf report -D | tail -15 Aggregated stats: TOTAL events: 10998 MMAP events: 66 COMM events: 2 SAMPLE events: 10930 cycles stats: TOTAL events: 3644 SAMPLE events: 3644 cycles stats: TOTAL events: 3642 SAMPLE events: 3642 cycles stats: TOTAL events: 3644 SAMPLE events: 3644 On a Intel Nehalem or even AMD64, there are 4 counters capable of measuring cycles, so there is plenty of space to measure those events without multiplexing (even with the NMI watchdog active). And even with multiplexing, we'd expect roughly the same number of samples per event. The root of the problem was that when the event that caused the buffer to become full was not the first event passed on the cmdline, the user notification would get lost. The notification was sent to the file descriptor of the overflowed event but the perf tool was not polling on it. The perf tool aggregates all samples into a single buffer, i.e., the buffer of the first event. Consequently, it assumes notifications for any event will come via that descriptor. The seemingly straight forward solution of moving the waitq into the ringbuffer object doesn't work because of life-time issues. One could perf_event_set_output() on a fd that you're also blocking on and cause the old rb object to be freed while its waitq would still be referenced by the blocked thread -> FAIL. Therefore link all events to the ringbuffer and broadcast the wakeup from the ringbuffer object to all possible events that could be waited upon. This is rather ugly, and we're open to better solutions but it works for now. Reported-by: Stephane Eranian <eranian@google.com> Finished-by: Stephane Eranian <eranian@google.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05perf, x86: Force IBS LVT offset assignment for family 10hRobert Richter1-11/+18
On AMD family 10h we see firmware bug messages like the following: [Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu [Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100) [Firmware Bug]: using offset 1 for IBS interrupts [Firmware Bug]: workaround enabled for IBS LVT offset perf: AMD IBS detected (0x00000007) We always see this, since the offsets are not assigned by the BIOS for this family. Force LVT offset assignment in this case. If the OS assignment fails, fallback to BIOS settings and try to setup this. The fallback to BIOS settings weakens the family check since force_ibs_eilvt_setup() may fail e.g. in case of virtual machines. But setup may still succeed if BIOS offsets are correct. Other families don't have a workaround implemented that assigns LVT offsets. It's ok, to drop calling force_ibs_eilvt_setup() for that families. With the patch the [Firmware Bug] messages vanish. We see now: IBS: LVT offset 1 assigned perf: AMD IBS detected (0x00000007) Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05perf, x86: Disable PEBS on SandyBridge chipsPeter Zijlstra1-0/+8
Cc: Stephane Eranian <eranian@google.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-02perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" testArnaldo Carvalho de Melo1-24/+21
For errors that don't preclude checking for further errors, aka "soft" errors, just continue testing for other errors. Better coverage in verbose mode. Suggested-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-jafcokbj26m845dsgm2hx6az@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-12-02perf test: Validate PERF_RECORD_ events and perf_sample fieldsArnaldo Carvalho de Melo1-0/+337
This new test will validate these new routines extracted from 'perf record': - perf_evlist__config_attrs - perf_evlist__prepare_workload - perf_evlist__start_workload In addition to several other perf_evlist methods. It consists of starting a simple workload, setting up just one event to monitor ("cycles") requesting that several PERF_SAMPLE_ fields be present in all events. It then will check that the expected PERF_RECORD_ events are produced and will sanity check all its fields. Some checks performed: . PERF_SAMPLE_TIME monotonically increases. . PERF_SAMPLE_CPU is the one requested with sched_setaffinity . PERF_SAMPLE_TID and PERF_SAMPLE_PID matches the one we forked in perf_evlist__prepare_workload and that is stored in evlist->workload.pid . For the events where these fields are also present in its pre-sample_id_all fields (e.g. event->mmap.pid), that they are what is expected too. . That we get a bunch of mmaps: PATH/libcSUFFIX PATH/ldSUFFIX [vdso] PATH/sleep Example: [root@emilia ~]# taskset -c 3,4 perf test -v1 perf_sample 6: Validate PERF_RECORD_* events & perf_sample fields: --- start --- 7159480799825 3 PERF_RECORD_SAMPLE 7159480805584 3 PERF_RECORD_SAMPLE 7159480807814 3 PERF_RECORD_SAMPLE 7159480810430 3 PERF_RECORD_SAMPLE 7159480861511 3 PERF_RECORD_MMAP 8086/8086: [0x7fffffffd000(0x2000) @ 0x7fffffffd000]: //anon 7159481052516 3 PERF_RECORD_COMM: sleep:8086 7159481070188 3 PERF_RECORD_MMAP 8086/8086: [0x400000(0x6000) @ 0]: /bin/sleep 7159481077104 3 PERF_RECORD_MMAP 8086/8086: [0x3d06400000(0x221000) @ 0]: /lib64/ld-2.12.so 7159481092912 3 PERF_RECORD_MMAP 8086/8086: [0x7fff1adff000(0x1000) @ 0x7fff1adff000]: [vdso] 7159481196779 3 PERF_RECORD_MMAP 8086/8086: [0x3d06800000(0x37f000) @ 0]: /lib64/libc-2.12.so 7160481558435 3 PERF_RECORD_EXIT(8086:8086):(8086:8086) ---- end ---- Validate PERF_RECORD_* events & perf_sample fields: Ok [root@emilia ~]# Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-svag18v2z4idas0dyz3umjpq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-12-02perf event: Introduce perf_event__fprintfArnaldo Carvalho de Melo2-6/+54
So that tools like 'perf test' can print the events when in verbose mode, for instance. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-xnovdqfi25nc48gy6604k7yp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-12-01trace_events_filter: Use rcu_assign_pointer() when setting ↵Tejun Heo1-3/+3
ftrace_event_call->filter ftrace_event_call->filter is sched RCU protected but didn't use rcu_assign_pointer(). Use it. TODO: Add proper __rcu annotation to call->filter and all its users. -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric. Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@kernel.org # (2.6.39+) Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-29perf test: Allow running just a subset of the available testsArnaldo Carvalho de Melo2-22/+67
To obtain a list of available tests: [root@emilia linux]# perf test list 1: vmlinux symtab matches kallsyms 2: detect open syscall event 3: detect open syscall event on all cpus 4: read samples using the mmap interface 5: parse events tests [root@emilia linux]# To list just a subset: [root@emilia linux]# perf test list syscall 2: detect open syscall event 3: detect open syscall event on all cpus [root@emilia linux]# To run a subset: [root@emilia linux]# perf test detect 2: detect open syscall event: Ok 3: detect open syscall event on all cpus: Ok [root@emilia linux]# Specific tests can be chosen by number: [root@emilia linux]# perf test 1 3 parse 1: vmlinux symtab matches kallsyms: Ok 3: detect open syscall event on all cpus: Ok 5: parse events tests: Ok [root@emilia linux]# Now to write more tests! Suggested-by: Peter Zijlstra <peterz@infradead.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-nqec2145qfxdgimux28aw7v8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-29perf evlist: Always do automatic allocation of pollfd and mmap structuresArnaldo Carvalho de Melo5-27/+7
At first tools were required to do that, but while writing the python bindings to simplify the API I made them auto-allocate when needed. This just makes record, stat and top use that auto allocation, simplifying them a bit. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-iokhcvkzzijr3keioubx8hlq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf tools: Save some loops using perf_evlist__id2evselArnaldo Carvalho de Melo5-75/+136
Since we already ask for PERF_SAMPLE_ID and use it to quickly find the associated evsel, add handler func + data to struct perf_evsel to avoid using chains of if(strcmp(event_name)) and also to avoid all the linear list searches via trace_event_find. To demonstrate the technique convert 'perf sched' to it: # perf sched record sleep 5m And then: Performance counter stats for '/tmp/oldperf sched lat': 646.929438 task-clock # 0.999 CPUs utilized 9 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 20,901 page-faults # 0.032 M/sec 1,290,144,450 cycles # 1.994 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 1,606,158,439 instructions # 1.24 insns per cycle 339,088,395 branches # 524.151 M/sec 4,550,735 branch-misses # 1.34% of all branches 0.647524759 seconds time elapsed Versus: Performance counter stats for 'perf sched lat': 473.564691 task-clock # 0.999 CPUs utilized 9 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 20,903 page-faults # 0.044 M/sec 944,367,984 cycles # 1.994 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 1,442,385,571 instructions # 1.53 insns per cycle 308,383,106 branches # 651.195 M/sec 4,481,784 branch-misses # 1.45% of all branches 0.474215751 seconds time elapsed [root@emilia ~]# Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-1kbzpl74lwi6lavpqke2u2p3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf script: Add comm filtering optionDavid Ahern2-0/+17
Allows collecting events system wide and then pulling out events for a specific task name(s). e.g, perf script -c gnome-shell,gnome-terminal Applies on top of: https://lkml.org/lkml/2011/11/13/74 v2->v3 - update Documentation v1->v2 - use comm_list from symbol_conf Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1321894972-24246-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf tools: make -C consistent across commands (for cpu list arg)David Ahern6-8/+8
Currently the meaning of -C varies by perf command: for perf-top, perf-stat, perf-record it means cpu list. For perf-report it means comm list. Then perf-annotate, perf-report and perf-script use -c for cpu list. Fix annotate, report and script to use -C for cpu list to be consistent with top, stat and record. This means report needs to use -c for comm list which does introduce a backward compatibility change. v1 -> v2 - update perf-script.txt too Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1321209008-7004-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf top: Stop using globals for tool stateArnaldo Carvalho de Melo2-235/+245
Use its 'perf_tool' base class instead. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-i33q40wwvk2zna8fd36ex6sm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf tools: Rename perf_event_ops to perf_toolArnaldo Carvalho de Melo22-277/+301
To better reflect that it became the base class for all tools, that must be in each tool struct and where common stuff will be put. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-qgpc4msetqlwr8y2k7537cxe@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf tools: Resolve machine earlier and pass it to perf_event_opsArnaldo Carvalho de Melo24-375/+376
Reducing the exposure of perf_session further, so that we can use the classes in cases where no perf.data file is created. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-stua66dcscsezzrcdugvbmvd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf tools: Pass tool context in the the perf_event_ops functionsArnaldo Carvalho de Melo20-373/+520
So that we don't need to have that many globals. Next steps will remove the 'session' pointer, that in most cases is not needed. Then we can rename perf_event_ops to 'perf_tool' that better describes this class hierarchy. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf annotate: Group options in a structArnaldo Carvalho de Melo1-31/+33
Paving the way to remove these globals when we change the perf_event_ops to receive as a first parameter a pointer to a perf_event_ops that will then provide access to perf_annotate via container_of. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-xduzibqrdg3h5cttmk6p5wwc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf report: Group options in a structArnaldo Carvalho de Melo1-52/+59
Paving the way to remove these globals when we change the perf_event_ops to receive as a first parameter a pointer to a perf_event_ops that will then provide access to perf_report via container_of. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2eh2vi2nb5z3tg1lvoxv09xu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf tools: Use evsel->attr.sample_type instead of session->sample_typeArnaldo Carvalho de Melo2-6/+8
Eventually session->sample_type will go away as we want to support multiple sample types per session, so use it from the evsel which is a step in that direction. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0vwdpjcwbjezw459lw5n3ew1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf session: Remove superfluous callchain_cursor memberArnaldo Carvalho de Melo5-14/+14
Since we have it in evsel->hists.callchain_cursor, remove it from perf_session. One more step in disentangling several places from requiring a perf_session pointer. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-rxr5dj3di7ckyfmnz0naku1z@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf event: perf_event_ops->attr() manipulates only an evlistArnaldo Carvalho de Melo5-17/+34
Removing another case where a perf_session is required when processing events. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ug1wtjbnva4bxwknflkkrlrh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf evlist: Introduce id_hdr_size method out of perf_sessionArnaldo Carvalho de Melo3-28/+34
We will need this when not using perf_session in cases like 'perf top' and strace where no perf.data file is created nor consumed. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-za923wjc41q5xot5vrhuhj3j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf symbols: Add nr_events to symbol_confArnaldo Carvalho de Melo10-37/+30
Since symbol__alloc_hists need it, to avoid passing it around in many functions have it in the symbol_conf struct. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-cwv8ysvpywzjq4v3xtbd4zwv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf ui progress: Fix divide by zeroArnaldo Carvalho de Melo1-0/+3
Happens in a perf.data file where one of the events had no samples. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-j7st3oyiotvfxqde2nc41kxb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-11-28perf record: Move 'group' to perf_event_opsArnaldo Carvalho de Melo2-5/+5
Will be used in other tools to share the command line parsing code. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-8x0yr77r6lrd2t699s499m8n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>