summaryrefslogtreecommitdiffstats
path: root/tools/perf/bench
AgeCommit message (Collapse)AuthorFilesLines
2015-05-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2-9/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore PMU driver hardware-enablement addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Fix segfault if passed with ''. perf report: Fix -T/--threads option to work again perf bench numa: Fix immediate meeting of convergence condition perf bench numa: Fixes of --quiet argument perf bench futex: Fix hung wakeup tasks after requeueing perf probe: Fix bug with global variables handling perf top: Fix a segfault when kernel map is restricted. tools lib traceevent: Fix build failure on 32-bit arch perf kmem: Fix compiles on RHEL6/OL6 tools lib api: Undefine _FORTIFY_SOURCE before setting it perf kmem: Consistently use PRIu64 for printing u64 values perf trace: Disable events and drain events when forked workload ends perf trace: Enable events when doing system wide tracing and starting a workload perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
2015-05-06Merge tag 'perf-core-for-mingo-3' of ↵Ingo Molnar1-1/+31
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Improve --filter support for 'perf probe', allowing using its arguments on other commands, as --add, --del, etc (Masami Hiramatsu) - Show warning when running 'perf kmem stat' on a unsuitable perf.data file, i.e. one with events that are not the ones required for the stat variant used (Namhyung Kim). Infrastructure changes: - Auxtrace support patches, paving the way to support Intel PT and BTS (Adrian Hunter) - hists browser (top, report) refactorings (Namhyung Kim) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-04perf bench numa: Show more stats of particular threads in verbose modePetr Holasek1-1/+31
In verbose mode perf bench numa shows also GB/s speed, system and user cpu time for each particular thread. Using of getrusage() can provide much more per process or per thread stats in future. Signed-off-by: Petr Holasek <pholasek@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1429198699-25039-3-git-send-email-pholasek@redhat.com [ Rename 'usage' variable to not shadow util.h's usage() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27perf bench numa: Fix immediate meeting of convergence conditionPetr Holasek1-0/+8
This patch fixes the race in the beginning of benchmark run when some threads hasn't got assigned curr_cpu yet so they don't occur in nodes-of-process stats and benchmark concludes that all remaining threads are converged already. The race can be reproduced with small amount of threads and some bigger amount of shared process memory, e.g. one process, two threads and 5GB of process memory. Signed-off-by: Petr Holasek <pholasek@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1429198699-25039-4-git-send-email-pholasek@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27perf bench numa: Fixes of --quiet argumentPetr Holasek1-2/+2
Corrected description and fixed function of --quiet argument. Signed-off-by: Petr Holasek <pholasek@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1429198699-25039-2-git-send-email-pholasek@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27perf bench futex: Fix hung wakeup tasks after requeueingDavidlohr Bueso1-7/+8
The futex-requeue benchmark can hang because of missing wakeups once the benchmark is done, ie: [Run 1]: Requeued 1024 of 1024 threads in 0.3290 ms perf: couldn't wakeup all tasks (135/1024) This bug, while perhaps suggesting missing wakeups in kernel futex code, is merely a consequence of the crappy FUTEX_CMP_REQUEUE man page, incorrectly mentioning that the number of requeued tasks is in fact returned, not the wakeups. This patch acknowledges this and updates the corresponding futex_wake code around it. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/1429894848.10273.44.camel@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-14Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
2015-03-04Merge tag 'alternatives_padding' of ↵Ingo Molnar5-72/+72
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm Pull alternative instructions framework improvements from Borislav Petkov: "A more involved rework of the alternatives framework to be able to pad instructions and thus make using the alternatives macros more straightforward and without having to figure out old and new instruction sizes but have the toolchain figure that out for us. Furthermore, it optimizes JMPs used so that fetch and decode can be relieved with smaller versions of the JMPs, where possible. Some stats: x86_64 defconfig: Alternatives sites total: 2478 Total padding added (in Bytes): 6051 The padding is currently done for: X86_FEATURE_ALWAYS X86_FEATURE_ERMS X86_FEATURE_LFENCE_RDTSC X86_FEATURE_MFENCE_RDTSC X86_FEATURE_SMAP This is with the latest version of the patchset. Of course, on each machine the alternatives sites actually being patched are a proper subset of the total number." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-03perf/bench: Add -r all so that you can run all mem* routinesBorislav Petkov1-1/+9
perf bench mem mem{set,cpy} -r all thus runs all available mem benchmarking routines. Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-03perf/bench: Carve out mem routine benchmarkingBorislav Petkov1-62/+58
... so that we can call it multiple times. See next patch. Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-03perf/bench: Fix mem* routines usage after alternatives changeBorislav Petkov4-10/+6
Adjust perf bench to the new changes in the alternatives code for memcpy/memset. Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02Merge 'tip/perf/urgent' into perf/core to pick fixesArnaldo Carvalho de Melo1-2/+2
Needed to build perf/core buildable in some cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22perf bench: Fix order of arguments to memcpy_alloc_memBruce Merry1-2/+2
This was causing the destination instead of the source to be filled. As a result, the source was typically all mapped to one zero page, and hence very cacheable. Signed-off-by: Bruce Merry <bmerry@ska.ac.za> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150115092022.GA11292@kryton Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-12perf build: Add bench objects buildingJiri Olsa1-0/+11
Move bench objects building under build framework and enable perf-in.o rule. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-b0gxubmn3qjabaq0lune53y3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-28perf tools: Provide stub for missing pthread_attr_setaffinity_npVineet Gupta1-0/+13
uClibc Linuxthreads.old doesn't support the pthread_attr_setaffinity_np() functioo: ----------------->8----------------------- CC bench/futex-hash.o CC bench/futex-wake.o bench/futex-hash.c: In function 'bench_futex_hash': bench/futex-hash.c:161:3: error: implicit declaration of function 'pthread_attr_setaffinity_np' [-Werror=implicit-function-declaration] ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu); ^ bench/futex-hash.c:161:3: error: nested extern declaration of 'pthread_attr_setaffinity_np' [-Werror=nested-externs] ----------------->8----------------------- So introduce a test to check that and if not available provide a stub. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421156604-30603-6-git-send-email-vgupta@synopsys.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-16perf tools: Avoid build splat for syscall numbers with uclibcVineet Gupta1-1/+1
This is due to duplicated unistd inclusion (via uClibc headers + kernel headers) Also seen on ARM uClibc based tools ------- ARC build ---------->8------------- CC util/evlist.o In file included from ~/arc/k.org/arch/arc/include/uapi/asm/unistd.h:25:0, from util/../perf-sys.h:10, from util/../perf.h:15, from util/event.h:7, from util/event.c:3: ~/arc/k.org/include/uapi/asm-generic/unistd.h:906:0: warning: "__NR_fcntl64" redefined [enabled by default] #define __NR_fcntl64 __NR3264_fcntl ^ In file included from ~/arc/gnu/INSTALL_1412-arc-2014.12-rc1/arc-snps-linux-uclibc/sysroot/usr/include/sys/syscall.h:24:0, from util/../perf-sys.h:6, ----------------->8------------------- ------- ARM build ---------->8------------- CC FPIC plugin_scsi.o In file included from util/../perf-sys.h:9:0, from util/../perf.h:15, from util/cache.h:7, from perf.c:12: ~/arc/k.org/arch/arm/include/uapi/asm/unistd.h:28:0: warning: "__NR_restart_syscall" redefined [enabled by default] In file included from ~/buildroot/host/usr/arm-buildroot-linux-uclibcgnueabi/sysroot/usr/include/sys/syscall.h:25:0, from util/../perf-sys.h:6, from util/../perf.h:15, from util/cache.h:7, from perf.c:12: ~/buildroot/host/usr/arm-buildroot-linux-uclibcgnueabi/sysroot/usr/include/bits/sysnum.h:17:0: note: this is the location of the previous definition ----------------->8------------------- Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421156604-30603-4-git-send-email-vgupta@synopsys.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-09perf bench: Fix memcpy/memset outputRabin Vincent1-8/+10
The memcpy and memset benchmarks return bogus results when iterations > 0 because the iterations value is not taken into account when calculating the final result: $ perf bench mem memset --only-prefault --length 1GB --iterations 1 # Running 'mem/memset' benchmark: # Copying 1GB Bytes ... 20.798669 GB/Sec (with prefault) $ perf bench mem memset --only-prefault --length 1GB --iterations 10 # Running 'mem/memset' benchmark: # Copying 1GB Bytes ... 2.086576 GB/Sec (with prefault) $ perf bench mem memset --only-prefault --length 1GB --iterations 100 # Running 'mem/memset' benchmark: # Copying 1GB Bytes ... 212.840917 MB/Sec (with prefault) Fix this. Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rabin Vincent <rabin@rab.in> Cc: Rabin Vincent <rabinv@axis.com> Link: http://lkml.kernel.org/r/1417535441-3965-3-git-send-email-rabin.vincent@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-09perf bench: Merge memset into memcpyRabin Vincent2-304/+90
The memset benchmark is largely copy-pasted from the memcpy benchmark. Merge the two now that memcpy is made more generic. Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rabin Vincent <rabinv@axis.com> Link: http://lkml.kernel.org/r/1417535441-3965-2-git-send-email-rabin.vincent@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-09perf bench: Prepare memcpy for mergeRabin Vincent1-78/+104
The memset benchmark is largely copy-pasted from the memcpy benchmark. Prepare the memcpy file for merge with memset by extracting out a generic function. Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rabin Vincent <rabinv@axis.com> Link: http://lkml.kernel.org/r/1417535441-3965-1-git-send-email-rabin.vincent@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-09-29perf bench futex: Sanitize -q option in requeueDavidlohr Bueso1-1/+3
When given the number of threads to requeue at once by user input, there's always the risk of this value being larger than the total number of threads. This doesn't make any sense, and the kernel can easily deal with such sort of situations, hence no big deal. We should however prevent bogus output such as: ./perf bench --repeat 2 futex requeue -q 10 Run summary [PID 22210]: Requeuing 4 threads (from [private] 0x99ef3c to 0x99ef38), 10 at a time. [Run 1]: Requeued 10 of 4 threads in 0.0040 ms [Run 2]: Requeued 10 of 4 threads in 0.0030 ms Requeued 10 of 4 threads in 0.0035 ms (+-14.29%) Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Link: http://lkml.kernel.org/r/1412008868-22328-2-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-09-29perf bench futex: Support operations for shared futexesDavidlohr Bueso3-16/+30
Unlike futex-hash, requeuing and wakeup benchmarks do not support shared futexes, limiting the usefulness of the programs. Correct this, and allow using the local -S parameter. The default remains using private futexes. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Link: http://lkml.kernel.org/r/1412008868-22328-1-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-09-17perf tools: Don't include sys/poll.h directlyArnaldo Carvalho de Melo1-1/+1
Include poll.h instead. Fixes the following warning in systems with musl's libc: /usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include <sys/poll.h> to <poll.h> [-Wcpp] Reported-by: John Spencer <maillist-linux@barfooze.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://thread.gmane.org/gmane.linux.kernel.perf.user/1687/focus=1690 Link: http://lkml.kernel.org/n/tip-k4ocrq1de3fk146oevy346bi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-07-18perf tools: Enable close-on-exec flag on perf file descriptorYann Droneaud2-2/+6
In commit a21b0b354d4a ('perf: Introduce a flag to enable close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC was added to perf_event_open(2) syscall to allows userspace to atomically enable close-on-exec behavor when creating the file descriptor. This patch makes perf tools use the new flag if supported by the kernel, so that the event file descriptors got automatically closed if perf tool exec a sub-command. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/1404160127-7475-1-git-send-email-ydroneaud@opteya.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-06-19perf bench sched-messaging: Drop barf()Davidlohr Bueso1-26/+19
Instead of reinventing the wheel, we can use err(2) when dealing with fatal errors. Exit code is now always EXIT_FAILURE (1). Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1402942467-10671-9-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-06-19perf bench mem: The -o and -n options are mutually exclusiveDavidlohr Bueso2-0/+10
-o, --only-prefault Show only the result with page faults before mem* -n, --no-prefault Show only the result without page faults before mem* Makes no sense to call together. Applies to both memset and memcpy. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1402942467-10671-8-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-06-19perf bench futex: Use global --repeat optionDavidlohr Bueso2-19/+3
This option is available through perf-bench, use it instead and free the local option. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1402942467-10671-6-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-06-19perf bench: Add --repeat optionDavidlohr Bueso1-0/+1
There are a number of benchmarks that do single runs and as a result does not really help users gain a general idea of how the workload performs. So the user must either manually do multiple runs or just use single bogus results. This option will enable users to specify the amount of runs (arbitrarily defaulted to 10, to use the existing benchmarks default) through the '--repeat' option. Add it to perf-bench instead of implementing it always in each specific benchmark. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1402942467-10671-2-git-send-email-davidlohr@hp.com [ Kept the existing default of 10, changing it to something else should be done on separate patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-06-19perf bench sched-messaging: Plug memleakDavidlohr Bueso1-0/+2
Explicitly free the thread array ('pth_tab'). Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1402942467-10671-5-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-04-14Merge branch 'perf-core-for-mingo' into perf/urgentIngo Molnar1-0/+4
Conflicts: tools/perf/bench/numa.c Pull perf fixes from Jiri Olsa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14perf bench: Set more defaults in the 'numa' suiteRamkumar Ramachandra1-0/+4
Currently, $ perf bench numa mem errors out with usage information. To make this more user-friendly, let us provide a minimum set of default values required for a test run. As an added bonus, $ perf bench all now goes all the way to completion. Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1395964219-22173-2-git-send-email-artagnon@gmail.com Signed-off-by: Jiri Olsa <jolsa@redhat.com>
2014-03-31Merge branch 'perf-core-for-linus' of ↵Linus Torvalds5-0/+698
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Main changes: Kernel side changes: - Add SNB/IVB/HSW client uncore memory controller support (Stephane Eranian) - Fix various x86/P4 PMU driver bugs (Don Zickus) Tooling, user visible changes: - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso) - Speed up thread map generation (Don Zickus) - Introduce 'perf kvm --list-cmds' command line option for use by scripts (Ramkumar Ramachandra) - Print the evsel name in the annotate stdio output, prep to fix support outputting annotation for multiple events, not just for the first one (Arnaldo Carvalho de Melo) - Allow setting preferred callchain method in .perfconfig (Jiri Olsa) - Show in what binaries/modules 'perf probe's are set (Masami Hiramatsu) - Support distro-style debuginfo for uprobe in 'perf probe' (Masami Hiramatsu) Tooling, internal changes and fixes: - Use tid in mmap/mmap2 events to find maps (Don Zickus) - Record the reason for filtering an address_location (Namhyung Kim) - Apply all filters to an addr_location (Namhyung Kim) - Merge al->filtered with hist_entry->filtered in report/hists (Namhyung Kim) - Fix memory leak when synthesizing thread records (Namhyung Kim) - Use ui__has_annotation() in 'report' (Namhyung Kim) - hists browser refactorings to reuse code accross UIs (Namhyung Kim) - Add support for the new DWARF unwinder library in elfutils (Jiri Olsa) - Fix build race in the generation of bison files (Jiri Olsa) - Further streamline the feature detection display, trimming it a bit to show just the libraries detected, using VF=1 gets a more verbose output, showing the less interesting feature checks as well (Jiri Olsa). - Check compatible symtab type before loading dso (Namhyung Kim) - Check return value of filename__read_debuglink() (Stephane Eranian) - Move some hashing and fs related code from tools/perf/util/ to tools/lib/ so that it can be used by more tools/ living utilities (Borislav Petkov) - Prepare DWARF unwinding code for using an elfutils alternative unwinding library (Jiri Olsa) - Fix DWARF unwind max_stack processing (Jiri Olsa) - Add dwarf unwind 'perf test' entry (Jiri Olsa) - 'perf probe' improvements including memory leak fixes, sharing the intlist class with other tools, uprobes/kprobes code sharing and use of ref_reloc_sym (Masami Hiramatsu) - Shorten sample symbol resolving by adding cpumode to struct addr_location (Arnaldo Carvalho de Melo) - Fix synthesizing mmaps for threads (Don Zickus) - Fix invalid output on event group stdio report (Namhyung Kim) - Fixup header alignment in 'perf sched latency' output (Ramkumar Ramachandra) - Fix off-by-one error in 'perf timechart record' argv handling (Ramkumar Ramachandra) Tooling, cleanups: - Remove unused thread__find_map function (Jiri Olsa) - Remove unused simple_strtoul() function (Ramkumar Ramachandra) Tooling, documentation updates: - Update function names in debug messages (Ramkumar Ramachandra) - Update some code references in design.txt (Ramkumar Ramachandra) - Clarify load-latency information in the 'perf mem' docs (Andi Kleen) - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits) perf tools: Remove unused simple_strtoul() function perf tools: Update some code references in design.txt perf evsel: Update function names in debug messages perf tools: Remove thread__find_map function perf annotate: Print the evsel name in the stdio output perf report: Use ui__has_annotation() perf tools: Fix memory leak when synthesizing thread records perf tools: Use tid in mmap/mmap2 events to find maps perf report: Merge al->filtered with hist_entry->filtered perf symbols: Apply all filters to an addr_location perf symbols: Record the reason for filtering an address_location perf sched: Fixup header alignment in 'latency' output perf timechart: Fix off-by-one error in 'record' argv handling perf machine: Factor machine__find_thread to take tid argument perf tools: Speed up thread map generation perf kvm: introduce --list-cmds for use by scripts perf ui hists: Pass evsel to hpp->header/width functions explicitly perf symbols: Introduce thread__find_cpumode_addr_location perf session: Change header.misc dump from decimal to hex perf ui/tui: Reuse generic __hpp__fmt() code ...
2014-03-14perf bench: Add futex-requeue microbenchmarkDavidlohr Bueso3-0/+225
Block a bunch of threads on a futex and requeue them on another, N at a time. This program is particularly useful to measure the latency of nthread requeues without waking up any tasks -- thus mimicking a regular futex_wait. An example run: $ perf bench futex requeue -r 100 -t 64 Run summary [PID 151011]: Requeuing 64 threads (from 0x7d15c4 to 0x7d15c8), 1 at a time. [Run 1]: Requeued 64 of 64 threads in 0.0400 ms [Run 2]: Requeued 64 of 64 threads in 0.0390 ms [Run 3]: Requeued 64 of 64 threads in 0.0400 ms ... [Run 100]: Requeued 64 of 64 threads in 0.0390 ms Requeued 64 of 64 threads in 0.0399 ms (+-0.37%) Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1387081917-9102-4-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-03-14perf bench: Add futex-wake microbenchmarkDavidlohr Bueso3-0/+212
Block a bunch of threads on a futex and wake them up, N at a time. This program is particularly useful to measure the latency of nthread wakeups in non-error situations: all waiters are queued and all wake calls wakeup one or more tasks. An example run: $ perf bench futex wake -t 512 -r 100 Run summary [PID 27823]: blocking on 512 threads (at futex 0x7e10d4), waking up 1 at a time. [Run 1]: Wokeup 512 of 512 threads in 6.0080 ms [Run 2]: Wokeup 512 of 512 threads in 5.2280 ms [Run 3]: Wokeup 512 of 512 threads in 4.8300 ms ... [Run 100]: Wokeup 512 of 512 threads in 5.0100 ms Wokeup 512 of 512 threads in 5.0109 ms (+-2.25%) Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1387081917-9102-3-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-03-14perf bench: Add futex-hash microbenchmarkDavidlohr Bueso3-0/+261
Introduce futexes to perf-bench and add a program that stresses and measures the kernel's implementation of the hash table. This is a multi-threaded program that simply measures the amount of failed futex wait calls - we only want to deal with the hashing overhead, so a negative return of futex_wait_setup() is enough to do the trick. An example run: $ perf bench futex hash -t 32 Run summary [PID 10989]: 32 threads, each operating on 1024 [private] futexes for 10 secs. [thread 0] futexes: 0x19d9b10 ... 0x19dab0c [ 418713 ops/sec ] [thread 1] futexes: 0x19daca0 ... 0x19dbc9c [ 469913 ops/sec ] [thread 2] futexes: 0x19dbe30 ... 0x19dce2c [ 479744 ops/sec ] ... [thread 31] futexes: 0x19fbb80 ... 0x19fcb7c [ 464179 ops/sec ] Averaged 454310 operations/sec (+- 0.84%), total secs = 10 Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1387081917-9102-2-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-03-14perf bench numa: Make no args mean 'run all tests'Arnaldo Carvalho de Melo1-0/+1
If we call just: perf bench numa mem it will present the same output as: perf bench numa mem -h i.e. ask for instructions about what to run. While that is kinda ok, using 'run all tests' as the default, i.e. making 'no parms' be equivalent to: perf bench numa mem -a Will allow: perf bench numa all to actually do what is asked: i.e. run all the 'bench' tests, instead of responding to that by asking what to do. That, in turn, allows: perf bench all to actually complete, for the same reasons. And after that, the tests that come after that, and that at some point hit a NULL deref, will run, allowing me to reproduce a recently reported problem. That when you have the needed numa libraries, which wasn't the case for the reporter, making me a bit confused after trying to reproduce his report. So make no parms mean -a. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Patrick Palka <patrick@parcs.ath.cx> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-x7h0ghx4pef4n0brywg21krk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-21perf tools: Fix bench/numa.c for 32-bit buildAdrian Hunter1-2/+2
bench/numa.c: In function 'worker_thread': bench/numa.c:1123:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] bench/numa.c:1171:6: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' [-Werror=format] cc1: all warnings being treated as errors Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-11perf bench: Fix failing assertions in numa benchPetr Holasek1-13/+21
Patch adds more subtle handling of -C and -N parameters in parse_{cpu,node}_setup_list() functions when there isn't enough NUMA nodes or CPUs present. Instead of assertion and terminating benchmark, partial test is skipped with error message and perf will continue to the next one. Fixed problem can be easily reproduced on machine with only one NUMA node: # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" ... # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s perf: bench/numa.c:622: parse_setup_node_list: Assertion `!(bind_node_0 < 0 || bind_node_0 >= g->p.nr_nodes)' failed. Aborted Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Petr Benas <pbenas@redhat.com> Link: http://lkml.kernel.org/r/1380821325-4017-1-git-send-email-pholasek@redhat.com Signed-off-by: Petr Benas <pbenas@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-11perf bench sched: Add --threaded optionIngo Molnar1-29/+86
Allow the measurement of thread versus process context switch performance. The default stays at 'process' based measurement, like lmbench's lat_ctx benchmark. Sample output: comet:~/tip/tools/perf> taskset 1 ./perf bench sched pipe # Running sched/pipe benchmark... # Executed 1000000 pipe operations between two processes Total time: 4.138 [sec] 4.138729 usecs/op 241620 ops/sec comet:~/tip/tools/perf> taskset 1 ./perf bench sched pipe --threaded # Running sched/pipe benchmark... # Executed 1000000 pipe operations between two threads Total time: 3.667 [sec] 3.667667 usecs/op 272652 ops/sec Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Link: http://lkml.kernel.org/r/20130917114256.GA31159@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-09tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORTIngo Molnar4-4/+4
Standardize all the feature flags based on the HAVE_{FEATURE}_SUPPORT naming convention: HAVE_ARCH_X86_64_SUPPORT HAVE_BACKTRACE_SUPPORT HAVE_CPLUS_DEMANGLE_SUPPORT HAVE_DWARF_SUPPORT HAVE_ELF_GETPHDRNUM_SUPPORT HAVE_GTK2_SUPPORT HAVE_GTK_INFO_BAR_SUPPORT HAVE_LIBAUDIT_SUPPORT HAVE_LIBELF_MMAP_SUPPORT HAVE_LIBELF_SUPPORT HAVE_LIBNUMA_SUPPORT HAVE_LIBUNWIND_SUPPORT HAVE_ON_EXIT_SUPPORT HAVE_PERF_REGS_SUPPORT HAVE_SLANG_SUPPORT HAVE_STRLCPY_SUPPORT Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/n/tip-u3zvqejddfZhtrbYbfhi3spa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-22perf bench: Fix memcpy benchmark for large sizesAndi Kleen1-0/+2
The glibc calloc() function has an optimization to not explicitely memset() very large calloc allocations that just came from mmap(), because they are known to be zero. This could result in the perf memcpy benchmark reading only from the zero page, which gives unrealistic results. Always call memset explicitly on the source area to avoid this problem. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/n/tip-pzz2qrdq9eymxda0y8yxdn33@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-07-08perf bench: Fix memory allocation fail check in mem{set,cpy} workloadsKirill A. Shutemov2-3/+3
Addresses of allocated memory areas saved to '*src' and '*dst', so we need to check them for NULL, not 'src' and 'dst'. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1370518503-4230-1-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-03-14perf tools: Fix LIBNUMA build with glibc 2.12 and older.Vinson Lee1-0/+24
The tokens MADV_HUGEPAGE and MADV_NOHUGEPAGE are not available with glibc 2.12 and older. Define these tokens if they are not already defined. This patch fixes these build errors with older versions of glibc. CC bench/numa.o bench/numa.c: In function ‘alloc_data’: bench/numa.c:334: error: ‘MADV_HUGEPAGE’ undeclared (first use in this function) bench/numa.c:334: error: (Each undeclared identifier is reported only once bench/numa.c:334: error: for each function it appears in.) bench/numa.c:341: error: ‘MADV_NOHUGEPAGE’ undeclared (first use in this function) make: *** [bench/numa.o] Error 1 Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Irina Tirdea <irina.tirdea@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1363214064-4671-2-git-send-email-vlee@twitter.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-30perf: Add 'perf bench numa mem' NUMA performance measurement suiteIngo Molnar2-0/+1732
Add a suite of NUMA performance benchmarks. The goal was simulate the behavior and access patterns of real NUMA workloads, via a wide range of parameters, so this tool goes well beyond simple bzero() measurements that most NUMA micro-benchmarks use: - It processes the data and creates a chain of data dependencies, like a real workload would. Neither the compiler, nor the kernel (via KSM and other optimizations) nor the CPU can eliminate parts of the workload. - It randomizes the initial state and also randomizes the target addresses of the processing - it's not a simple forward scan of addresses. - It provides flexible options to set process, thread and memory relationship information: -G sets "global" memory shared between all test processes, -P sets "process" memory shared by all threads of a process and -T sets "thread" private memory. - There's a NUMA convergence monitoring and convergence latency measurement option via -c and -m. - Micro-sleeps and synchronization can be injected to provoke lock contention and scheduling, via the -u and -S options. This simulates IO and contention. - The -x option instructs the workload to 'perturb' itself artificially every N seconds, by moving to the first and last CPU of the system periodically. This way the stability of convergence equilibrium and the number of steps taken for the scheduler to reach equilibrium again can be measured. - The amount of work can be specified via the -l loop count, and/or via a -s seconds-timeout value. - CPU and node memory binding options, to test hard binding scenarios. THP can be turned on and off via madvise() calls. - Live reporting of convergence progress in an 'at glance' output format. Printing of convergence and deconvergence events. The 'perf bench numa mem -a' option will start an array of about 30 individual tests that will each output such measurements: # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 20.276, secs, runtime-max/thread 5x5-bw-thread, 20.004, secs, runtime-min/thread 5x5-bw-thread, 20.155, secs, runtime-avg/thread 5x5-bw-thread, 0.671, %, spread-runtime/thread 5x5-bw-thread, 21.153, GB, data/thread 5x5-bw-thread, 528.818, GB, data-total 5x5-bw-thread, 0.959, nsecs, runtime/byte/thread 5x5-bw-thread, 1.043, GB/sec, thread-speed 5x5-bw-thread, 26.081, GB/sec, total-speed See the help text and the code for more details. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-11perf tools: Use __maybe_used for unused variablesIrina Tirdea5-7/+8
perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-08perf bench: fix assert when NDEBUG is definedIrina Tirdea1-3/+3
When NDEBUG is defined, the assert macro will be expanded to nothing. Some assert calls used in perf are also including some functionality (e.g. system calls), not only validity checks. Therefore, if NDEBUG is defined, this functionality will be removed along with the assert. Perf also defines BUG_ON based on assert, so it has the same problem. Define BUG_ON so that the condition will be executed when NDEBUG is defined. Replace the assert statements that have these side effects with BUG_ON. For defining BUG_ON, use "if (cond) {}" insted of "if (cond) ;" because in the latter case build fails with "error: suggest braces around empty body in an ‘if’ statement [-Werror=empty-body]" Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347082551-2394-1-git-send-email-irina.tirdea@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-07-02perf bench: Fix confused variable namings and descriptions in mem subsystemHitoshi Mitake2-80/+80
As Namhyung Kim pointed, there are confused namings and descriptions of words "cycle" and "clock" in mem-memset.c and mem-memcpy.c. With the option "-c" (or "--clock", now renamed as "--cycle"), mem subsystem measures cost of memset() and memcpy() with cpu-cycles event. But current mem subsystem source code contains lots of confused variable namings and descriptions with "clock" (e.g. the variable use_clock). This is a very bad style because there is another software event named "cpu-clock". This patch replaces wrong usage of "clock" to "cycle". v2: modified Documentation/perf-bench.txt for the descriptions of --cycle option Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1341236777-18457-1-git-send-email-h.mitake@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-06-27perf bench: Documentation updateNamhyung Kim2-6/+6
The current perf-bench documentation has a couple of typos and even lacks entire description of mem subsystem. Fix it. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-06perf tool: Fix perf stack to non executable on x86_64Jiri Olsa1-0/+7
By adding following objects: bench/mem-memset-x86-64-asm.o bench/mem-memcpy-x86-64-asm.o the x86_64 perf binary ended up with executable stack. The reason was that above objects are assembler sourced and are missing the GNU-stack note section. In such case the linker assumes that the final binary should not be restricted at all and mark the stack as RWX. Adding section ".note.GNU-stack" definition to mentioned objects, with all flags disabled, thus omiting those objects from linker stack flags decision. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570 Reported-by: Clark Williams <williams@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> [ committer note: Remaining bits after what was already added to perf/urgent ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-06Merge branch 'perf/urgent' into perf/coreArnaldo Carvalho de Melo1-0/+6
So that we can get the perf bench exec stack fixes and then apply the remaining fix for the files added after what is in perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-06perf tools: Fix perf stack to non executable on x86_64Jiri Olsa1-0/+6
By adding following objects: bench/mem-memcpy-x86-64-asm.o the x86_64 perf binary ended up with executable stack. The reason was that above object are assembler sourced and is missing the GNU-stack note section. In such case the linker assumes that the final binary should not be restricted at all and mark the stack as RWX. Adding section ".note.GNU-stack" definition to mentioned object, with all flags disabled, thus omiting this object from linker stack flags decision. Problem introduced in: $ git describe ea7872b v2.6.37-rc2-19-gea7872b Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570 Reported-by: Clark Williams <williams@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> [ committer note: Backported fix to perf/urgent (3.3-rc2+) ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>