Age        Commit message        Author        Files        Lines
2016-04-28perf probe: Use strbuf for making stringsMasami Hiramatsu3-169/+93
Replace many fixed-length char arrays with strbuf to stringify perf_probe_event, probe_trace_event, etc. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160427183713.23446.97377.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
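For context, the strbuf pattern the conversion moves to looks roughly like the sketch below (it assumes tools/perf's util/strbuf.h; the function and the "group:event" layout are made up for illustration, not a hunk from the patch):

    #include "strbuf.h"

    /* build "group:event" without a fixed-size char buffer */
    static char *synthesize_event_name(const char *group, const char *event)
    {
            struct strbuf buf = STRBUF_INIT;

            strbuf_addf(&buf, "%s:%s", group, event);
            /* hand ownership of the grown string to the caller */
            return strbuf_detach(&buf, NULL);
    }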
2016-04-28perf evsel: Remove two extraneous ending newlines in open_strerror()Arnaldo Carvalho de Melo1-2/+2
The error messages returned by this method should not have an ending newline; fix the two cases where one was present. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-8af0pazzhzl3dluuh8p7ar7p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-28perf evsel: Handle ENOMEM for perf_event_max_stack + PERF_SAMPLE_CALLCHAINArnaldo Carvalho de Melo1-0/+8
When the kernel allows tweaking perf_event_max_stack and the event being setup has PERF_SAMPLE_CALLCHAIN in its perf_event_attr.sample_type, tell the user that tweaking /proc/sys/kernel/perf_event_max_stack may solve the problem. Before: # echo 32000 > /proc/sys/kernel/perf_event_max_stack # perf record -g usleep 1 Error: The sys_perf_event_open() syscall returned with 12 (Cannot allocate memory) for event (cycles:ppp). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured? # After: # echo 64000 > /proc/sys/kernel/perf_event_max_stack # perf record -g usleep 1 Error: Not enough memory to setup event with callchain. Hint: Try tweaking /proc/sys/kernel/perf_event_max_stack Hint: Current value: 64000 # Suggested-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ebv0orelj1s1ye857vhb82ov@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
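The hint comes from the ENOMEM branch of the open-error formatting; a rough sketch of its shape (the exact placement in evsel's strerror code and the sysctl_perf_event_max_stack variable name are assumptions, not the literal patched hunk):

    case ENOMEM:
            if (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
                    return scnprintf(msg, size,
                                     "Not enough memory to setup event with callchain.\n"
                                     "Hint: Try tweaking /proc/sys/kernel/perf_event_max_stack\n"
                                     "Hint: Current value: %d", sysctl_perf_event_max_stack);
            break;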
2016-04-28bpf tools: Fix syscall argumentFlorian Fainelli1-1/+1
Coverity flagged this under CID 1354884 as a sizeof mismatch; it turns out that the argument "attr" passed to syscall() should have been a pointer to attr in the first place. Reported-by: coverity (CID 1354884) Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Fixes: 8f9e05fb298f ("perf tools: Fix PowerPC native building") Link: http://lkml.kernel.org/r/1461551694-5512-3-git-send-email-f.fainelli@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
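In other words, bpf(2) needs the address of the local bpf_attr union; a minimal sketch of the correct calling pattern (illustrative only, not the patched hunk itself):

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
    {
            /* the kernel expects a pointer to the attr union, not the union by value */
            return syscall(__NR_bpf, cmd, attr, size);
    }

    static int create_array_map(void)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.map_type    = BPF_MAP_TYPE_ARRAY;
            attr.key_size    = sizeof(int);
            attr.value_size  = sizeof(long long);
            attr.max_entries = 1;

            return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
    }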
2016-04-28bpf tools: Remove expression with no effectFlorian Fainelli1-1/+0
Assigning "attr" to "attr" does not have any effect, but was caught by Coverity, so let's remove this. Reported-by: coverity (CID 1354720) Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Wang Nan <wangnan0@huawei.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check") Link: http://lkml.kernel.org/r/1461551694-5512-2-git-send-email-f.fainelli@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-28powercap, perf/x86/intel/rapl: Add PSys supportSrinivas Pandruvada1-0/+69
The Skylake processor supports a new set of RAPL registers for controlling the entire SoC instead of just the CPU package. This is useful for thermal and power control when the source of power/thermal load is not just the CPU/GPU. This change adds a new platform domain (aka PSys) to the current power capping Intel RAPL driver. PSys also supports PL1 (long term) and PL2 (short term) control like the package domain, and it follows the same MSRs for energy and time units as the package domain. Unlike the package domain, PSys support requires more than just a processor level implementation: the other parts in the system need additional implementation, which OEMs need to support. So not all Skylake systems will support PSys. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jacob.jun.pan@linux.intel.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1460930581-29748-3-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28Merge branch 'perf/urgent' into perf/core, to resolve conflictIngo Molnar7-30/+116
Conflicts: arch/x86/events/intel/pt.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/intel: Fix incorrect lbr_sel_mask valueKan Liang1-2/+4
This patch fixes a bug which was introduced by: b16a5b52eb90 ("perf/x86: Add option to disable reading branch flags/cycles") In that patch, lbr_sel_mask is used to mask the lbr_select, but LBR_SEL_MASK doesn't include the bit for LBR_CALL_STACK, so the LBR call stack will never be set in lbr_select. This patch corrects LBR_SEL_MASK by including all valid bits of LBR_SELECT. Also, the LBR_CALL_STACK bit is different from the other bits in LBR_SELECT: it does not operate in suppress mode, so it needs to be specially handled in intel_pmu_setup_hw_lbr_filter(). Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
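Conceptually the mask change looks like the sketch below; the LBR_* macro names mirror arch/x86/events/intel/lbr.c, but the exact before/after composition shown here is an assumption, not the literal diff:

    /* before: LBR_CALL_STACK missing, so it gets masked off from lbr_select */
    #define LBR_SEL_MASK (LBR_KERNEL | LBR_USER | LBR_JCC | LBR_REL_CALL | \
                          LBR_IND_CALL | LBR_RETURN | LBR_REL_JMP | \
                          LBR_IND_JMP | LBR_FAR)

    /* after: include every valid LBR_SELECT bit ... */
    #define LBR_SEL_MASK (LBR_KERNEL | LBR_USER | LBR_JCC | LBR_REL_CALL | \
                          LBR_IND_CALL | LBR_RETURN | LBR_REL_JMP | \
                          LBR_IND_JMP | LBR_FAR | LBR_CALL_STACK)

    /* ... and special-case LBR_CALL_STACK in intel_pmu_setup_hw_lbr_filter(),
       since unlike the others it is not a suppress-mode bit. */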
2016-04-28perf/x86/intel/pt: Don't die on VMXONAlexander Shishkin4-11/+75
Some versions of Intel PT do not support tracing across VMXON, more specifically, VMXON will clear TraceEn control bit and any attempt to set it before VMXOFF will throw a #GP, which in the current state of things will crash the kernel. Namely: $ perf record -e intel_pt// kvm -nographic on such a machine will kill it. To avoid this, notify the intel_pt driver before VMXON and after VMXOFF so that it knows when not to enable itself. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
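The notification is a pair of hooks placed around VMXON/VMXOFF on the KVM side; a sketch of the intended call pattern (intel_pt_handle_vmx() is the interface this patch set adds, the surrounding helpers are simplified for illustration):

    /* arch/x86/kvm side, simplified */
    static void hardware_enable_vmx(void)
    {
            intel_pt_handle_vmx(1);  /* PT driver: do not touch TraceEn from now on */
            /* ... execute VMXON ... */
    }

    static void hardware_disable_vmx(void)
    {
            /* ... execute VMXOFF ... */
            intel_pt_handle_vmx(0);  /* PT tracing may be enabled again */
    }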
2016-04-28perf/core: Fix perf_event_open() vs. execve() racePeter Zijlstra1-16/+36
Jann reported that the ptrace_may_access() check in find_lively_task_by_vpid() is racy against exec(). Specifically:

  perf_event_open()                 execve()
    ptrace_may_access()
                                      commit_creds()
    ...                               if (get_dumpable() != SUID_DUMP_USER)
                                        perf_event_exit_task();
    perf_install_in_context()

would result in installing a counter across the creds boundary. Fix this by wrapping lots of perf_event_open() in cred_guard_mutex. This should be fine as perf_event_exit_task() is already called with cred_guard_mutex held, so all perf locks already nest inside it. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAXAdam Borowski1-1/+1
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is referenced by filter_events() which expects undefined events to have a value of 0. Found via KASAN: UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30 index 9 is out of range for type 'u64 [9]' UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9 load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64' Signed-off-by: Adam Borowski <kilobyte@angband.pl> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl Signed-off-by: Ingo Molnar <mingo@kernel.org>
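The fix itself is just sizing the static map with the enum's sentinel, so the otherwise-undefined PERF_COUNT_HW_REF_CPU_CYCLES slot reads as 0; roughly (illustrative sketch, the two event encodings shown are a subset of the real table in arch/x86/events/amd/core.c):

    static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
    {
            [PERF_COUNT_HW_CPU_CYCLES]   = 0x0076,
            [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
            /* ... */
            /* PERF_COUNT_HW_REF_CPU_CYCLES stays 0: unused on AMD, but
               filter_events() may index it, so the slot must exist. */
    };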
2016-04-27Merge tag 'perf-core-for-mingo-20160427' of ↵Ingo Molnar49-179/+475
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- 'perf trace --pf maj/min/all' works with --call-graph (Arnaldo Carvalho de Melo). Tracing write syscalls and major page faults with callchains while starting firefox, limiting the stack to 5 frames:

    # perf trace -e write --pf maj --max-stack 5 firefox
       589.549 ( 0.014 ms): firefox/15377 write(fd: 4, buf: 0x7fff80acc898, count: 151) = 151
                                        [0xfaed] (/usr/lib64/libpthread-2.22.so)
                                        fire_glxtest_process+0x5c (/usr/lib64/firefox/libxul.so)
                                        InstallGdkErrorHandler+0x41 (/usr/lib64/firefox/libxul.so)
                                        XREMain::XRE_mainInit+0x12c (/usr/lib64/firefox/libxul.so)
                                        XREMain::XRE_main+0x1e4 (/usr/lib64/firefox/libxul.so)
       760.704 ( 0.000 ms): firefox/15332 majfault [gtk_tree_view_accessible_get_type+0x0] => /usr/lib64/libgtk-3.so.0.1800.9@0xa0850 (x.)
                                        gtk_tree_view_accessible_get_type+0x0 (/usr/lib64/libgtk-3.so.0.1800.9)
                                        gtk_tree_view_class_intern_init+0x1a54 (/usr/lib64/libgtk-3.so.0.1800.9)
                                        g_type_class_ref+0x6dd (/usr/lib64/libgobject-2.0.so.0.4600.2)
                                        [0x115378] (/usr/lib64/libgnutls.so.30.6.3)

  This automagically selects "--call-graph dwarf"; use "--call-graph fp" on systems where -fno-omit-frame-pointer was used to build the components of interest, to incur less overhead, or tune "--call-graph dwarf" appropriately, see 'perf record --help'.

- Allow /proc/sys/kernel/perf_event_max_stack, that defaults to the old hard coded value of PERF_MAX_STACK_DEPTH (127), useful for huge callstacks for things like Groovy, Ruby, etc, and also to reduce overhead by limiting it to a smaller value; upcoming work will allow this to be done per-event (Arnaldo Carvalho de Melo)

- Make 'perf trace --min-stack' be honoured by --pf and --event (Arnaldo Carvalho de Melo)

- Make 'perf evlist -v' decode perf_event_attr->branch_sample_type (Arnaldo Carvalho de Melo):

    # perf record --call lbr usleep 1
    # perf evlist -v
    cycles:ppp: ... sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, ...
      branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
    #

- Clear dummy entry accumulated period, fixing 'perf top/report' output such as the following (Kan Liang):

    4769.98%  0.01%  0.00%  0.01%  tchain_edit  [kernel]  [k] update_fast_timekeeper

- System calls with pid_t arguments get them augmented with the COMM event more thoroughly:

    # trace -e perf_event_open perf stat -e cycles -p 15608
       6.876 ( 0.014 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15608 (hexchat), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
       6.882 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15639 (gmain), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
       6.889 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15640 (gdbus), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
    ^C

- Fix offline module name mismatch issue in 'perf probe' (Ravi Bangoria)

- Fix module probe issue if no dwarf support in 'perf probe' (Ravi Bangoria)

Assorted fixes:

- Fix off-by-one in write_buildid() (Andrey Ryabinin)
- Fix segfault when printing callchains in 'perf script' (Chris Phlipot)
- Replace assignment with comparison on assert check in 'perf test' entry (Colin Ian King)
- Fix off-by-one comparison in intel-pt code (Colin Ian King)
- Close target file on error path in 'perf probe' (Masami Hiramatsu)
- Set default kprobe group name if not given in 'perf probe' (Masami Hiramatsu)
- Avoid partial perf_event_header reads (Wang Nan)

Infrastructure changes:

- Update x86's syscall_64.tbl copy, adding preadv2 & pwritev2 (Arnaldo Carvalho de Melo)
- Make the x86 clean quiet wrt syscall table removal (Jiri Olsa)

Cleanups:

- Simplify wrapper for LOCK_PI in 'perf bench futex' (Davidlohr Bueso)
- Remove duplicate const qualifier (Eric Engestrom)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-27perf tools: Set the maximum allowed stack from ↵Arnaldo Carvalho de Melo16-18/+28
/proc/sys/kernel/perf_event_max_stack There is an upper limit to what tooling considers a valid callchain, and it was tied to the hardcoded value in the kernel, PERF_MAX_STACK_DEPTH (127). Now that this can be tuned via a sysctl, make the tooling read it and use that as the upper limit, falling back to PERF_MAX_STACK_DEPTH for kernels where this sysctl isn't present. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-yjqsd30nnkogvj5oyx9ghir9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
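The tool-side lookup amounts to a one-shot read of the sysctl with a compile-time fallback; a self-contained sketch (the function name here is made up, only the procfs path and the PERF_MAX_STACK_DEPTH fallback come from the change):

    #include <stdio.h>

    #define PERF_MAX_STACK_DEPTH 127        /* historical hardcoded limit */

    static unsigned int perf_event_max_stack(void)
    {
            unsigned int max = PERF_MAX_STACK_DEPTH;
            FILE *fp = fopen("/proc/sys/kernel/perf_event_max_stack", "r");

            if (fp) {
                    if (fscanf(fp, "%u", &max) != 1)
                            max = PERF_MAX_STACK_DEPTH;
                    fclose(fp);
            }
            return max;     /* fallback kept for kernels without the sysctl */
    }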
2016-04-27perf core: Allow setting up max frame stack depth via sysctlArnaldo Carvalho de Melo13-23/+84
The default remains 127, which is good for most cases, and not even hit most of the time, but then for some cases, as reported by Brendan, 1024+ deep frames are appearing on the radar for things like groovy, ruby. And in some workloads putting a _lower_ cap on this may make sense. A per-event setting still needs to be put in place though. The new file is:

  # cat /proc/sys/kernel/perf_event_max_stack
  127

Changing it:

  # echo 256 > /proc/sys/kernel/perf_event_max_stack
  # cat /proc/sys/kernel/perf_event_max_stack
  256

But as soon as there is some event using callchains we get:

  # echo 512 > /proc/sys/kernel/perf_event_max_stack
  -bash: echo: write error: Device or resource busy
  #

Because we only allocate the callchain percpu data structures when there is a user, which allows for changing the max easily, it's just a matter of having no callchain users at that point. Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
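The -EBUSY behaviour above suggests the sysctl handler only accepts a new value while no callchain users exist; a rough sketch of that guard (the handler and counter names are assumptions, not the literal kernel/events/callchain.c hunk):

    int perf_event_max_stack_handler(struct ctl_table *table, int write,
                                     void __user *buffer, size_t *lenp, loff_t *ppos)
    {
            int ret;

            mutex_lock(&callchain_mutex);
            if (write && atomic_read(&nr_callchain_events))
                    ret = -EBUSY;   /* percpu callchain buffers already sized */
            else
                    ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
            mutex_unlock(&callchain_mutex);

            return ret;
    }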
2016-04-26perf bench: Remove one more die() callArnaldo Carvalho de Melo1-7/+15
Propagate the error instead. Cc: David Ahern <dsahern@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-z6erjg35d1gekevwujoa0223@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf tools: Update x86's syscall_64.tbl, adding preadv2 & pwritev2Arnaldo Carvalho de Melo1-0/+2
Introduced in commit 4babf2c5efb7 ("x86: wire up preadv2 and pwritev2"). This will make 'perf trace' aware of them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-vojoylgce2cetsy36446s5ny@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf probe: Fix module probe issue if no dwarf supportRavi Bangoria1-3/+73
Perf is not able to register a probe in a kernel module when dwarf support is not there (and the same goes for symtab). Perf passes the full path of the module where only the module name is required, which is causing the problem. This patch fixes this issue.

Before applying the patch:

  $ dpkg -s libdw-dev
  dpkg-query: package 'libdw-dev' is not installed and no information is...

  $ sudo ./perf probe -m /linux/samples/kprobes/kprobe_example.ko kprobe_init
  Added new event:
    probe:kprobe_init    (on kprobe_init in /linux/samples/kprobes/kprobe_example.ko)
  You can now use it in all perf tools, such as:
    perf record -e probe:kprobe_init -aR sleep 1

  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/kprobe_init /linux/samples/kprobes/kprobe_example.ko:kprobe_init

  $ sudo ./perf record -a -e probe:kprobe_init
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.105 MB perf.data ]

  $ sudo ./perf script
  # No output here

After applying the patch:

  $ sudo ./perf probe -m /linux/samples/kprobes/kprobe_example.ko kprobe_init
  Added new event:
    probe:kprobe_init    (on kprobe_init in kprobe_example)
  You can now use it in all perf tools, such as:
    perf record -e probe:kprobe_init -aR sleep 1

  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/kprobe_init kprobe_example:kprobe_init

  $ sudo ./perf record -a -e probe:kprobe_init
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.105 MB perf.data (2 samples) ]

  $ sudo ./perf script
  insmod 13990 [002] 5961.216833: probe:kprobe_init: ...
  insmod 13995 [002] 5962.889384: probe:kprobe_init: ...

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1461680741-12517-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf probe: Fix offline module name mismatch issueRavi Bangoria1-14/+5
Perf can add a probe on a kernel module which has not been loaded yet. The current implementation finds the module name from the path, but if the filename is different from the actual module name then perf fails to register the probe when the module is loaded, because of the mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example.

Before applying the patch:

  $ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
  Added new event:
    probe:foo_show       (on foo_show in kobject-example)
  You can now use it in all perf tools, such as:
    perf record -e probe:foo_show -aR sleep 1

  $ cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/foo_show kobject-example:foo_show

  $ insmod kobject-example.ko
  $ lsmod
  Module                  Size  Used by
  kobject_example        16384  0

  Generate a read to /sys/kernel/kobject_example/foo while recording data with the below command:

  $ sudo ./perf record -e probe:foo_show -a
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.093 MB perf.data ]

  $ ./perf report --stdio -F overhead,comm,dso,sym
  Error: The perf.data.old file has no samples!

After applying the patch:

  $ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
  Added new event:
    probe:foo_show       (on foo_show in kobject_example)
  You can now use it in all perf tools, such as:
    perf record -e probe:foo_show -aR sleep 1

  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/foo_show kobject_example:foo_show

  $ insmod kobject-example.ko
  $ lsmod
  Module                  Size  Used by
  kobject_example        16384  0

  Generate a read to /sys/kernel/kobject_example/foo while recording data with the below command:

  $ sudo ./perf record -e probe:foo_show -a
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.097 MB perf.data (8 samples) ]

  $ sudo ./perf report --stdio -F overhead,comm,dso,sym
  ...
  # Samples: 8 of event 'probe:foo_show'
  # Event count (approx.): 8
  #
  # Overhead  Command  Shared Object      Symbol
  # ........  .......  .................  ............
  #
    100.00%   cat      [kobject_example]  [k] foo_show

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1461680741-12517-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf trace: Read thread's COMM from /proc when not setArnaldo Carvalho de Melo1-1/+4
We get notifications for threads that get created while we're tracing, but for preexisting threads we may end up not having synthesized them, like when tracing a 'perf trace' session that will use '--pid' to trace some other thread. And besides, we should probably stop synthesizing those records and instead read thread information in a lazy way, i.e. just when we need it, as done in this patch. Now the 'pid_t' argument in 'perf_event_open' gets translated to a COMM:

  # perf trace -e perf_event_open perf stat -e cycles -p 31601
     0.027 ( 0.027 ms): perf/23393 perf_event_open(attr_uptr: 0x2fdd0d8, pid: 31601 (abrt-dump-journ), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
  ^C

The same happens in other syscalls containing pid_t without thread->comm_set at the time of the formatting. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ioeps6dlwst17d6oozc9shtk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf thread: Introduce method to set comm from /proc/pid/selfArnaldo Carvalho de Melo2-0/+21
Will be used for lazy comm loading in 'perf trace'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7ogbkuoka1y2qsmcckqxvl5m@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26tools lib api fs: Add helper to read string from procfs fileArnaldo Carvalho de Melo2-0/+15
To read things like /proc/self/comm. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ztpkbmseidt0hq2psr46o0h9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
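A self-contained sketch of such a helper (the real tools/lib/api helper's name and buffer handling differ; this just shows the procfs read-and-trim pattern):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* read a small procfs file (e.g. /proc/self/comm) into a NUL-terminated string */
    static char *read_procfs_str(const char *path)
    {
            char buf[4096];
            ssize_t n;
            int fd = open(path, O_RDONLY);

            if (fd < 0)
                    return NULL;
            n = read(fd, buf, sizeof(buf) - 1);
            close(fd);
            if (n <= 0)
                    return NULL;
            if (buf[n - 1] == '\n')     /* procfs strings usually end with a newline */
                    n--;
            buf[n] = '\0';
            return strdup(buf);
    }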
2016-04-26perf trace: Do not beautify the 'pid' parameter as a simple integerArnaldo Carvalho de Melo1-2/+1
Leave it alone so that it ends up assigned to SCA_PID via its type, 'pid_t', which will look up the pid in the machine's thread rb_tree and possibly find its COMM. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-r7dujgmhtxxfajuunpt1bkuo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf trace: Move perf_flags beautifier to tools/perf/trace/beauty/Arnaldo Carvalho de Melo2-44/+44
To reduce the size of builtin-trace.c. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-8r3gmymyn3r0ynt4yuzspp9g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf probe: Set default kprobe group name if it is not givenMasami Hiramatsu1-3/+7
Set kprobe group name as "probe" if it is not given. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160426090413.11891.95640.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf probe: Let probe_file__add_event return 0 if succeededMasami Hiramatsu1-2/+1
Since other methods return 0 on success (or a file descriptor), let probe_file__add_event() return 0 instead of the number of written bytes. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160426090303.11891.18232.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf tools: Add lsdir() helper to read a directoryMasami Hiramatsu2-0/+37
As a utility function, add lsdir(), which reads a given directory and stores the entry names into a strlist. lsdir() accepts a filter function so that the user can filter out unneeded entries. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160426090242.11891.79014.stgit@devbox [ Do not use the 'dirname', it is used in some distros ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
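A minimal, self-contained sketch of the idea (plain C with dirent; the real helper stores entries into tools/perf's strlist and uses a different filter signature, here the entries are just printed):

    #include <dirent.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    typedef bool (*lsdir_filter_t)(const char *name);

    static bool not_hidden(const char *name)
    {
            return name[0] != '.';
    }

    /* list directory entries, skipping "."/".." and whatever the filter rejects */
    static int lsdir(const char *path, lsdir_filter_t filter)
    {
            struct dirent *d;
            DIR *dir = opendir(path);

            if (!dir)
                    return -1;
            while ((d = readdir(dir)) != NULL) {
                    if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
                            continue;
                    if (filter && !filter(d->d_name))
                            continue;
                    printf("%s\n", d->d_name);  /* the real helper adds this to a strlist */
            }
            closedir(dir);
            return 0;
    }

    int main(void)
    {
            return lsdir("/sys/kernel/debug/tracing", not_hidden);
    }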
2016-04-26perf probe: Close target file on error pathMasami Hiramatsu1-2/+7
Fix a bug to close target elf file in get_text_start_address(). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160426064737.1443.44093.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26perf evlist: Enforce ring buffer readingWang Nan1-2/+10
Don't read broken data after the 'head' pointer. Following commits will feed perf_evlist__mmap_read() with some 'head' pointers not maintained by the kernel. If the 'head' pointer breaks an event, we should avoid reading from the broken event. This can happen with a backward ring buffer. For example:

                                   old   head
                                    |      |
                                    V      V
      +---+------+----------+----+-----+--+
      |..E|D....D|C........C|B..B|A....|E.|
      +---+------+----------+----+-----+--+

The 'old' pointer points to the beginning of 'A' and we try to read from it, but 'A' has been overwritten. In this case, don't try to read from 'A', simply return NULL. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1461637738-62722-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf hists: Clear dummy entry accumulated periodKan Liang1-0/+2
The accumulated period for dummy entry should also be 0. Otherwise, the total overhead could be overcounted. $ perf record -e '{LLC-load-misses,cpu/instructions/}' --call-graph=lbr ./tchain $ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 21K of event 'anon group { LLC-load-misses, cpu/instructions/ }' # Event count (approx.): 16313667937 # # Children Self Command Shared Object Symbol # ................ ................ ........... ................ ............................ # 4769.98% 0.01% 0.00% 0.01% tchain_edit [kernel.vmlinux] [k] update_fast_timekeeper 4356.18% 0.01% 0.00% 0.01% tchain_edit [kernel.vmlinux] [k] trigger_load_balance 3181.12% 0.01% 0.00% 0.01% tchain_edit [kernel.vmlinux] [k] irq_work_tick 1592.37% 0.00% 0.00% 0.00% tchain_edit [kernel.vmlinux] [k] cpu_needs_another_gp Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1461565689-5862-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf intel-pt: Fix off-by-one comparison on maximum codeColin Ian King1-1/+1
The check for the maximum code is off-by-one; the current comparison of a code that is INTEL_PT_ERR_MAX will cause the strlcpy to perform an out of bounds array access on the intel_pt_err_msgs array. Fix this with a >= comparison. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461524203-10224-1-git-send-email-colin.king@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf bench futex: Simplify wrapper for LOCK_PIDavidlohr Bueso2-5/+3
Given that the 'val' parameter is ignored for FUTEX_LOCK_PI, get rid of the bogus deadlock detection flag in the wrapper code and avoid the extra argument, making it resemble its unlock counterpart. And if nothing else, we already only pass 0 anyway. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Link: http://lkml.kernel.org/r/1461208447-29328-1-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf tests: Replace assignment with comparison on assert checkColin Ian King1-1/+1
The current assert check is checking an assignment, which will always be true. Instead, the assert should be checking if scale is equal to 0.122. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461419154-16918-1-git-send-email-colin.king@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf tools: Remove duplicate const qualifierEric Engestrom1-1/+1
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461577678-29517-1-git-send-email-eric.engestrom@imgtec.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25tools build: Fix perf_clean targetJiri Olsa1-1/+2
Fix the perf_clean target to follow the same logic as the perf target. Fixes the following make invocation: $ cd <kernelsrc> && make tools/perf_clean Reported-by: TJ <linux@iam.tj> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=116411 Link: http://lkml.kernel.org/r/1461615438-27894-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf tools: Make the x86 clean quietJiri Olsa1-1/+1
Turn current clean output: $ make clean rm -f arch/x86/include/generated/asm/syscalls_64.c CLEAN libbpf CLEAN libapi into: $ make clean CLEAN x86 CLEAN libapi CLEAN libbpf Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: TJ <linux@iam.tj> Link: http://lkml.kernel.org/r/1461615438-27894-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf evlist: Decode perf_event_attr->branch_sample_typeArnaldo Carvalho de Melo1-1/+17
While trying to use --call-graph lbr in 'perf trace', since we only are interested in the callchain for userspace, up to the callchain, I found that 'perf evlist' is not decoding the branch_sample_type field, fix it. Before: # perf record --call-graph lbr usleep 1 # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, branch_sample_type: 51201 ^^^^^^^^^^^^^^^^^^^^^^^^^ After: # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-hozai7974u0ulgx13k96fcaw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
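Decoding is just matching each set bit against the PERF_SAMPLE_BRANCH_* names from the uapi header; a stand-alone sketch of the idea (the table-of-bit-names approach mirrors how perf evlist decodes other attr fields, the helper name is an assumption):

    #include <linux/perf_event.h>
    #include <stdio.h>

    static void print_branch_sample_type(unsigned long long bits)
    {
            const struct { unsigned long long bit; const char *name; } tbl[] = {
                    { PERF_SAMPLE_BRANCH_USER,       "USER" },
                    { PERF_SAMPLE_BRANCH_KERNEL,     "KERNEL" },
                    { PERF_SAMPLE_BRANCH_CALL_STACK, "CALL_STACK" },
                    { PERF_SAMPLE_BRANCH_NO_FLAGS,   "NO_FLAGS" },
                    { PERF_SAMPLE_BRANCH_NO_CYCLES,  "NO_CYCLES" },
            };
            const char *sep = "";

            for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
                    if (bits & tbl[i].bit) {
                            printf("%s%s", sep, tbl[i].name);
                            sep = "|";
                    }
            }
            printf("\n");
    }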
2016-04-25perf trace: Make --pf honour --min-stack tooArnaldo Carvalho de Melo1-4/+15
To check deeply nested page fault callchains. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wuji34xx003kr88nmqt6jkgf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf trace: Make --event honour --min-stack tooArnaldo Carvalho de Melo1-5/+16
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-shj0fazntmskhjild5i6x73l@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf script: Fix segfault when printing callchainsChris Phlipot1-6/+6
This fixes a bug caused by an uninitialized callchain cursor. The crash first appeared in: 6f736735e30f ("perf evsel: Require that callchains be resolved before calling fprintf_{sym,callchain}") The callchain cursor is a struct that contains pointers, which when uninitialized will cause unpredictable behavior (usually a crash) when trying to append to the callchain. The existing implementation has the following issues:

1. The callchain cursor used is not initialized, resulting in unpredictable behavior when used.

2. The cursor is declared on the stack. Even if it is properly initialized, the implementation will leak memory when the function returns, since all the references to the callchain_nodes allocated by callchain_cursor_append will be lost when the cursor goes out of scope.

3. Storing the cursor on the stack is inefficient. Even if memory is properly freed when it goes out of scope, a performance penalty will be incurred due to reallocation of callchain nodes. callchain_cursor_append is designed to avoid these reallocations when an existing cursor is reused.

This patch fixes the crash by replacing cursor_callchain with a reference to the global callchain_cursor, which also resolves all 3 issues mentioned above. How to reproduce the crash:

  $ perf record --call-graph=dwarf stress -t 1 -c 1
  $ perf script > /dev/null
  Segfault

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 6f736735e30f ("perf evsel: Require that callchains be resolved before calling fprintf_{sym,callchain}") Link: http://lkml.kernel.org/r/1461119531-2529-1-git-send-email-cphlipot0@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf trace: Make --pf maj/min/all use callchains tooArnaldo Carvalho de Melo1-18/+41
Forgot about page faults, a software event, when adding support for callchains, fix it: # trace --no-syscalls --pf maj --call dwarf 0.000 ( 0.000 ms): Xorg/2068 majfault [sfbSegment1+0x0] => /usr/lib64/xorg/modules/drivers/intel_drv.so@0x11b490 (x.) sfbSegment1+0x0 (/usr/lib64/xorg/modules/drivers/intel_drv.so) fbPolySegment32+0x361 (/usr/lib64/xorg/modules/drivers/intel_drv.so) sna_poly_segment+0x743 (/usr/lib64/xorg/modules/drivers/intel_drv.so) damagePolySegment+0x77 (/usr/libexec/Xorg) ProcPolySegment+0xe7 (/usr/libexec/Xorg) Dispatch+0x25f (/usr/libexec/Xorg) dix_main+0x3c3 (/usr/libexec/Xorg) __libc_start_main+0xf0 (/usr/lib64/libc-2.22.so) _start+0x29 (/usr/libexec/Xorg) 0.257 ( 0.000 ms): Xorg/2068 majfault [miZeroClipLine+0x0] => /usr/libexec/Xorg@0x18e830 (x.) miZeroClipLine+0x0 (/usr/libexec/Xorg) _fbSegment+0x2c0 (/usr/lib64/xorg/modules/drivers/intel_drv.so) sfbSegment1+0x67 (/usr/lib64/xorg/modules/drivers/intel_drv.so) fbPolySegment32+0x361 (/usr/lib64/xorg/modules/drivers/intel_drv.so) sna_poly_segment+0x743 (/usr/lib64/xorg/modules/drivers/intel_drv.so) damagePolySegment+0x77 (/usr/libexec/Xorg) ProcPolySegment+0xe7 (/usr/libexec/Xorg) Dispatch+0x25f (/usr/libexec/Xorg) dix_main+0x3c3 (/usr/libexec/Xorg) __libc_start_main+0xf0 (/usr/lib64/libc-2.22.so) _start+0x29 (/usr/libexec/Xorg) ^C# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-8h6ssirw5z15qyhy2lwd6f89@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf trace: Extract evsel contructor from perf_evlist__add_pgfaultArnaldo Carvalho de Melo1-15/+16
Prep work for next patches, where we'll need access to the created evsels, to possibly configure callchains. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2pcgsgnkgellhlcao4aub8tu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf buildid: Fix off-by-one in write_buildid()Andrey Ryabinin1-3/+3
write_buildid() increments 'name_len' with intention to take into account trailing zero byte. However, 'name_len' was already incremented in machine__write_buildid_table() before. So this leads to out-of-bounds read in do_write(): $ ./perf record sleep 0 [ perf record: Woken up 1 times to write data ] ================================================================= ==15899==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000099fc92 at pc 0x7f1aa9c7eab5 bp 0x7fff940f84d0 sp 0x7fff940f7c78 READ of size 19 at 0x00000099fc92 thread T0 #0 0x7f1aa9c7eab4 (/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/libasan.so.2+0x44ab4) #1 0x649c5b in do_write util/header.c:67 #2 0x649c5b in write_padded util/header.c:82 #3 0x57e8bc in write_buildid util/build-id.c:239 #4 0x57e8bc in machine__write_buildid_table util/build-id.c:278 ... 0x00000099fc92 is located 0 bytes to the right of global variable '*.LC99' defined in 'util/symbol.c' (0x99fc80) of size 18 '*.LC99' is ascii string '[kernel.kallsyms]' ... Shadow bytes around the buggy address: 0x00008012bf80: f9 f9 f9 f9 00 00 00 00 00 00 03 f9 f9 f9 f9 f9 =>0x00008012bf90: 00 00[02]f9 f9 f9 f9 f9 00 00 00 00 00 05 f9 f9 0x00008012bfa0: f9 f9 f9 f9 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00 Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461053847-5633-1-git-send-email-aryabinin@virtuozzo.com [ Remove the off-by one at the origin, to keep len(s) == strlen(s) assumption ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-23Merge tag 'perf-core-for-mingo-20160419' of ↵Ingo Molnar18-56/+74
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

Build fixes:

- Fix 'perf trace' build when DWARF unwind isn't available (Arnaldo Carvalho de Melo)
- Remove x86 references from arch-neutral Build, fixing it in !x86 arches, reported as breaking the build for powerpc64le in linux-next (Arnaldo Carvalho de Melo)

Infrastructure changes:

- Do memset() variable 'st' using the correct size in the jit code (Colin Ian King)
- Fix postgresql ubuntu 'perf script' install instructions (Chris Phlipot)
- Use callchain_param more thoroughly when checking how callchains were configured, eventually will be the only way to look for callchain parameters (Arnaldo Carvalho de Melo)
- Fix some issues in the 'perf test kallsyms' entry (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23x86/perf/rapl: Add missing Broadwell modelPeter Zijlstra1-0/+1
With the array aligned as per events/intel/core.c it was fairly obvious we missed one, add it in. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23x86/perf/rapl: Reorder model numbersPeter Zijlstra1-4/+9
Re-order the model array to match the order in events/intel/core.c, to make it easier to spot gaps and such. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel/rapl: Support Skylake RAPL domainsSrinivas Pandruvada2-2/+54
Add Skylake client support for RAPL domains. In addition to the RAPL domains in Broadwell clients, it has support for the platform domain (aka PSys). The PSys domain controls the entire SoC instead of just a CPU package. Unlike the package domain, PSys support requires more than just a processor level implementation: the other parts in the system need additional HW level signaling, which OEMs need to support. When not supported, the energy counter register in the PSys domain returns 0. Also corrected an error in the comment for the GPU counter, which previously was the DRAM counter. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ Converted to model_match stuff. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jacob.jun.pan@linux.intel.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/core: Add ::write_backward attribute to perf eventWang Nan4-10/+85
This patch introduces a 'write_backward' bit to perf_event_attr, which controls the direction of a ring buffer. After it is set, the corresponding ring buffer is written from end to beginning. This feature is designed to support reading from an overwritable ring buffer.

A ring buffer can be created by mapping a perf event fd. The kernel puts event records into the ring buffer, and user tooling like perf fetches them from the address returned by mmap(). To prevent racing between kernel and tooling, they communicate with each other through the 'head' and 'tail' pointers. The kernel maintains the 'head' pointer and points it to the next free area (the tail of the last record). Tooling maintains the 'tail' pointer and points it to the tail of the last consumed record (a record that has already been fetched). The kernel determines the available space in a ring buffer using these two pointers, to avoid overwriting unfetched records.

By mapping without 'PROT_WRITE', an overwritable ring buffer is created. Different from a normal ring buffer, tooling is unable to maintain the 'tail' pointer because writing is forbidden. Therefore, for this type of ring buffer, the kernel overwrites old records unconditionally, working like a flight recorder. This feature would be useful if reading from an overwritable ring buffer were as easy as reading from a normal ring buffer. However, there's an obscure problem.

The following figure demonstrates a full overwritable ring buffer. In this figure, the 'head' pointer points to the end of the last record, and a long record 'E' is pending. For a normal ring buffer, a 'tail' pointer would have pointed to position (X), so the kernel knows there's no more space in the ring buffer. However, for an overwritable ring buffer, the kernel ignores the 'tail' pointer.

      (X)                             head
       .                                |
       .                                V
   +------+-------+----------+------+---+
   |A....A|B.....B|C........C|D....D|   |
   +------+-------+----------+------+---+

Record 'A' is overwritten by event 'E':

      head
       |
       V
   +--+---+-------+----------+------+---+
   |.E|..A|B.....B|C........C|D....D|E..|
   +--+---+-------+----------+------+---+

Now tooling decides to read from this ring buffer. However, none of these two natural positions, 'head' and the start of this ring buffer, are pointing to the head of a record. Even though the full ring buffer can be accessed by tooling, it is unable to find a position to start decoding.

The first attempt to solve this problem AFAIK can be found at [1]. It makes the kernel maintain the 'tail' pointer: it is updated when the ring buffer is half full. However, this approach introduces overhead to the fast path. Test results show a 1% overhead [2]. In addition, this method utilizes no more than 50% of the records.

Another attempt can be found at [3], which allows putting the size of an event at the end of each record. This approach allows tooling to find records in a backward manner from the 'head' pointer by reading the size of a record from its tail. However, because of the alignment requirement, it needs 8 bytes to record the size of a record, which is a huge waste. Its performance is also not good, because more data need to be written. This approach also introduces some extra branch instructions to the fast path.

'write_backward' is a better solution to this problem. The following figure demonstrates the state of the overwritable ring buffer when 'write_backward' is set, before overwriting:

        head
         |
         V
   +---+------+----------+-------+------+
   |   |D....D|C........C|B.....B|A....A|
   +---+------+----------+-------+------+

and after overwriting:

                                      head
                                        |
                                        V
   +---+------+----------+-------+---+--+
   |..E|D....D|C........C|B.....B|A..|E.|
   +---+------+----------+-------+---+--+

In each situation, 'head' points to the beginning of the newest record. From this record, tooling can iterate over the full ring buffer and fetch records one by one.

The only limitation that needs to be considered is back-to-back reading. Due to the non-deterministic nature of user programs, it is impossible to ensure the ring buffer stays stable during reading. Consider an extreme situation: tooling is scheduled out after reading record 'D', then a burst of events comes, eating up the whole ring buffer (one or multiple rounds). When the tooling process comes back, reading after 'D' is incorrect now.

To prevent this problem, we need to find a way to ensure the ring buffer is stable during reading. ioctl(PERF_EVENT_IOC_PAUSE_OUTPUT) is suggested because its overhead is lower than ioctl(PERF_EVENT_IOC_ENABLE). By carefully verifying the 'header' pointer, a reader can avoid pausing the ring buffer. For example:

    /* A union of all possible events */
    union perf_event event;

    p = head = perf_mmap__read_head();
    while (true) {
        /* copy header of next event */
        fetch(&event.header, p, sizeof(event.header));

        /* read 'head' pointer */
        head = perf_mmap__read_head();

        /* check overwritten: is the header good? */
        if (!verify(sizeof(event.header), p, head))
            break;

        /* copy the whole event */
        fetch(&event, p, event.header.size);

        /* read 'head' pointer again */
        head = perf_mmap__read_head();

        /* is the whole event good? */
        if (!verify(event.header.size, p, head))
            break;

        p += event.header.size;
    }

However, the overhead is high because:

 a) In-place decoding is not safe. Copying-verifying-decoding is required.
 b) Fetching the 'head' pointer requires additional synchronization.

(From Alexei Starovoitov: Even when this trick works, pause is needed for more than stability of reading. When we collect the events into an overwrite buffer we're waiting for some other trigger (like an all-cpu utilization spike or just one cpu running and all others idle) and when it happens the buffer has valuable info from the past. At this point new events are no longer interesting and the buffer should be paused, events read and unpaused until the next trigger comes.)

This patch utilizes the event's default overflow_handler introduced previously. perf_event_output_backward() is created as the default overflow handler for backward ring buffers. To avoid extra overhead on the fast path, the original perf_event_output() becomes __perf_event_output() and is marked '__always_inline'. In theory, there's no extra overhead introduced to the fast path.

Performance testing: calling 'close(-1)' 3000000 times, using gettimeofday() to check the duration, and using 'perf record -o /dev/null -e raw_syscalls:*' to capture system calls. Results in ns.

Testing environment:

  CPU    : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Kernel : v4.5.0

              MEAN         STDVAR
  BASE     800214.950     2853.083
  PRE1    2253846.700     9997.014
  PRE2    2257495.540     8516.293
  POST    2250896.100     8933.921

Where 'BASE' is the pure performance without capturing, 'PRE1' is the test result of the pure 'v4.5.0' kernel, 'PRE2' is the test result before this patch, and 'POST' is the test result after this patch. See [4] for the detailed experimental setup.

Considering the stdvar, this patch doesn't introduce a performance overhead to the fast path.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1304.1/04584.html
[2] http://lkml.iu.edu/hypermail/linux/kernel/1307.1/00535.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1512.0/01265.html
[4] http://lkml.kernel.org/g/56F89DCD.1040202@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: <acme@kernel.org> Cc: <pi3orama@163.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1459865478-53413-1-git-send-email-wangnan0@huawei.com [ Fixed the changelog some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
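For completeness, a rough user-space sketch of opening a backward-written, overwritable ring buffer (the attr.write_backward bit is what this patch adds; the event type, sample flags and mmap size are arbitrary illustration, and the absence of PROT_WRITE follows the overwritable-buffer description above):

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int open_backward_rb(void)
    {
            struct perf_event_attr attr;
            size_t len = (1 + 8) * 4096;    /* header page + 8 data pages */
            void *base;
            int fd;

            memset(&attr, 0, sizeof(attr));
            attr.size           = sizeof(attr);
            attr.type           = PERF_TYPE_SOFTWARE;
            attr.config         = PERF_COUNT_SW_DUMMY;
            attr.sample_type    = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
            attr.write_backward = 1;        /* ring buffer written from end to beginning */

            fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            if (fd < 0)
                    return -1;

            /* no PROT_WRITE: tooling cannot update 'tail', the kernel overwrites freely */
            base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            if (base == MAP_FAILED) {
                    close(fd);
                    return -1;
            }
            return fd;
    }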
2016-04-23perf/x86/intel: Add LBR filter support for Silvermont and Airmont CPUsKan Liang3-1/+21
LBR filtering is also supported on the Silvermont and Airmont microarchitectures. The layout of MSR_LBR_SELECT is the same as Nehalem. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1460706825-46163-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel: Add Goldmont CPU supportKan Liang4-1/+177
Add perf core PMU support for Intel Goldmont CPU cores:

- The init code is based on Silvermont.
- There is a new cache event list, based on the Silvermont cache event list.
- Goldmont has 32 LBR entries. It also uses the new LBRv6 format, which reports the cycle information using the upper 16 bits of LBR_TO.
- It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS for precise cycles.

For details, please refer to the latest SDM058: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1460706167-45320-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23Merge branch 'perf/urgent' into perf/core, to resolve conflictIngo Molnar176-940/+1301
Signed-off-by: Ingo Molnar <mingo@kernel.org>