aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf
AgeCommit message (Collapse)Author
2016-08-18perf evsel: Do not access outside hw cache name arraysArnaldo Carvalho de Melo
We have to check if the values are >= *_MAX, not just >, fix it. From the bugzilla report: ''In file /tools/perf/util/evsel.c function __perf_evsel__hw_cache_name it appears that there is a bug that reads beyond the end of the buffer. The statement "if (type > PERF_COUNT_HW_CACHE_MAX)" allows type to be equal to the maximum value. Later, when statement "if (!perf_evsel__is_cache_op_valid(type, op))" is executed, the function can access array perf_evsel__hw_cache_stat[type] beyond the end of the buffer. It appears to me that the statement "if (type > PERF_COUNT_HW_CACHE_MAX)" should be "if (type >= PERF_COUNT_HW_CACHE_MAX)" Bug found with Coverity and manual code review. No attempts were made to execute the code with a maximum type value.'' Committer note: Testing it: $ perf record -e $(echo $(perf list cache | cut -d \[ -f1) | sed 's/ /,/g') usleep 1 [ perf record: Woken up 16 times to write data ] [ perf record: Captured and wrote 0.023 MB perf.data (34 samples) ] $ perf evlist L1-dcache-load-misses L1-dcache-loads L1-dcache-stores L1-icache-load-misses LLC-load-misses LLC-loads LLC-store-misses LLC-stores branch-load-misses branch-loads dTLB-load-misses dTLB-loads dTLB-store-misses dTLB-stores iTLB-load-misses iTLB-loads node-load-misses node-loads node-store-misses node-stores $ perf list cache List of pre-defined events (to be used in -e): L1-dcache-load-misses [Hardware cache event] L1-dcache-loads [Hardware cache event] L1-dcache-stores [Hardware cache event] L1-icache-load-misses [Hardware cache event] LLC-load-misses [Hardware cache event] LLC-loads [Hardware cache event] LLC-store-misses [Hardware cache event] LLC-stores [Hardware cache event] branch-load-misses [Hardware cache event] branch-loads [Hardware cache event] dTLB-load-misses [Hardware cache event] dTLB-loads [Hardware cache event] dTLB-store-misses [Hardware cache event] dTLB-stores [Hardware cache event] iTLB-load-misses [Hardware cache event] iTLB-loads [Hardware cache event] node-load-misses [Hardware cache event] node-loads [Hardware cache event] node-store-misses [Hardware cache event] node-stores [Hardware cache event] $ Reported-by: Brian Sweeney <bsweeney@lgsinnovations.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=153351 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-16perf unwind: Use addr_location::addr instead of ip for entriesMilian Wolff
This fixes the srcline translation for call chains of user space applications. Before we got: perf report --stdio --no-children -s sym,srcline -g address 8.92% [.] main mandelbrot.h:41 | |--3.70%--main +8390240 | __libc_start_main +139950056726769 | _start +8388650 | |--2.74%--main +8390189 | --2.08%--main +8390296 __libc_start_main +139950056726769 _start +8388650 7.59% [.] main complex:1326 | |--4.79%--main +8390203 | __libc_start_main +139950056726769 | _start +8388650 | --2.80%--main +8390219 7.12% [.] __muldc3 libgcc2.c:1945 | |--3.76%--__muldc3 +139950060519490 | main +8390224 | __libc_start_main +139950056726769 | _start +8388650 | --3.32%--__muldc3 +139950060519512 main +8390224 With this patch applied, we instead get: perf report --stdio --no-children -s sym,srcline -g address 8.92% [.] main mandelbrot.h:41 | |--3.70%--main mandelbrot.h:41 | __libc_start_main +241 | _start +4194346 | |--2.74%--main mandelbrot.h:41 | --2.08%--main mandelbrot.h:41 __libc_start_main +241 _start +4194346 7.59% [.] main complex:1326 | |--4.79%--main complex:1326 | __libc_start_main +241 | _start +4194346 | --2.80%--main complex:1326 7.12% [.] __muldc3 libgcc2.c:1945 | |--3.76%--__muldc3 libgcc2.c:1945 | main mandelbrot.h:39 | __libc_start_main +241 | _start +4194346 | --3.32%--__muldc3 libgcc2.c:1945 main mandelbrot.h:39 Suggested-and-Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> LPU-Reference: 20160816153926.11288-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf intel-pt: Fix occasional decoding errors when tracing system-wideAdrian Hunter
In order to successfully decode Intel PT traces, context switch events are needed from the moment the trace starts. Currently that is ensured by using the 'immediate' flag which enables the switch event when it is opened. However, since commit 86c2786994bd ("perf intel-pt: Add support for PERF_RECORD_SWITCH") that might not always happen. When tracing system-wide the context switch event is added to the tracking event which was not set as 'immediate'. Change that so it is. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: 86c2786994bd ("perf intel-pt: Add support for PERF_RECORD_SWITCH") Link: http://lkml.kernel.org/r/1471245784-22580-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf probe: Release resources on error when handling exit pathsArnaldo Carvalho de Melo
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Colin King <colin.king@canonical.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-zh2j4iqimralugke5qq7dn6d@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf probe: Check for dup and fdopen failuresColin Ian King
dup and fdopen can potentially fail, so add some extra error handling checks rather than assuming they always work. Signed-off-by: Colin King <colin.king@canonical.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1471038296-12956-1-git-send-email-colin.king@canonical.com [ Free resources when those functions (now being verified) fail ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf symbols: Fix annotation of objects with debuginfo filesAnton Blanchard
Commit 73cdf0c6ea9c ("perf symbols: Record text offset in dso to calculate objdump address") started storing the offset of the text section for all DSOs: if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL)) dso->text_offset = tshdr.sh_addr - tshdr.sh_offset; Unfortunately this breaks debuginfo files, because we need to calculate the offset of the text section in the associated executable file. As a result perf annotate returns junk for all debuginfo files. Fix this by using runtime_ss->elf which should point at the executable when parsing a debuginfo file. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Wang Nan <wangnan0@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v4.6+ Fixes: 73cdf0c6ea9c ("perf symbols: Record text offset in dso to calculate objdump address") Link: http://lkml.kernel.org/r/20160813115533.6de17912@kryten Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf script: Don't disable use_callchain if input is pipeHe Kuang
Because perf data from pipe do not have a header with evsel attr, we should not check that and disable symbol_conf.use_callchain. Otherwise, perf script won't show callchains even if the data stream contains callchain. Before: $ perf record -g -o - uname |perf script Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] uname 1828 182630.186578: 250000 cpu-clock: ..b9499 setup_arg_pages uname 1828 182630.186850: 250000 cpu-clock: ..83b20 ___might_sleep uname 1828 182630.187153: 250000 cpu-clock: ..4b6be file_map_prot_ch ... After: $ perf record -g -o - uname |perf script Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] uname 1833 182675.927099: 250000 cpu-clock: ba5520 _raw_spin_lock+0xfe200040 ([kernel.kallsyms]) 389dd4 expand_downwards+0xfe200154 ([kernel.kallsyms]) 389f34 expand_stack+0xfe200024 ([kernel.kallsyms]) 3b957e setup_arg_pages+0xfe20019e ([kernel.kallsyms]) 40c80f load_elf_binary+0xfe20042f ([kernel.kallsyms]) ... Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470309943-153909-2-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf script: Show proper message when failed list scriptsHe Kuang
Perf shows the usage message when perf scripts folder failed to open, which misleads users to let them think the command is being mistyped. This patch shows a proper message and guides users to check the PERF_EXEC_PATH environment variable in that case. Before: $ perf script --list Usage: perf script [<options>] or: perf script [<options>] record <script> [<record-options>] <command> or: perf script [<options>] report <script> [script-args] or: perf script [<options>] <script> [<record-options>] <command> or: perf script [<options>] <top-script> [script-args] -l, --list list available scripts After: $ perf script --list open(/home/user/perf-core/scripts) failed. Check for "PERF_EXEC_PATH" env to set scripts dir. Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470309943-153909-1-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-15perf jitdump: Add the right header to get the major()/minor() definitionsArnaldo Carvalho de Melo
Noticed on Fedora Rawhide: $ gcc --version gcc (GCC) 6.1.1 20160721 (Red Hat 6.1.1-4) $ rpm -q glibc glibc-2.24.90-1.fc26.x86_64 $ CC /tmp/build/perf/util/jitdump.o util/jitdump.c: In function 'jit_repipe_code_load': util/jitdump.c:428:2: error: '__major_from_sys_types' is deprecated: In the GNU C Library, `major' is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use `major', include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro `major', you should #undef it after including <sys/types.h>. [-Werror=deprecated-declarations] event->mmap2.maj = major(st.st_dev); ^~~~~ In file included from /usr/include/features.h:397:0, from /usr/include/sys/types.h:25, from util/jitdump.c:1: /usr/include/sys/sysmacros.h:87:1: note: declared here __SYSMACROS_DEFINE_MAJOR (__SYSMACROS_FST_IMPL_TEMPL) Fix it following that recomendation. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-3majvd0adhfr25rvx4v5e9te@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-12perf ppc64le: Fix build failure when libelf is not presentRavi Bangoria
arch__post_process_probe_trace_events() calls get_target_map() to prepare symbol table. get_target_map() is defined inside util/probe-event.c. probe-event.c will only get included in perf binary if CONFIG_LIBELF is set. Hence arch__post_process_probe_trace_events() needs to be defined inside #ifdef HAVE_LIBELF_SUPPORT to solve compilation error. Reported-and-Tested-by: Anton Blanchard <anton@samba.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/57ABFF88.8030905@linux.vnet.ibm.com [ Thunderbird MUA mangled it, fix that ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-12perf tools mem: Fix -t store option for record commandJiri Olsa
Michael reported 'perf mem -t store record' being broken. The reason is latest rework of this area: commit acbe613e0c03 ("perf tools: Add monitored events array") We don't mark perf_mem_events store record when -t store option is specified. Committer notes: Before: # perf mem -t store record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf evlist cycles:ppp # After: # perf mem -t store record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf evlist cpu/mem-stores/P # Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: acbe613e0c03 ("perf tools: Add monitored events array") Link: http://lkml.kernel.org/r/1470905457-18311-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-12perf intel-pt: Fix ip compressionAdrian Hunter
The June 2015 Intel SDM introduced IP Compression types 4 and 6. Refer to section 36.4.2.2 Target IP (TIP) Packet - IP Compression. Existing Intel PT packet decoder did not support type 4, and got type 6 wrong. Because type 3 and type 4 have the same number of bytes, the packet 'count' has been changed from being the number of ip bytes to being the type code. That allows the Intel PT decoder to correctly decide whether to sign-extend or use the last ip. However that also meant the code had to be adjusted in a number of places. Currently hardware is not using the new compression types, so this fix has no effect on existing hardware. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1469005206-3049-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf probe ppc64le: Fix probe location when using DWARFRavi Bangoria
Powerpc has Global Entry Point and Local Entry Point for functions. LEP catches call from both the GEP and the LEP. Symbol table of ELF contains GEP and Offset from which we can calculate LEP, but debuginfo does not have LEP info. Currently, perf prioritize symbol table over dwarf to probe on LEP for ppc64le. But when user tries to probe with function parameter, we fall back to using dwarf(i.e. GEP) and when function called via LEP, probe will never hit. For example: $ objdump -d vmlinux ... do_sys_open(): c0000000002eb4a0: e8 00 4c 3c addis r2,r12,232 c0000000002eb4a4: 60 00 42 38 addi r2,r2,96 c0000000002eb4a8: a6 02 08 7c mflr r0 c0000000002eb4ac: d0 ff 41 fb std r26,-48(r1) $ sudo ./perf probe do_sys_open $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_sys_open _text+3060904 $ sudo ./perf probe 'do_sys_open filename:string' $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string For second case, perf probed on GEP. So when function will be called via LEP, probe won't hit. $ sudo ./perf record -a -e probe:do_sys_open ls [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB perf.data ] To resolve this issue, let's not prioritize symbol table, let perf decide what it wants to use. Perf is already converting GEP to LEP when it uses symbol table. When perf uses debuginfo, let it find LEP offset form symbol table. This way we fall back to probe on LEP for all cases. After patch: $ sudo ./perf probe 'do_sys_open filename:string' $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string $ sudo ./perf record -a -e probe:do_sys_open ls [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ] Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf probe: Add function to post process kernel trace eventsRavi Bangoria
Instead of inline code, introduce function to post process kernel probe trace events. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470723805-5081-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf probe: Support signedness castingNaohiro Aota
The 'perf probe' tool detects a variable's type and use the detected type to add a new probe. Then, kprobes prints its variable in hexadecimal format if the variable is unsigned and prints in decimal if it is signed. We sometimes want to see unsigned variable in decimal format (i.e. sector_t or size_t). In that case, we need to investigate the variable's size manually to specify just signedness. This patch add signedness casting support. By specifying "s" or "u" as a type, perf-probe will investigate variable size as usual and use the specified signedness. E.g. without this: $ perf probe -a 'submit_bio bio->bi_iter.bi_sector' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 $ cat trace_pipe|head dbench-9692 [003] d..1 971.096633: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d00 dbench-9692 [003] d..1 971.096685: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x1a3d80 dbench-9692 [003] d..1 971.096687: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d80 ... // need to investigate the variable size $ perf probe -a 'submit_bio bio->bi_iter.bi_sector:s64' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s64) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 With this: // just use "s" to cast its signedness $ perf probe -v -a 'submit_bio bio->bi_iter.bi_sector:s' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 $ cat trace_pipe|head dbench-9689 [001] d..1 1212.391237: submit_bio: (submit_bio+0x0/0x140) bi_sector=128 dbench-9689 [001] d..1 1212.391252: submit_bio: (submit_bio+0x0/0x140) bi_sector=131072 dbench-9697 [006] d..1 1212.398611: submit_bio: (submit_bio+0x0/0x140) bi_sector=30208 This commit also update perf-probe.txt to describe "types". Most parts are based on existing documentation: Documentation/trace/kprobetrace.txt Committer note: Testing using 'perf trace': # perf probe -a 'submit_bio bio->bi_iter.bi_sector' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 # trace --no-syscalls --ev probe:submit_bio 0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=0xc133c0) 3181.861 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffb8) 3181.881 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc0) 3184.488 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc8) <SNIP> 4717.927 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7a88) 4717.970 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7880) ^C[root@jouet ~]# Now, using this new feature: [root@jouet ~]# perf probe -a 'submit_bio bio->bi_iter.bi_sector:s' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 [root@jouet ~]# trace --no-syscalls --ev probe:submit_bio 0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145704) 0.017 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145712) 0.019 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145720) 2.567 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145728) 5631.919 probe:submit_bio:(ffffffffac3aee00) bi_sector=0) 5631.941 probe:submit_bio:(ffffffffac3aee00) bi_sector=8) 5631.945 probe:submit_bio:(ffffffffac3aee00) bi_sector=16) 5631.948 probe:submit_bio:(ffffffffac3aee00) bi_sector=24) ^C# With callchains: # trace --no-syscalls --ev probe:submit_bio/max-stack=10/ 0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662544) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) 0.023 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662552) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) 0.027 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662560) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) 2.593 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662568) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) journal_submit_commit_record+0xa82001ac ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa82012e8 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) ^C# Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470710408-23515-1-git-send-email-naohiro.aota@hgst.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf stat: Avoid skew when reading eventsMark Rutland
When we don't have a tracee (i.e. we're attaching to a task or CPU), counters can still be running after our workload finishes, and can still be running as we read their values. As we read events one-by-one, there can be arbitrary skew between values of events, even within a group. This means that ratios within an event group are not reliable. This skew can be seen if measuring a group of identical events, e.g: # perf stat -a -C0 -e '{cycles,cycles}' sleep 1 To avoid this, we must stop groups from counting before we read the values of any constituent events. This patch adds and makes use of a new disable_counters() helper, which disables group leaders (and thus each group as a whole). This mirrors the use of enable_counters() for starting event groups in the absence of a tracee. Closing a group leader splits the group, and without a disabled group leader the newly split events will begin counting. Thus to ensure counts are reliable we must defer closing group leaders until all counts have been read. To do so this patch removes the event closing logic from the read_counters() helper, explicitly closes the events using perf_evlist__close(), which also aids legibility. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1470747869-3567-1-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf probe: Fix module name matchingKonstantin Khlebnikov
If module is "module" then dso->short_name is "[module]". Substring comparing is't enough: "raid10" matches to "[raid1]". This patch also checks terminating zero in module name. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: http://lkml.kernel.org/r/147039975648.715620.12985971832789032159.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf probe: Adjust map->reloc offset when finding kernel symbol from mapMasami Hiramatsu
Adjust map->reloc offset for the unmapped address when finding alternative symbol address from map, because KASLR can relocate the kernel symbol address. The same adjustment has been done when finding appropriate kernel symbol address from map which was introduced by commit f90acac75713 ("perf probe: Find given address from offline dwarf") Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu@linaro.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160806192948.e366f3fbc4b194de600f8326@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf hists: Trim libtraceevent trace_seq buffersArnaldo Carvalho de Melo
When we use libtraceevent to format trace event fields into printable strings to use in hist entries it is important to trim it from the default 4 KiB it starts with to what is really used, to reduce the memory footprint, so use realloc(seq.buffer, seq.len + 1) when returning the seq.buffer formatted with the fields contents. Reported-and-Tested-by: Wang Nan <wangnan0@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/n/tip-t3hl7uxmilrkigzmc90rlhk2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09perf script: Add 'bpf-output' field to usage messageBrendan Gregg
This adds the 'bpf-output' field to the perf script usage message, and docs. Signed-off-by: Brendan Gregg <bgregg@netflix.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470192469-11910-4-git-send-email-bgregg@netflix.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-04Merge tag 'perf-core-for-mingo-20160803' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New features: - Add --sample-cpu to 'perf record', to explicitely ask for sampling the CPU (Jiri Olsa) Fixes: - Fix processing of multi byte chunks in objdump output, fixing disassemble processing for annotation on at least ARM64 (Jan Stancek) - Use SyS_epoll_wait in a BPF 'perf test' entry instead of sys_epoll_wait, that is not present in the DWARF info in vmlinux files (Arnaldo Carvalho de Melo) - Add -wno-shadow when processing files using perl headers, fixing the build on Fedora Rawhide and Arch Linux (Namhyung Kim) Infrastructure changes: - Annotate prep work to better catch and report errors related to using objdump to disassemble DSOs (Arnaldo Carvalho de Melo) - Add 'alloc', 'scnprintf' and 'and' methods for bitmap processing (Jiri Olsa) - Add nested output resorting callback in hists processing (Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-03perf tests bpf: Use SyS_epoll_wait aliasArnaldo Carvalho de Melo
Something made the sys_epoll_wait() function alias not to be found in the vmlinux DWARF info, being found only in /proc/kallsyms, which made the BPF perf tests to fail: [root@jouet ~]# perf test BPF 37: Test BPF filter : 37.1: Test basic BPF filtering : FAILED! 37.2: Test BPF prologue generation : Skip 37.3: Test BPF relocation checker : Skip [root@jouet ~]# Using -v we can see it is failing to find DWARF info for the probed function, sys_epoll_wait, which we can find in /proc/kallsyms but not in vmlinux with CONFIG_DEBUG_INFO: [root@jouet ~]# grep -w sys_epoll_wait /proc/kallsyms ffffffffbd295b50 T sys_epoll_wait [root@jouet ~]# [root@jouet ~]# readelf -wi /lib/modules/4.7.0+/build/vmlinux | grep -w sys_epoll_wait [root@jouet ~]# If we try to use perf probe: [root@jouet ~]# perf probe sys_epoll_wait Failed to find debug information for address ffffffffbd295b50 Probe point 'sys_epoll_wait' not found. Error: Failed to add events. [root@jouet ~]# It all works if we use SyS_epoll_wait, that is just an alias to the probed function: [root@jouet ~]# grep -i sys_epoll_wait /proc/kallsyms ffffffffbd295b50 T SyS_epoll_wait ffffffffbd295b50 T sys_epoll_wait [root@jouet ~]# So use it: [root@jouet ~]# perf test BPF 37: Test BPF filter : 37.1: Test basic BPF filtering : Ok 37.2: Test BPF prologue generation : Ok 37.3: Test BPF relocation checker : Ok [root@jouet ~]# Further info: [root@jouet ~]# gcc --version gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3) [acme@jouet linux]$ cat /etc/fedora-release Fedora release 24 (Twenty Four) Investigation as to why it fails is still underway, but it was always going from sys_epoll_wait to SyS_epoll_wait when looking up the DWARF info in vmlinux, and this is what is breaking now. Switching to use SyS_epoll_wait allows this test to proceed and test the BPF code it was designed for, so lets have this in to allow passing this test while we fix the root cause. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7hekjp0bodwjbb419sl2b55h@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf tests: objdump output can contain multi byte chunksJan Stancek
objdump's raw insn output can vary across architectures on the number of bytes per chunk (bpc) displayed and their endianness. The code-reading test relied on reading objdump output as 1 bpc. Kaixu Xia reported test failure on ARM64, where objdump displays 4 bpc: 70c48: f90027bf str xzr, [x29,#72] 70c4c: 91224000 add x0, x0, #0x890 70c50: f90023a0 str x0, [x29,#64] This patch adds support to read raw insn output for any bpc length. In case of 2+ bpc it also guesses objdump's display endian. Reported-and-Tested-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/07f0f7bcbda78deb423298708ef9b6a54d6b92bd.1452592712.git.jstancek@redhat.com [ Fix up pr_fmt() call to use %zd for size_t variables, fixing the build on Ubuntu cross-compiling to armhf and ppc64 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf record: Add --sample-cpu optionJiri Olsa
Adding --sample-cpu option to be able to explicitly enable CPU sample type. Currently it's only enable implicitly in case the target is cpu related. It will be useful for following c2c record tool. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1470074555-24889-8-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf hists: Introduce output_resort_cb methodJiri Olsa
When dealing with nested hist entries it's helpful to have a way to resort those nested objects. Adding optional callback call into output_resort function and following new interface function: typedef int (*hists__resort_cb_t)(struct hist_entry *he); void hists__output_resort_cb(struct hists *hists, struct ui_progress *prog, hists__resort_cb_t cb); Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1470074555-24889-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf tools: Move config/Makefile into Makefile.configJiri Olsa
There's no reason to keep it in separate directory now when we moved out the rest of the files. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1470074555-24889-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf tests: Add test for bitmap_scnprintf functionJiri Olsa
Automatically test the bitmap_scnprintf function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1470074555-24889-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf tools: Fix build failure on perl script contextNamhyung Kim
On my Archlinux machine, perf faild to build like below: CC scripts/perl/Perf-Trace-Util/Context.o In file included from /usr/lib/perl5/core/perl/CORE/perl.h:3905:0, from Context.xs:23: /usr/lib/perl5/core/perl/CORE/inline.h: In function : /usr/lib/perl5/core/perl/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Werror-shadow] AV *av =3D GvAV(PL_defgv); ^ /usr/lib/perl5/core/perl/CORE/inline.h:526:5: note: in expansion of macro 'CX_POP_SAVEARRAY' CX_POP_SAVEARRAY(cx); ^~~~~~~~~~~~~~~~ In file included from /usr/lib/perl5/core/perl/CORE/perl.h:5853:0, from Context.xs:23: /usr/lib/perl5/core/perl/CORE/inline.h:518:9: note: shadowed declaration is here AV *av; ^~ What I did to fix is adding '-Wno-shadow' as the error message said it's the cause of the failure. Since it's from the perl (not perf) code base, we don't have the control so I just wanted to ignore the warning when compiling perl scripting code. Committer note: This also fixes the build on Fedora Rawhide. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160802024317.31725-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-01perf annotate: Plug filename string leakArnaldo Carvalho de Melo
If dso__build_id_filename(..., NULL, ...) returns !NULL its because it allocated it, so, when reaching the 'if (dso__is_kcore()) test, we already checked that and were just "fallbacking" to using dso->long_name, but without freeing filename, thus leaking it. Fix it by adding the dso__is_kcore() test to the 'or' group just after it, the one containing the full fallback code, including freeing the filename. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: ee205503f233 ("perf tools: Fix annotation with kcore") Link: http://lkml.kernel.org/n/tip-qi4rpjq8yo6myvg99kkgt0xz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-01perf annotate: Introduce strerror for handling symbol__disassemble() errorsArnaldo Carvalho de Melo
We were just using pr_error() which makes it difficult for non stdio UIs to provide errors using its widgets, as they need to somehow catch what was passed to pr_error(). Fix it by introducing a __strerror() interface like the ones used elsewhere, for instance target__strerror(). This is just the initial step, more work will be done, but first some error handling bugs noticed while working on this need to be dealt with. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-dgd22zl2xg7x4vcnoa83jxfb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-01perf annotate: Rename symbol__annotate() to symbol__disassemble()Arnaldo Carvalho de Melo
This function will not annotate anything, it will just disassembly the given map->dso and symbol. It currently does this by parsing the output of 'objdump --disassemble', but this could conceivably be done using a library or an offshot of the kernel's instruction decoder (arch/x86/lib/inat.c), etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2xpfl4bfnrd6x584b390qok7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "This update contains: - a fix for the bpf tools to use the new EM_BPF code - a fix for the module parser of perf to retrieve the proper text start address - add str_error_c to libapi to avoid linking against tools/lib/str_error_r.o" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools lib api: Add str_error_c to libapi perf s390: Fix 'start' address of module's map tools lib bpf: Use official ELF e_machine value
2016-07-29perf target: str_error_r() always returns the buffer it receivesArnaldo Carvalho de Melo
So no need for checking if it uses the strerror_r() GNU variant error reporting mechanism, i.e. if it returns a pointer to a immutable string internal to glibc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: c8b5f2c96d1b ("tools: Introduce str_error_r()") Link: http://lkml.kernel.org/n/tip-xr83cd4y4r3cn6tq6w4f59jb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-29perf annotate: Use pipe + fork instead of popenArnaldo Carvalho de Melo
We will need to redirect the stderr as well, so open code popen as a starting point. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-k0zt9svg4bswiglem7ornts4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-28Merge tag 'libnvdimm-for-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Replace pcommit with ADR / directed-flushing. The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. - On-demand ARS (address range scrub). Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. - Support for the Microsoft DSM (device specific method) command format. - Support for EDK2/OVMF virtual disk device memory ranges. - Various fixes and cleanups across the subsystem. * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits) libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" nfit: do an ARS scrub on hitting a latent media error nfit: move to nfit/ sub-directory nfit, libnvdimm: allow an ARS scrub to be triggered on demand libnvdimm: register nvdimm_bus devices with an nd_bus driver pmem: clarify a debug print in pmem_clear_poison x86/insn: remove pcommit Revert "KVM: x86: add pcommit support" nfit, tools/testing/nvdimm/: unify shutdown paths libnvdimm: move ->module to struct nvdimm_bus_descriptor nfit: cleanup acpi_nfit_init calling convention nfit: fix _FIT evaluation memory leak + use after free tools/testing/nvdimm: add manufacturing_{date|location} dimm properties tools/testing/nvdimm: add virtual ramdisk range acpi, nfit: treat virtual ramdisk SPA as pmem region pmem: kill __pmem address space pmem: kill wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes fs/dax: remove wmb_pmem() libnvdimm, pmem: flush posted-write queues on shutdown ...
2016-07-28mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocationsVlastimil Babka
After the previous patch, we can distinguish costly allocations that should be really lightweight, such as THP page faults, with __GFP_NORETRY. This means we don't need to recognize khugepaged allocations via PF_KTHREAD anymore. We can also change THP page faults in areas where madvise(MADV_HUGEPAGE) was used to try as hard as khugepaged, as the process has indicated that it benefits from THP's and is willing to pay some initial latency costs. We can also make the flags handling less cryptic by distinguishing GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding __GFP_NORETRY or __GFP_KSWAPD_RECLAIM is done where needed. The patch effectively changes the current GFP_TRANSHUGE users as follows: * get_huge_zero_page() - the zero page lifetime should be relatively long and it's shared by multiple users, so it's worth spending some effort on it. We use GFP_TRANSHUGE, and __GFP_NORETRY is not added. This also restores direct reclaim to this allocation, which was unintentionally removed by commit e4a49efe4e7e ("mm: thp: set THP defrag by default to madvise and add a stall-free defrag option") * alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency is not an issue. So if khugepaged "defrag" is enabled (the default), do reclaim via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the PF_KTHREAD check from page alloc. As a side-effect, khugepaged will now no longer check if the initial compaction was deferred or contended. This is OK, as khugepaged sleep times between collapsion attempts are long enough to prevent noticeable disruption, so we should allow it to spend some effort. * migrate_misplaced_transhuge_page() - already was masking out __GFP_RECLAIM, so just convert to GFP_TRANSHUGE_LIGHT which is equivalent. * alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise) are now allocating without __GFP_NORETRY. Other vma's keep using __GFP_NORETRY if direct reclaim/compaction is at all allowed (by default it's allowed only for madvised vma's). The rest is conversion to GFP_TRANSHUGE(_LIGHT). [mhocko@suse.com: suggested GFP_TRANSHUGE_LIGHT] Link: http://lkml.kernel.org/r/20160721073614.24395-7-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28perf evsel: Introduce constructor for cycles eventArnaldo Carvalho de Melo
That is the default used when no events is specified in tools, separate it so that simpler tools that need no evlist can use it directly. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-67mwuthscwroz88x9pswcqyv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-27tools lib api: Add str_error_c to libapiArnaldo Carvalho de Melo
Because it uses that function, which would lead every tool using it to need to link against tools/lib/str_error_r.o. This fixes building tools/vm/, that links with libapi. Reported-by: Arjan van de Ven <arjan@linux.intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: b31e3e3316a7 ("tools lib api fs: Use str_error_r()") Link: http://lkml.kernel.org/n/tip-aedt3qzibhnhaov2j4caqi61@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Unified UDP encapsulation offload methods for drivers, from Alexander Duyck. 2) Make DSA binding more sane, from Andrew Lunn. 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli. 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar. 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX packets as soon as the device sees them, with the option to mirror the packet on TX via the same interface. From Brenden Blanco and others. 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet. 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli. 8) Simplify netlink conntrack entry layout, from Florian Westphal. 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido Schimmel, Yotam Gigi, and Jiri Pirko. 10) Add SKB array infrastructure and convert tun and macvtap over to it. From Michael S Tsirkin and Jason Wang. 11) Support qdisc packet injection in pktgen, from John Fastabend. 12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy. 13) Add NV congestion control support to TCP, from Lawrence Brakmo. 14) Add GSO support to SCTP, from Marcelo Ricardo Leitner. 15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni. 16) Support MPLS over IPV4, from Simon Horman. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) xgene: Fix build warning with ACPI disabled. be2net: perform temperature query in adapter regardless of its interface state l2tp: Correctly return -EBADF from pppol2tp_getname. net/mlx5_core/health: Remove deprecated create_singlethread_workqueue net: ipmr/ip6mr: update lastuse on entry change macsec: ensure rx_sa is set when validation is disabled tipc: dump monitor attributes tipc: add a function to get the bearer name tipc: get monitor threshold for the cluster tipc: make cluster size threshold for monitoring configurable tipc: introduce constants for tipc address validation net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() MAINTAINERS: xgene: Add driver and documentation path Documentation: dtb: xgene: Add MDIO node dtb: xgene: Add MDIO node drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset drivers: net: xgene: Use exported functions drivers: net: xgene: Enable MDIO driver drivers: net: xgene: Add backward compatibility drivers: net: phy: xgene: Add MDIO driver ...
2016-07-26perf s390: Fix 'start' address of module's mapSong Shan Gong
At present, when creating module's map, perf gets 'start' address by parsing '/proc/modules', but it's the module base address, it isn't the start address of the '.text' section. In most arches, it's OK. But for s390, it places 'GOT' and 'PLT' relocations before '.text' section. So there exists an offset between module base address and '.text' section, which will incur wrong symbol resolution for modules. Fix this bug by getting 'start' address of module's map from parsing '/sys/module/[module name]/sections/.text', not from '/proc/modules'. Signed-off-by: Song Shan Gong <gongss@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1469070651-6447-2-git-send-email-gongss@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-25Revert "perf tools: event.h needs asm/perf_regs.h"Arnaldo Carvalho de Melo
This reverts commit e083a21fcac9311ca425e600a15332f4792c56cc. Not needed at all, tools/perf/util/perf_regs.h, included via: #include "perf_regs.h" Should have a definition for PERF_REGS_MAX, and since this is dependent on HAVE_PERF_REGS_SUPPORT, fixes the build on powerpc, noticed by trying to cross compile this from ubuntu16.04 with a locally build libz & elfutils pair, since those are not available in multilib packages. Cc: Jiri Olsa <jolsa@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-0bv204s71t4wuw1l53b6fz79@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-23x86/insn: remove pcommitDan Williams
The pcommit instruction is being deprecated in favor of either ADR (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM Firmware Interface Table). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-22perf tests kmod-path: Fix build on ubuntu:16.04-x-armhfArnaldo Carvalho de Melo
Cross building it on Ubuntu 16.04 to ARM ends up showing we get the free() prototype by luck in other environments, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-0ktfgmmyhcfw8ondka2013f3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21perf tools: Add AVX-512 instructions to the new instructions testAdrian Hunter
Previous patches added support for Intel's AVX-512 instructions to the kernel and perf tools instruction decoders. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). Add a representative set of instructions to perf's "new instructions" test. e.g. perf test "new instructions" Or to view a particular instruction: perf test -v "new instructions" 2>&1 | grep vbroadcasti64x4 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21perf tools: Add AVX-512 support to the instruction decoder used by Intel PTAdrian Hunter
Add support for Intel's AVX-512 instructions to perf tools instruction decoder used by Intel PT. The kernel's instruction decoder was updated in a previous patch. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). AVX-512 instructions are identified by a EVEX prefix which, for the purpose of instruction decoding, can be treated as though it were a 4-byte VEX prefix. Existing instructions which can now accept an EVEX prefix need not be further annotated in the op code map (x86-opcode-map.txt). In the case of new instructions, the op code map is updated accordingly. Also add associated Mask Instructions that are used to manipulate mask registers used in AVX-512 instructions. A representative set of instructions is added to the perf tools new instructions test in a subsequent patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-20x86/insn: perf tools: Fix vcvtph2ps instruction decodingAdrian Hunter
vcvtph2ps does not have an immediate operand, so remove the erroneous 'Ib' from its opcode map entry. Add vcvtph2ps to the perf tools new instructions test to verify it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-18perf tests: Add is_printable_array testJiri Olsa
Add automated test for is_printable_array function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1468685480-18951-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-18perf tools: Make is_printable_array globalJiri Olsa
It's used from 2 objects in perf, so it's better to keep just one copy. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1468685480-18951-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-18perf script python: Fix string vs byte array resolvingJiri Olsa
Jirka reported that python code returns all arrays as strings. This makes impossible to get all items for byte array tracepoint field containing 0x00 value item. Fixing this by scanning full length of the array and returning it as PyByteArray object in case non printable byte is found. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reported-and-Tested-by: Jiri Pirko <jiri@mellanox.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1468685480-18951-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-18perf probe: Warn unmatched function filter correctlyMasami Hiramatsu
Warn unmatched function filter correctly instead of warning "symbol-loading error", since that can be a filter issue. From the technical point of view, this adds a filter chech in map__load and if there is a filter, it returns -2 (filter-out), instead of -1 (error), and perf-probe checks it and change message. E.g. without this fix: # perf probe -F rt_sp* no symbols found in [kernel.kallsyms], maybe install a debug package? Failed to load symbols in kernel With this fix: # perf probe -F rt_sp* no symbols passed the given filter. Failed to find symbols matched to "rt_sp*" Error: Failed to show functions. Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/146885835596.16106.2293540792775552481.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>