aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/util/annotate.c
AgeCommit message (Collapse)Author
2019-04-16Merge tag 'v4.18.35' into v4.18/standard/baseBruce Ashfield
This is the 4.18.35 stable release
2019-04-12perf annotate: Fix getting source line failureWei Li
commit 11db1ad4513d6205d2519e1a30ff4cef746e3243 upstream. The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0de33 ("perf annotate: No need to calculate notes->start twice") removed notes->start assignment in symbol__calc_lines(). It will get failed in find_address_in_section() from symbol__tty_annotate() subroutine as the a2l->addr is wrong. So the annotate summary doesn't report the line number of source code correctly. Before fix: liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c void hotspot_1(void) { volatile int i; for (i = 0; i < 0x10000000; i++); for (i = 0; i < 0x10000000; i++); for (i = 0; i < 0x10000000; i++); } int main(void) { hotspot_1(); return 0; } liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1 liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ] liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1 ---------------------------------------------- 19.30 common_while_1[32] 19.03 common_while_1[4e] 19.01 common_while_1[16] 5.04 common_while_1[13] 4.99 common_while_1[4b] 4.78 common_while_1[2c] 4.77 common_while_1[10] 4.66 common_while_1[2f] 4.59 common_while_1[51] 4.59 common_while_1[35] 4.52 common_while_1[19] 4.20 common_while_1[56] 0.51 common_while_1[48] Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period) ----------------------------------------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : 00000000000005fa <hotspot_1>: : hotspot_1(): : void hotspot_1(void) : { 0.00 : 5fa: push %rbp 0.00 : 5fb: mov %rsp,%rbp : volatile int i; : : for (i = 0; i < 0x10000000; i++); 0.00 : 5fe: movl $0x0,-0x4(%rbp) 0.00 : 605: jmp 610 <hotspot_1+0x16> 0.00 : 607: mov -0x4(%rbp),%eax common_while_1[10] 4.77 : 60a: add $0x1,%eax common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp) common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax 0.00 : 618: jle 607 <hotspot_1+0xd> : for (i = 0; i < 0x10000000; i++); ... After fix: liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ] liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1 ---------------------------------------------- 33.34 common_while_1.c:5 33.34 common_while_1.c:6 33.32 common_while_1.c:7 Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period) ----------------------------------------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : 00000000000005fa <hotspot_1>: : hotspot_1(): : void hotspot_1(void) : { 0.00 : 5fa: push %rbp 0.00 : 5fb: mov %rsp,%rbp : volatile int i; : : for (i = 0; i < 0x10000000; i++); 0.00 : 5fe: movl $0x0,-0x4(%rbp) 0.00 : 605: jmp 610 <hotspot_1+0x16> 0.00 : 607: mov -0x4(%rbp),%eax common_while_1.c:5 4.70 : 60a: add $0x1,%eax 4.89 : 60d: mov %eax,-0x4(%rbp) common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax 0.00 : 618: jle 607 <hotspot_1+0xd> : for (i = 0; i < 0x10000000; i++); 0.00 : 61a: movl $0x0,-0x4(%rbp) 0.00 : 621: jmp 62c <hotspot_1+0x32> 0.00 : 623: mov -0x4(%rbp),%eax common_while_1.c:6 4.54 : 626: add $0x1,%eax 4.73 : 629: mov %eax,-0x4(%rbp) common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax ... Signed-off-by: Wei Li <liwei391@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 425859ff0de33 ("perf annotate: No need to calculate notes->start twice") Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-10-15Merge tag 'v4.18.13' into v4.18/standard/baseBruce Ashfield
This is the 4.18.13 stable release # gpg: Signature made Wed 10 Oct 2018 02:56:08 AM EDT using RSA key ID 6092693E # gpg: Can't check signature: public key not found
2018-10-10perf annotate: Fix parsing aarch64 branch instructions after objdump updateKim Phillips
[ Upstream commit 4e67b2a5df5d3f341776d12ee575e00ca3ef92de ] Starting with binutils 2.28, aarch64 objdump adds comments to the disassembly output to show the alternative names of a condition code [1]. It is assumed that commas in objdump comments could occur in other arches now or in the future, so this fix is arch-independent. The fix could have been done with arm64 specific jump__parse and jump__scnprintf functions, but the jump__scnprintf instruction would have to have its comment character be a literal, since the scnprintf functions cannot receive a struct arch easily. This inconvenience also applies to the generic jump__scnprintf, which is why we add a raw_comment pointer to struct ins_operands, so the __parse function assigns it to be re-used by its corresponding __scnprintf function. Example differences in 'perf annotate --stdio2' output on an aarch64 perf.data file: BEFORE: → b.cs ffff200008133d1c <unwind_frame+0x18c> // b.hs, dffff7ecc47b AFTER : ↓ b.cs 18c BEFORE: → b.cc ffff200008d8d9cc <get_alloc_profile+0x31c> // b.lo, b.ul, dffff727295b AFTER : ↓ b.cc 31c The branch target labels 18c and 31c also now appear in the output: BEFORE: add x26, x29, #0x80 AFTER : 18c: add x26, x29, #0x80 BEFORE: add x21, x21, #0x8 AFTER : 31c: add x21, x21, #0x8 The Fixes: tag below is added so stable branches will get the update; it doesn't necessarily mean that commit was broken at the time, rather it didn't withstand the aarch64 objdump update. Tested no difference in output for sample x86_64, power arch perf.data files. [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=bb7eff5206e4795ac79c177a80fe9f4630aaf730 Signed-off-by: Kim Phillips <kim.phillips@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anton Blanchard <anton@samba.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Fixes: b13bbeee5ee6 ("perf annotate: Fix branch instruction with multiple operands") Link: http://lkml.kernel.org/r/20180827125340.a2f7e291901d17cea05daba4@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10perf annotate: Properly interpret indirect callMartin Liška
[ Upstream commit 1dc27f63303db58ce1b1a6932d1825305f86d574 ] The patch changes the parsing of: callq *0x8(%rbx) from: 0.26 │ → callq *8 to: 0.26 │ → callq *0x8(%rbx) in this case an address is followed by a register, thus one can't parse only the address. Committer testing: 1) run 'perf record sleep 10' 2) before applying the patch, run: perf annotate --stdio2 > /tmp/before 3) after applying the patch, run: perf annotate --stdio2 > /tmp/after 4) diff /tmp/before /tmp/after: # --- /tmp/before 2018-08-28 11:16:03.238384143 -0300 # +++ /tmp/after 2018-08-28 11:15:39.335341042 -0300 # @@ -13274,7 +13274,7 @@ # ↓ jle 128 # hash_value = hash_table->hash_func (key); # mov 0x8(%rsp),%rdi # - 0.91 → callq *30 # + 0.91 → callq *0x30(%r12) # mov $0x2,%r8d # cmp $0x2,%eax # node_hash = hash_table->hashes[node_index]; # @@ -13848,7 +13848,7 @@ # mov %r14,%rdi # sub %rbx,%r13 # mov %r13,%rdx # - → callq *38 # + → callq *0x38(%r15) # cmp %rax,%r13 # 1.91 ↓ je 240 # 1b4: mov $0xffffffff,%r13d # @@ -14026,7 +14026,7 @@ # mov %rcx,-0x500(%rbp) # mov %r15,%rsi # mov %r14,%rdi # - → callq *38 # + → callq *0x38(%rax) # mov -0x500(%rbp),%rcx # cmp %rax,%rcx # ↓ jne 9b0 <SNIP tons of other such cases> Signed-off-by: Martin Liška <mliska@suse.cz> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kim Phillips <kim.phillips@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/bd1f3932-be2b-85f9-7582-111ee0a43b07@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-22perf annotate: replace 'expand' with equivalent sed expressionTom Zanussi
We don't have 'expand' in our userspace so we need to accomplish the same thing using 'sed', which we do have. Signed-off-by: Tom Zanussi <tom.zanussi@intel.com>
2018-06-06perf annnotate: Make __symbol__inc_addr_samples handle src->histograms == NULLArnaldo Carvalho de Melo
Making it a bit more robust, this took place here when a sample appeared right after: ffffffff8a925000 D __nosave_end And before the next considered symbol, which, using kallsyms make us over guess the size of __nosave_end, and then the sequence: hist_entry__inc_addr_samples -> symbol__inc_addr_samples -> symbol__hists -> annotated_source__alloc_histograms Ends up not liking to allocate gigabytes of ram for annotation... This will be alleviated by considering BSS symbols, which we should but don't so far, and then we should investigate those samples further. The testcase was to have: perf top -e cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions Running for a while till it segfaulted trying to access NULL notes->src->histograms. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ndfjtpiop3tdcnyjgp320ra8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Move objdump_path to struct annotation_optionsArnaldo Carvalho de Melo
One more step in grouping annotation options. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-sogzdhugoavm6fyw60jnb0vs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Move disassembler_style global to annotation_optionsArnaldo Carvalho de Melo
Continuing to group annotation specific stuff into a struct. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-p3cdhltj58jt0byjzg3g7obx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Adopt anotation options from symbol_confArnaldo Carvalho de Melo
Continuing to group annotation options in an annotation specific struct. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-astei92tzxp4yccag5pxb2h7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Pass annotation_options to symbol__annotate()Arnaldo Carvalho de Melo
Now all callers to symbol__disassemble() can hand it the per-tool annotation_options, which will allow us to remove lots of stuff from symbol_options, the kitchen sink of perf configs, reducing its size and getting annotation specific stuff grouped together. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-vpr7ys7ggvs2fzpg8wbjcw7e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate stdio: Use annotation_options consistentlyArnaldo Carvalho de Melo
Accross all the routines, this way we can have eventually have a consistent set of defaults for all UIs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-6qgtixurjgdk5u0n3rw78ges@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Replace symbol__alloc_hists() with symbol__hists()Arnaldo Carvalho de Melo
Its a bit shorter, so ditch the old symbol__alloc_hists() function. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-m7tienxk7dijh5ln62yln1m9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Stop using symbol_conf.nr_events global in symbol__hists()Arnaldo Carvalho de Melo
Since now we have evsel->evlist->nr_entries in the single place calling this function, use it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-9mgosbqa977h39j4i9ys8t75@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Introduce symbol__cycle_hists()Arnaldo Carvalho de Melo
In this case we're wanting just notes->src->cycles_hist, allocating it if needed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-pqj81aneunhftlntm66tmhz0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Introduce symbol__hists()Arnaldo Carvalho de Melo
In this case we're wanting just notes->src->histograms, allocating it if needed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-4iatualjskia7sojmdb65cmm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: __symbol__inc_addr_samples() needs just annotated_sourceArnaldo Carvalho de Melo
It only operates on the histograms, so no need for the encompassing 'struct annotation'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-2se2v7rrjil0kwqywks04ey2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Introduce annotated_source__alloc_histogramsArnaldo Carvalho de Melo
So that we can call it independently, in contexts were we know we already have notes->src allocated. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-f5fn7tr1asey6g013wavpn4c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Introduce constructor/destructor for annotated_sourceArnaldo Carvalho de Melo
More stuff will go in there, all the parts that are not needed when a symbol had no samples and that were mistakenly added to 'struct annotation'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-u4761kyzhixw9ydk6kib3f0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Split allocation of annotated_source structArnaldo Carvalho de Melo
So that we can allocate just the notes->src->cyc_hist, that, unlike notes->src->histograms, is not per event, and in paths where we need to lazily allocate notes->src->cyc_hist we don't have the number of events handy to also allocate ->histograms. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-tsx7dhxzpi0criyx0sio3pz3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: __symbol__acount_cycles doesn't need notesArnaldo Carvalho de Melo
It only operates on the notes->src->cyc_hist, just pass that to it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-zd1cu4zwmu21k0cxlr83y6vr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf annotate: Pass perf_evsel instead of just evsel->idxArnaldo Carvalho de Melo
The code gets shorter and we'll be able to use evsel->evlist in a followup patch. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-t0s7vy19wq5kak74kavm8swf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-23perf annotate: Show group event string for stdioJin Yao
When we enable the group, for tui/stdio2, the output first line includes the group event string. While for stdio, it will show only one event. For example, perf record -e cycles,branches ./div perf annotate --group --stdio Percent | Source code & Disassembly of div for cycles (44407 samples) ...... The first line doesn't include the event 'branches'. With this patch, it will show the correct group even string. perf annotate --group --stdio Percent | Source code & Disassembly of div for cycles, branches (44407 samples) ...... Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1526989115-14435-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-19perf annotate: Create hotkey 'c' to show min/max cyclesJin Yao
In the 'perf annotate' view, a new hotkey 'c' is created for showing the min/max cycles. For example, when press 'c', the annotate view is: Percent│ IPC Cycle(min/max) │ │ │ Disassembly of section .text: │ │ 000000000003aab0 <random@@GLIBC_2.2.5>: 8.22 │3.92 sub $0x18,%rsp │3.92 mov $0x1,%esi │3.92 xor %eax,%eax │3.92 cmpl $0x0,argp_program_version_hook@@G │3.92 1(2/1) ↓ je 20 │ lock cmpxchg %esi,__abort_msg@@GLIBC_P │ ↓ jne 29 │ ↓ jmp 43 │1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+ 8.93 │1.10 1(5/1) ↓ je 43 When press 'c' again, the annotate view is switched back: Percent│ IPC Cycle │ │ │ Disassembly of section .text: │ │ 000000000003aab0 <random@@GLIBC_2.2.5>: 8.22 │3.92 sub $0x18,%rsp │3.92 mov $0x1,%esi │3.92 xor %eax,%eax │3.92 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x │3.92 1 ↓ je 20 │ lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 │ ↓ jne 29 │ ↓ jmp 43 │1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 8.93 │1.10 1 ↓ je 43 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com [ Rename all maxmin to minmax ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-18perf annotate: Record the min/max cyclesJin Yao
Currently perf has a feature to account cycles for LBRs For example, on skylake: perf record -b ... perf report or perf annotate And then browsing the annotate browser gives average cycle counts for program blocks. For some analysis it would be useful if we could know not only the average cycles but also the min and max cycles. This patch records the min and max cycles. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao.jin@linux.intel.com [ Switch from max/min to min/max ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-10perf annotate: Display all available events on --stdioJin Yao
When we perform the following command lines: $ perf record -e "{cycles,branches}" ./div $ perf annotate main --stdio The output shows only the first event, "cycles" and the displaying format is not correct. Percent | Source code & Disassembly of div for cycles (44550 samples) ----------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : 00000000004004b0 <main>: : main(): : : return i; : } : : int main(void) : { 0.00 : 4004b0: push %rbx : int i; : int flag; : volatile double x = 1212121212, y = 121212; : : s_randseed = time(0); 0.00 : 4004b1: xor %edi,%edi : srand(s_randseed); 0.00 : 4004b3: mov $0x77359400,%ebx : : return i; : } The issue is that the value of the 'nr_percent' variable is hardcoded to 1. This patch fixes it. With this patch, the output is: Percent | Source code & Disassembly of div for cycles (44550 samples) ----------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : 00000000004004b0 <main>: : main(): : : return i; : } : : int main(void) : { 0.00 0.00 : 4004b0: push %rbx : int i; : int flag; : volatile double x = 1212121212, y = 121212; : : s_randseed = time(0); 0.00 0.00 : 4004b1: xor %edi,%edi : srand(s_randseed); 0.00 0.00 : 4004b3: mov $0x77359400,%ebx : : return i; : } Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: f681d593d1ce ("perf annotate: Remove disasm__calc_percent() from disasm_line__print()") Link: http://lkml.kernel.org/r/1525881435-4092-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-13perf annotate: Allow setting the offset level in .perfconfigArnaldo Carvalho de Melo
The default is 1 (jump_target): # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574 _raw_spin_lock_irqsave() /proc/kcore 0.26 nop 4.61 push %rbx 19.33 pushfq 7.97 pop %rax 0.32 nop 0.06 mov %rax,%rbx 14.63 cli 0.06 nop xor %eax,%eax mov $0x1,%edx 49.94 lock cmpxchg %edx,(%rdi) 0.16 test %eax,%eax ↓ jne 2b 2.66 mov %rbx,%rax pop %rbx ← retq 2b: mov %eax,%esi → callq *ffffffffb30eaed0 mov %rbx,%rax pop %rbx ← retq # But one can ask for showing offsets for call instructions by setting this: # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574 _raw_spin_lock_irqsave() /proc/kcore 0.26 nop 4.61 push %rbx 19.33 pushfq 7.97 pop %rax 0.32 nop 0.06 mov %rax,%rbx 14.63 cli 0.06 nop xor %eax,%eax mov $0x1,%edx 49.94 lock cmpxchg %edx,(%rdi) 0.16 test %eax,%eax ↓ jne 2b 2.66 mov %rbx,%rax pop %rbx ← retq 2b: mov %eax,%esi 2d: → callq *ffffffffb30eaed0 mov %rbx,%rax pop %rbx ← retq # Or using a big value to ask for all offsets to be shown: # cat ~/.perfconfig [annotate] offset_level = 100 hide_src_code = true # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574 _raw_spin_lock_irqsave() /proc/kcore 0.26 0: nop 4.61 5: push %rbx 19.33 6: pushfq 7.97 7: pop %rax 0.32 8: nop 0.06 d: mov %rax,%rbx 14.63 10: cli 0.06 11: nop 17: xor %eax,%eax 19: mov $0x1,%edx 49.94 1e: lock cmpxchg %edx,(%rdi) 0.16 22: test %eax,%eax 24: ↓ jne 2b 2.66 26: mov %rbx,%rax 29: pop %rbx 2a: ← retq 2b: mov %eax,%esi 2d: → callq *ffffffffb30eaed0 32: mov %rbx,%rax 35: pop %rbx 36: ← retq # This also affects the TUI, i.e. the default 'perf annotate' and 'perf top/report' -> A hotkey -> annotate interfaces, when slang-devel is present in the build, i.e.: # perf version --build-options | grep slang libslang: [ on ] # HAVE_SLANG_SUPPORT # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-12perf annotate: Allow showing offsets in more than just jump targetsArnaldo Carvalho de Melo
Jesper wanted to see offsets at callq sites when doing some performance investigation related to retpolines, so save him some time by providing an 'struct annotation_options' to control where offsets should appear: just on jump targets? That + call instructions? All? This puts in place the logic to show the offsets, now we need to wire this up in the TUI browser (next patch) and on the 'perf annotate --stdio2" interface, where we need a more general mechanism to setup the 'annotation_options' struct from the command line. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-m3jc9c3swobye9tj08gnh5i7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-05perf annotate: Show group details on the title lineArnaldo Carvalho de Melo
To match what is shown in the main 'perf report/top' title lines, i.e. if a group is being shown, either a real group (recorded with "-e '{a,b,c}') or a forced group (using 'perf report --group' for a perf.data file recorded without {}) we will show multiple columns, one per event, but we were failing to show the group details, so, for: # perf report --header-only | grep cmdline # cmdline : /home/acme/bin/perf record -e {cycles,instructions,cache-misses} # perf report --group The first line was showing just "cycles", now it shows the correct line, which is: Samples: 578 of events 'anon group { cycles, instructions, cache-misses }', 4000 Hz, Event count (approx.): 487421794 syscall_return_via_sysret /lib/modules/4.16.0-rc7/build/vmlinux 0.22 2.97 0.00 │ ↓ jmp 6c │ mov %cr3,%rdi 1.33 10.89 4.00 │ ↓ jmp 62 │ mov %rdi,%rax <SNIP> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 6920e2854e9a ("perf annotate browser: Show extra title line with event information") Link: https://lkml.kernel.org/n/tip-i41tqh17c2dabnyzjh99r1oz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03perf annotate stdio2: Print more descriptive event information headerArnaldo Carvalho de Melo
To match the recently added event header information to --tui, e.g.: # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave Samples: 128 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 48617682 _raw_spin_lock_irqsave() /proc/kcore 0.78 nop 7.03 push %rbx 3.12 pushfq 6.25 pop %rax nop mov %rax,%rbx 3.12 cli nop xor %eax,%eax mov $0x1,%edx 79.69 lock cmpxchg %edx,(%rdi) test %eax,%eax ↓ jne 2b mov %rbx,%rax pop %rbx ← retq 2b: mov %eax,%esi → callq *ffffffffb30eaed0 mov %rbx,%rax pop %rbx ← retq # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ujy46x7cldyhyxelyf2b9quy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03perf annotate: Introduce annotation__scnprintf_samples_period() methodArnaldo Carvalho de Melo
To print a string using the total period (nr_events) and the number of samples for a given annotation, i.e. for a given symbol, the counterpart to hists__scnprintf_samples_period(), that is for all the samples in a session (be it a live session, think 'perf top' or a perf.data file, think 'perf report'). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935 Link: https://lkml.kernel.org/n/tip-goj2wu4fxutc8vd46mw3yg14@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23perf annotate: Use absolute addresses to calculate jump target offsetsArnaldo Carvalho de Melo
These types of jumps were confusing the annotate browser: entry_SYSCALL_64 /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux entry_SYSCALL_64 /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux Percent│ffffffff81a00020: swapgs <SNIP> │ffffffff81a00128: ↓ jae ffffffff81a00139 <syscall_return_via_sysret+0x53> <SNIP> │ffffffff81a00155: → jmpq *0x825d2d(%rip) # ffffffff82225e88 <pv_cpu_ops+0xe8> I.e. the syscall_return_via_sysret function is actually "inside" the entry_SYSCALL_64 function, and the offsets in jumps like these (+0x53) are relative to syscall_return_via_sysret, not to syscall_return_via_sysret. Or this may be some artifact in how the assembler marks the start and end of a function and how this ends up in the ELF symtab for vmlinux, i.e. syscall_return_via_sysret() isn't "inside" entry_SYSCALL_64, but just right after it. From readelf -sw vmlinux: 80267: ffffffff81a00020 315 NOTYPE GLOBAL DEFAULT 1 entry_SYSCALL_64 316: ffffffff81a000e6 0 NOTYPE LOCAL DEFAULT 1 syscall_return_via_sysret 0xffffffff81a00020 + 315 > 0xffffffff81a000e6 So instead of looking for offsets after that last '+' sign, calculate offsets for jump target addresses that are inside the function being disassembled from the absolute address, 0xffffffff81a00139 in this case, subtracting from it the objdump address for the start of the function being disassembled, entry_SYSCALL_64() in this case. So, before this patch: entry_SYSCALL_64 /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux Percent│ pop %r10 │ pop %r9 │ pop %r8 │ pop %rax │ pop %rsi │ pop %rdx │ pop %rsi │ mov %rsp,%rdi │ mov %gs:0x5004,%rsp │ pushq 0x28(%rdi) │ pushq (%rdi) │ push %rax │ ↑ jmp 6c │ mov %cr3,%rdi │ ↑ jmp 62 │ mov %rdi,%rax │ and $0x7ff,%rdi │ bt %rdi,%gs:0x2219a │ ↑ jae 53 │ btr %rdi,%gs:0x2219a │ mov %rax,%rdi │ ↑ jmp 5b After: entry_SYSCALL_64 /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux 0.65 │ → jne swapgs_restore_regs_and_return_to_usermode │ pop %r10 │ pop %r9 │ pop %r8 │ pop %rax │ pop %rsi │ pop %rdx │ pop %rsi │ mov %rsp,%rdi │ mov %gs:0x5004,%rsp │ pushq 0x28(%rdi) │ pushq (%rdi) │ push %rax │ ↓ jmp 132 │ mov %cr3,%rdi │ ┌──jmp 128 │ │ mov %rdi,%rax │ │ and $0x7ff,%rdi │ │ bt %rdi,%gs:0x2219a │ │↓ jae 119 │ │ btr %rdi,%gs:0x2219a │ │ mov %rax,%rdi │ │↓ jmp 121 │119:│ mov %rax,%rdi │ │ bts $0x3f,%rdi │121:│ or $0x800,%rdi │128:└─→or $0x1000,%rdi │ mov %rdi,%cr3 │132: pop %rax │ pop %rdi │ pop %rsp │ → jmpq *0x825d2d(%rip) # ffffffff82225e88 <pv_cpu_ops+0xe8> With those at least navigating to the right destination, an improvement for these cases seems to be to be to somehow mark those inner functions, which in this case could be: entry_SYSCALL_64 /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux │syscall_return_via_sysret: │ pop %r15 │ pop %r14 │ pop %r13 │ pop %r12 │ pop %rbp │ pop %rbx │ pop %rsi │ pop %r10 │ pop %r9 │ pop %r8 │ pop %rax │ pop %rsi │ pop %rdx │ pop %rsi │ mov %rsp,%rdi │ mov %gs:0x5004,%rsp │ pushq 0x28(%rdi) │ pushq (%rdi) │ push %rax │ ↓ jmp 132 │ mov %cr3,%rdi │ ┌──jmp 128 │ │ mov %rdi,%rax │ │ and $0x7ff,%rdi │ │ bt %rdi,%gs:0x2219a │ │↓ jae 119 │ │ btr %rdi,%gs:0x2219a │ │ mov %rax,%rdi │ │↓ jmp 121 │119:│ mov %rax,%rdi │ │ bts $0x3f,%rdi │121:│ or $0x800,%rdi │128:└─→or $0x1000,%rdi │ mov %rdi,%cr3 │132: pop %rax │ pop %rdi │ pop %rsp │ → jmpq *0x825d2d(%rip) # ffffffff82225e88 <pv_cpu_ops+0xe8> This all gets much better viewed if one uses 'perf report --ignore-vmlinux' forcing the usage of /proc/kcore + /proc/kallsyms, when the above actually gets down to: # perf report --ignore-vmlinux ## do '/64', will show the function names containing '64', ## navigate to /entry_SYSCALL_64_after_hwframe.annotation, ## press 'A' to annotate, then 'P' to print that annotation ## to a file ## From another xterm (or see on screen, this 'P' thing is for ## getting rid of those right side scroll bars/spaces): # cat /entry_SYSCALL_64_after_hwframe.annotation entry_SYSCALL_64_after_hwframe() /proc/kcore Event: cycles:ppp Percent Disassembly of section load0: ffffffff9aa00044 <load0>: 11.97 push %rax 4.85 push %rdi push %rsi 2.59 push %rdx 2.27 push %rcx 0.32 pushq $0xffffffffffffffda 1.29 push %r8 xor %r8d,%r8d 1.62 push %r9 0.65 xor %r9d,%r9d 1.62 push %r10 xor %r10d,%r10d 5.50 push %r11 xor %r11d,%r11d 3.56 push %rbx xor %ebx,%ebx 4.21 push %rbp xor %ebp,%ebp 2.59 push %r12 0.97 xor %r12d,%r12d 3.24 push %r13 xor %r13d,%r13d 2.27 push %r14 xor %r14d,%r14d 4.21 push %r15 xor %r15d,%r15d 0.97 mov %rsp,%rdi 5.50 → callq do_syscall_64 14.56 mov 0x58(%rsp),%rcx 7.44 mov 0x80(%rsp),%r11 0.32 cmp %rcx,%r11 → jne swapgs_restore_regs_and_return_to_usermode 0.32 shl $0x10,%rcx 0.32 sar $0x10,%rcx 3.24 cmp %rcx,%r11 → jne swapgs_restore_regs_and_return_to_usermode 2.27 cmpq $0x33,0x88(%rsp) 1.29 → jne swapgs_restore_regs_and_return_to_usermode mov 0x30(%rsp),%r11 8.74 cmp %r11,0x90(%rsp) → jne swapgs_restore_regs_and_return_to_usermode 0.32 test $0x10100,%r11 → jne swapgs_restore_regs_and_return_to_usermode 0.32 cmpq $0x2b,0xa0(%rsp) 0.65 → jne swapgs_restore_regs_and_return_to_usermode I.e. using kallsyms makes the function start/end be done differently than using what is in the vmlinux ELF symtab and actually the hits goes to entry_SYSCALL_64_after_hwframe, which is a GLOBAL() after the start of entry_SYSCALL_64: ENTRY(entry_SYSCALL_64) UNWIND_HINT_EMPTY <SNIP> pushq $__USER_CS /* pt_regs->cs */ pushq %rcx /* pt_regs->ip */ GLOBAL(entry_SYSCALL_64_after_hwframe) pushq %rax /* pt_regs->orig_ax */ PUSH_AND_CLEAR_REGS rax=$-ENOSYS And it goes and ends at: cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */ jne swapgs_restore_regs_and_return_to_usermode /* * We win! This label is here just for ease of understanding * perf profiles. Nothing jumps here. */ syscall_return_via_sysret: /* rcx and r11 are already restored (see code above) */ UNWIND_HINT_EMPTY POP_REGS pop_rdi=0 skip_r11rcx=1 So perhaps some people should really just play with '--ignore-vmlinux' to force /proc/kcore + kallsyms. One idea is to do both, i.e. have a vmlinux annotation and a kcore+kallsyms one, when possible, and even show the patched location, etc. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-r11knxv8voesav31xokjiuo6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23perf annotate: Defer searching for comma in raw line till it is neededArnaldo Carvalho de Melo
That strchr() in jump__scnprintf() needs to be nuked somehow, as it, IIRC is already done in jump__parse() and if needed at scnprintf() time, should be stashed in the struct filled in parse() time. For now jus defer it to just before where it is used. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-j0t5hagnphoz9xw07bh3ha3g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23perf annotate: Support jumping from one function to anotherArnaldo Carvalho de Melo
For instance: entry_SYSCALL_64 /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux 5.50 │ → callq do_syscall_64 14.56 │ mov 0x58(%rsp),%rcx 7.44 │ mov 0x80(%rsp),%r11 0.32 │ cmp %rcx,%r11 │ → jne swapgs_restore_regs_and_return_to_usermode 0.32 │ shl $0x10,%rcx 0.32 │ sar $0x10,%rcx 3.24 │ cmp %rcx,%r11 │ → jne swapgs_restore_regs_and_return_to_usermode 2.27 │ cmpq $0x33,0x88(%rsp) 1.29 │ → jne swapgs_restore_regs_and_return_to_usermode │ mov 0x30(%rsp),%r11 8.74 │ cmp %r11,0x90(%rsp) │ → jne swapgs_restore_regs_and_return_to_usermode 0.32 │ test $0x10100,%r11 │ → jne swapgs_restore_regs_and_return_to_usermode 0.32 │ cmpq $0x2b,0xa0(%rsp) 0.65 │ → jne swapgs_restore_regs_and_return_to_usermode It'll behave just like a "call" instruction, i.e. press enter or right arrow over one such line and the browser will navigate to the annotated disassembly of that function, which when exited, via left arrow or esc, will come back to the calling function. Now to support jump to an offset on a different function... Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-78o508mqvr8inhj63ddtw7mo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23perf annotate: Add "_local" to jump/offset validation routinesArnaldo Carvalho de Melo
Because they all really check if we can access data structures/visual constructs where a "jump" instruction targets code in the same function, i.e. things like: __pthread_mutex_lock /usr/lib64/libpthread-2.26.so 1.95 │ mov __pthread_force_elision,%ecx │ ┌──test %ecx,%ecx 0.07 │ ├──je 60 │ │ test $0x300,%esi │ │↓ jne 60 │ │ or $0x100,%esi │ │ mov %esi,0x10(%rdi) │ 42:│ mov %esi,%edx │ │ lea 0x16(%r8),%rsi │ │ mov %r8,%rdi │ │ and $0x80,%edx │ │ add $0x8,%rsp │ │→ jmpq __lll_lock_elision │ │ nop 0.29 │ 60:└─→and $0x80,%esi 0.07 │ mov $0x1,%edi 0.29 │ xor %eax,%eax 2.53 │ lock cmpxchg %edi,(%r8) And not things like that "jmpq __lll_lock_elision", that instead should behave like a "call" instruction and "jump" to the disassembly of "___lll_lock_elision". Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: Mark jumps to outher functions with the call arrowArnaldo Carvalho de Melo
Things like this in _cpp_lex_token (gcc's cc1 program): cpp_named_operator2name@@Base+0xa72 Point to a place that is after the cpp_named_operator2name boundaries, i.e. in the ELF symbol table for cc1 cpp_named_operator2name is marked as being 32-bytes long, but it in fact is much larger than that, so we seem to need a symbols__find() routine that looks for >= current->start and < next_symbol->start, possibly just for C++ objects? For now lets just make some progress by marking jumps to outside the current function as call like. Actual navigation will come next, with further understanding of how the symbol searching and disassembly should be done. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-aiys0a0bsgm3e00hbi6fg7yy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: Pass function descriptor to its instruction parsing routinesArnaldo Carvalho de Melo
We need that to figure out if jumps have targets in a different function. E.g. _cpp_lex_token(), in /usr/libexec/gcc/x86_64-redhat-linux/5.3.1/cc1 has a line like this: jne c469be <cpp_named_operator2name@@Base+0xa72> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ris0ioziyp469pofpzix2atb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: No need to calculate notes->start twiceArnaldo Carvalho de Melo
Since we already set notes->start to map__rip_2objdump(map, sym->start) in symbol__annotate2(), no need to calculate that address again in symbol__calc_lines(), just use notes->start. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ycxlg8mm5ueuj21w6gi62l7g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate browser: Add 'P' hotkey to dump annotation to fileArnaldo Carvalho de Melo
Just like we have in the histograms browser used as the main screen for 'perf top --tui' and 'perf report --tui', to print the current annotation to a file with a named composed by the symbol name and the ".annotation" suffix. Here is one example of pressing 'A' on 'perf top' to live annotate a kernel function and then press 'P' to dump that annotation, the resulting file: # cat _raw_spin_lock_irqsave.annotation _raw_spin_lock_irqsave() /proc/kcore Event: cycles:ppp 7.14 nop 21.43 push %rbx 7.14 pushfq pop %rax nop mov %rax,%rbx cli nop xor %eax,%eax mov $0x1,%edx 64.29 lock cmpxchg %edx,(%rdi) test %eax,%eax ↓ jne 2b mov %rbx,%rax pop %rbx ← retq 2b: mov %eax,%esi → callq queued_spin_lock_slowpath mov %rbx,%rax pop %rbx ← retq # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-zzmnrwugb5vtk7bvg0rbx150@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: Add function header to --stdio2Arnaldo Carvalho de Melo
# perf annotate --stdio2 _raw_spin_lock_irqsave _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux Event: anon group { cycles, instructions } 0.00 3.17 → callq __fentry__ 0.00 7.94 push %rbx 7.69 36.51 → callq __page_file_index mov %rax,%rbx 7.69 3.17 → callq *ffffffff82225cd0 xor %eax,%eax mov $0x1,%edx 80.77 49.21 lock cmpxchg %edx,(%rdi) test %eax,%eax ↓ jne 2b 3.85 0.00 mov %rbx,%rax pop %rbx ← retq 2b: mov %eax,%esi → callq queued_spin_lock_slowpath mov %rbx,%rax pop %rbx ← retq # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-i86yfyzl8m194ioxgj1jo32f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: Use the default annotation options for --stdio2Arnaldo Carvalho de Melo
With an empty '[annotate]' section in ~/.perfconfig: # perf record -a --all-kernel -e '{cycles,instructions}:P' sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.243 MB perf.data (5513 samples) ] # perf annotate --stdio2 _raw_spin_lock | head -20 Disassembly of section .text: ffffffff81868790 <_raw_spin_lock>: _raw_spin_lock(): EXPORT_SYMBOL(_raw_spin_trylock_bh); #endif #ifndef CONFIG_INLINE_SPIN_LOCK void __lockfunc _raw_spin_lock(raw_spinlock_t *lock) { → callq __fentry__ atomic_cmpxchg(): return xadd(&v->counter, -i); } static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new) { # perf annotate --stdio2 _raw_spin_lock | head -20 → callq __fentry__ xor %eax,%eax mov $0x1,%edx 87.50 100.00 lock cmpxchg %edx,(%rdi) 6.25 0.00 test %eax,%eax ↓ jne 16 6.25 0.00 repz retq 16: mov %eax,%esi ↑ jmpq ffffffff810e96b0 <queued_spin_lock_slowpath> # # cat ~/.perfconfig [annotate] hide_src_code = false show_linenr = true # perf annotate --stdio2 _raw_spin_lock | head -20 3 Disassembly of section .text: 5 ffffffff81868790 <_raw_spin_lock>: 6 _raw_spin_lock(): 143 EXPORT_SYMBOL(_raw_spin_trylock_bh); 144 #endif 146 #ifndef CONFIG_INLINE_SPIN_LOCK 147 void __lockfunc _raw_spin_lock(raw_spinlock_t *lock) 148 { → callq __fentry__ 150 atomic_cmpxchg(): 187 return xadd(&v->counter, -i); 188 } 190 static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new) 191 { # # cat ~/.perfconfig [annotate] hide_src_code = true show_total_period = true # perf annotate --stdio2 _raw_spin_lock | head -20 → callq __fentry__ xor %eax,%eax mov $0x1,%edx 1411316 152339 lock cmpxchg %edx,(%rdi) 344694 0 test %eax,%eax ↓ jne 16 80806 0 repz retq 16: mov %eax,%esi ↑ jmpq ffffffff810e96b0 <queued_spin_lock_slowpath> # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-nu4rxg5zkdtgs1b2gc40p7v7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: Move the default annotate options to the libraryArnaldo Carvalho de Melo
One more thing that goes from the TUI code to be used more widely, for instance it'll affect the default options used by: perf annotate --stdio2 Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-0nsz0dm0akdbo30vgja2a10e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21perf annotate: Introduce the --stdio2 output modeArnaldo Carvalho de Melo
This uses the TUI augmented formatting routines, modulo interactivity. # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave _raw_spin_lock_irqsave() /proc/kcore Event: cycles:ppp Percent Disassembly of section load0: ffffffff9a8734b0 <load0>: nop push %rbx 50.00 pushfq pop %rax nop mov %rax,%rbx cli nop xor %eax,%eax mov $0x1,%edx 50.00 lock cmpxchg %edx,(%rdi) test %eax,%eax ↓ jne 2b mov %rbx,%rax pop %rbx ← retq 2b: mov %eax,%esi → callq queued_spin_lock_slowpath mov %rbx,%rax pop %rbx ← retq Tested-by: Jin Yao <yao.jin@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Use a ops table for annotation_line__write()Arnaldo Carvalho de Melo
To simplify the passing of arguments, the --stdio2 code will have to set all the fields with operations printing to stdout. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-pcs3c7vdy9ucygxflo4nl1o7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Finish the generalization of annotate_browser__write()Arnaldo Carvalho de Melo
We pass some more callbacks and all of annotate_browser__write() seems to be free of TUI code (except for some arrow constants, will fix). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-5uo6yvwnxtsbe8y6v0ysaakf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Introduce annotation_line__print_start() out of TUI codeArnaldo Carvalho de Melo
For the --tui and --stdio2 cases using callbacks for print() and set_percent_color() end up being the easiest path, real GUI remains as an exercise. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-1o7az1ng55g2g6ppr2jpeuct@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Introduce annotation_line__max_percent()Arnaldo Carvalho de Melo
Out of the annotate_browser__write() routine, to be used in the --stdio2 mode. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-0he0wyy4haswqi1qb35x37do@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Introduce symbol__annotate2 methodArnaldo Carvalho de Melo
That does all the extended boilerplate the TUI browser did, leaving the symbol__annotate() function to be used by the old --stdio output mode. Now the upcoming --stdio2 output mode should just use this one to set things up. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-e2x8wuf6gvdhzdryo229vj4i@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Introduce init_column_widths() method out of TUI codeArnaldo Carvalho de Melo
More non-TUI stuff goes to the UI-agnostic library Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-hngv7rpqvtta69ouj7ne770q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20perf annotate: Move update_column_widths() to the generic libArnaldo Carvalho de Melo
Previous patch left it where it was to ease review, move it to its right place. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ikdjr014p7k5kachgyjrgiey@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>