summaryrefslogtreecommitdiffstats
path: root/kernel/trace/trace.c
AgeCommit message (Collapse)Author
2017-07-27tracing: Fix kmemleak in instance_rmdirChunyu Hu
commit db9108e054700c96322b0f0028546aa4e643cf0b upstream. Hit the kmemleak when executing instance_rmdir, it forgot releasing mem of tracing_cpumask. With this fix, the warn does not appear any more. unreferenced object 0xffff93a8dfaa7c18 (size 8): comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s) hex dump (first 8 bytes): ff ff ff ff ff ff ff ff ........ backtrace: [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280 [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30 [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10 [<ffffffff88571ab0>] instance_mkdir+0x90/0x240 [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70 [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0 [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100 [<ffffffff88403857>] do_syscall_64+0x67/0x150 [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com Fixes: ccfe9e42e451 ("tracing: Make tracing_cpumask available for all instances") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate resultsPavankumar Kondeti
commit c59f29cb144a6a0dfac16ede9dc8eafc02dc56ca upstream. The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-27tracing: Allocate the snapshot buffer before enabling probeSteven Rostedt (VMware)
commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream. Currently the snapshot trigger enables the probe and then allocates the snapshot. If the probe triggers before the allocation, it could cause the snapshot to fail and turn tracing off. It's best to allocate the snapshot buffer first, and then enable the trigger. If something goes wrong in the enabling of the trigger, the snapshot buffer is still allocated, but it can also be freed by the user by writting zero into the snapshot buffer file. Also add a check of the return status of alloc_snapshot(). Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21ftrace: Fix function pid filter on instancesNamhyung Kim
commit d879d0b8c183aabeb9a65eba91f3f9e3c7e7b905 upstream. When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15fs: Better permission checking for submountsEric W. Biederman
commit 93faccbbfa958a9668d3ab4e30f38dd205cee8d8 upstream. To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Fixes: 069d5ac9ae0d ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds") Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-06Merge tag 'trace-v4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release cycle is rather small. Just a few fixes to tracing. The big change is the addition of the hwlat tracer. It not only detects SMIs, but also other latency that's caused by the hardware. I have detected some latency from large boxes having bus contention" * tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Call traceoff trigger after event is recorded ftrace/scripts: Add helper script to bisect function tracing problem functions tracing: Have max_latency be defined for HWLAT_TRACER as well tracing: Add NMI tracing in hwlat detector tracing: Have hwlat trace migrate across tracing_cpumask CPUs tracing: Add documentation for hwlat_detector tracer tracing: Added hardware latency tracer ftrace: Access ret_stack->subtime only in the function profiler function_graph: Handle TRACE_BPUTS in print_graph_comment tracing/uprobe: Drop isdigit() check in create_trace_uprobe
2016-10-03Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - uprobes enhancements (Masami Hiramatsu) - Uncore group events enhancements (David Carrillo-Cisneros) - x86 Intel: Add support for Skylake server uncore PMUs (Kan Liang) - x86 Intel: LBR cleanups and enhancements, for better branch annotation tracking (Peter Zijlstra) - x86 Intel: Add support for PTWRITE and power event tracing (Alexander Shishkin) - ... various fixes, cleanups and smaller enhancements. Lots of tooling changes - a couple of highlights: - Support event group view with hierarchy mode in 'perf top' and 'perf report' (Namhyung Kim) e.g.: $ perf record -e '{cycles,instructions}' make $ perf report --hierarchy --stdio ... # Overhead Command / Shared Object / Symbol # ...................... .................................. ... 25.74% 27.18%sh 19.96% 24.14%libc-2.24.so 9.55% 14.64%[.] __strcmp_sse2 1.54% 0.00%[.] __tfind 1.07% 1.13%[.] _int_malloc 0.95% 0.00%[.] __strchr_sse2 0.89% 1.39%[.] __tsearch 0.76% 0.00%[.] strlen - Add branch stack / basic block info to 'perf annotate --stdio', where for each branch, we add an asm comment after the instruction with information on how often it was taken and predicted. See example with color output at: http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png (Peter Zijlstra) - Add support for using symbols in address filters with Intel PT and ARM CoreSight (hardware assisted tracing facilities) (Adrian Hunter, Mathieu Poirier) - Add support for interacting with Coresight PMU ETMs/PTMs, that are IP blocks to perform hardware assisted tracing on a ARM CPU core (Mathieu Poirier) - Support generating cross arch probes, i.e. if you specify a vmlinux file for different arch than the one in the host machine, $ perf probe --definition function_name args will generate the probe definition string needed to append to the target machine /sys/kernel/debug/tracing/kprobes_events file, using scripting (Masami Hiramatsu). - Allow configuring the default 'perf report -s' sort order in ~/.perfconfig, for instance, "sym,dso" may be more fitting for kernel developers. (Arnaldo Carvalho de Melo) - ... plus lots of other changes, refactorings, features and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (149 commits) perf tests: Add dwarf unwind test for powerpc perf probe: Match linkage name with mangled name perf probe: Fix to cut off incompatible chars from group name perf probe: Skip if the function address is 0 perf probe: Ignore the error of finding inline instance perf intel-pt: Fix decoding when there are address filters perf intel-pt: Enable decoder to handle TIP.PGD with missing IP perf intel-pt: Read address filter from AUXTRACE_INFO event perf intel-pt: Record address filter in AUXTRACE_INFO event perf intel-pt: Add a helper function for processing AUXTRACE_INFO perf intel-pt: Fix missing error codes processing auxtrace_info perf intel-pt: Add support for recording the max non-turbo ratio perf intel-pt: Fix snapshot overlap detection decoder errors perf probe: Increase debug level of SDT debug messages perf record: Add support for using symbols in address filters perf symbols: Add dso__last_symbol() perf record: Fix error paths perf record: Rename label 'out_symbol_exit' perf script: Fix vanished idle symbols perf evsel: Add support for address filters ...
2016-09-25Merge tag 'trace-v4.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs fixes from Steven Rostedt: "Al Viro has been looking at the tracefs code, and has pointed out some issues. This contains one fix by me and one by Al. I'm sure that he'll come up with more but for now I tested these patches and they don't appear to have any negative impact on tracing" * tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data
2016-09-25fix memory leaks in tracing_buffers_splice_read()Al Viro
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-25tracing: Move mutex to protect against resetting of seq dataSteven Rostedt (Red Hat)
The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-12tracing: Have max_latency be defined for HWLAT_TRACER as wellSteven Rostedt (Red Hat)
The hwlat tracer uses tr->max_latency, and if it's the only tracer enabled that uses it, the build will fail. Add max_latency and its file when the hwlat tracer is enabled. Link: http://lkml.kernel.org/r/d6c3b7eb-ba95-1ffa-0453-464e1e24262a@infradead.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-02tracing: Added hardware latency tracerSteven Rostedt (Red Hat)
The hardware latency tracer has been in the PREEMPT_RT patch for some time. It is used to detect possible SMIs or any other hardware interruptions that the kernel is unaware of. Note, NMIs may also be detected, but that may be good to note as well. The logic is pretty simple. It simply creates a thread that spins on a single CPU for a specified amount of time (width) within a periodic window (window). These numbers may be adjusted by their cooresponding names in /sys/kernel/tracing/hwlat_detector/ The defaults are window = 1000000 us (1 second) width = 500000 us (1/2 second) The loop consists of: t1 = trace_clock_local(); t2 = trace_clock_local(); Where trace_clock_local() is a variant of sched_clock(). The difference of t2 - t1 is recorded as the "inner" timestamp and also the timestamp t1 - prev_t2 is recorded as the "outer" timestamp. If either of these differences are greater than the time denoted in /sys/kernel/tracing/tracing_thresh then it records the event. When this tracer is started, and tracing_thresh is zero, it changes to the default threshold of 10 us. The hwlat tracer in the PREEMPT_RT patch was originally written by Jon Masters. I have modified it quite a bit and turned it into a tracer. Based-on-code-by: Jon Masters <jcm@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-23ftrace: probe: Add README entries for k/uprobe-eventsMasami Hiramatsu
Add README entries for kprobe-events and uprobe-events. This allows user to check what options can be acceptable for running kernel. E.g. perf tools can choose correct types for the kernel. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naohiro Aota <naohiro.aota@hgst.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/147151069524.12957.12957179170304055028.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-05tracing: Using for_each_set_bit() to simplify trace_pid_write()Wei Yongjun
Using for_each_set_bit() to simplify the code. Link: http://lkml.kernel.org/r/1467645004-11169-1-git-send-email-weiyj_lk@163.com Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-05ftrace: Move toplevel init out of ftrace_init_tracefs()Steven Rostedt (Red Hat)
Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") placed ftrace_init_tracefs into the instance creation, and encapsulated the top level updating with an if conditional, as the top level only gets updated at boot up. Unfortunately, this triggers section mismatch errors as the init functions are called from a function that can be called later, and the section mismatch logic is unaware of the if conditional that would prevent it from happening at run time. To make everyone happy, create a separate ftrace_init_tracefs_toplevel() routine that only gets called by init functions, and this will be what calls other init functions for the toplevel directory. Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-23tracing: Skip more functions when doing stack tracing of eventsSteven Rostedt (Red Hat)
# echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [002] d..2 1982.525169: <stack trace> => save_stack_trace => __ftrace_trace_stack => trace_buffer_unlock_commit_regs => event_trigger_unlock_commit => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The above shows that we are seeing 6 functions before ever making it to the caller of the sched_switch event. # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 2146.335208: <stack trace> => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The stacktrace trigger isn't as bad, because it adds its own skip to the stacktracing, but still has two events extra. One issue is that if the stacktrace passes its own "regs" then there should be no addition to the skip, as the regs will not include the functions being called. This was an issue that was fixed by commit 7717c6be6999 ("tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding the skip number for kprobes made the probes not have any stack at all. But since this is only an issue when regs is being used, a skip should be added if regs is NULL. Now we have: # echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [000] d..2 1297.676333: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => rest_init => start_kernel => x86_64_start_reservations => x86_64_start_kernel # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 1370.759745: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary And kprobes are not touched. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Choose static tp_printk buffer by explicit nesting countAndy Lutomirski
Currently, the trace_printk code chooses which static buffer to use based on what type of atomic context (NMI, IRQ, etc) it's in. Simplify the code and make it more robust: simply count the nesting depth and choose a buffer based on the current nesting depth. The new code will only drop an event if we nest more than 4 deep, and the old code was guaranteed to malfunction if that happened. Link: http://lkml.kernel.org/r/07ab03aecfba25fcce8f9a211b14c9c5e2865c58.1464289095.git.luto@kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20ftrace: Have set_ftrace_pid use the bitmap like events doSteven Rostedt (Red Hat)
Convert set_ftrace_pid to use the bitmap like set_event_pid does. This allows for instances to use the pid filtering as well, and will allow for function-fork option to set if the children of a traced function should be traced or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move pid_list write processing into its own functionSteven Rostedt (Red Hat)
The addition of PIDs into a pid_list via the write operation of set_event_pid is a bit complex. The same operation will be needed for function tracing pids. Move the code into its own generic function in trace.c, so that we can avoid duplication of this code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move the pid_list seq_file functions to be globalSteven Rostedt (Red Hat)
To allow other aspects of ftrace to use the pid_list logic, we need to reuse the seq_file functions. Making the generic part into functions that can be called by other files will help in this regard. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move filtered_pid helper functions into trace.cSteven Rostedt
As the filtered_pid functions are going to be used by function tracer as well as trace_events, move the code into the generic trace.c file. The functions moved are: trace_find_filtered_pid() trace_ignore_this_task() trace_filter_add_remove_task() Kernel Doc text was also added. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-03tracing: Use temp buffer when filtering eventsSteven Rostedt (Red Hat)
Filtering of events requires the data to be written to the ring buffer before it can be decided to filter or not. This is because the parameters of the filter are based on the result that is written to the ring buffer and not on the parameters that are passed into the trace functions. The ftrace ring buffer is optimized for writing into the ring buffer and committing. The discard procedure used when filtering decides the event should be discarded is much more heavy weight. Thus, using a temporary filter when filtering events can speed things up drastically. Without a temp buffer we have: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.790706626 seconds time elapsed ( +- 0.71% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.566904059 seconds time elapsed ( +- 0.27% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.690598511 seconds time elapsed ( +- 0.19% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.707486364 seconds time elapsed ( +- 0.30% ) The first run above is without any tracing, just to get a based figure. hackbench takes ~0.79 seconds to run on the system. The second run enables tracing all events where nothing is filtered. This increases the time by 100% and hackbench takes 1.57 seconds to run. The third run filters all events where the preempt count will equal "20" (this should never happen) thus all events are discarded. This takes 1.69 seconds to run. This is 10% slower than just committing the events! The last run enables all events and filters where the filter will commit all events, and this takes 1.70 seconds to run. The filtering overhead is approximately 10%. Thus, the discard and commit of an event from the ring buffer may be about the same time. With this patch, the numbers change: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.778233033 seconds time elapsed ( +- 0.38% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.582102692 seconds time elapsed ( +- 0.28% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.309230710 seconds time elapsed ( +- 0.22% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.786001924 seconds time elapsed ( +- 0.20% ) The first run is again the base with no tracing. The second run is all tracing with no filtering. It is a little slower, but that may be well within the noise. The third run shows that discarding all events only took 1.3 seconds. This is a speed up of 23%! The discard is much faster than even the commit. The one downside is shown in the last run. Events that are not discarded by the filter will take longer to add, this is due to the extra copy of the event. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove unused function trace_current_buffer_lock_reserve()Steven Rostedt (Red Hat)
trace_current_buffer_lock_reserve() has no more users. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Have trace_buffer_unlock_commit() call the _regs version with NULLSteven Rostedt (Red Hat)
There's no real difference between trace_buffer_unlock_commit() and trace_buffer_unlock_commit_regs() except that the former passes NULL to ftrace_stack_trace() instead of regs. Have the former be a static inline of the latter which passes NULL for regs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove unused function trace_current_buffer_discard_commit()Steven Rostedt (Red Hat)
The function trace_current_buffer_discard_commit() has no callers, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Move trace_buffer_unlock_commit{_regs}() to local headerSteven Rostedt (Red Hat)
The functions trace_buffer_unlock_commit() and the _regs() version are only used within the kernel/trace directory. Move them to the local header and remove the export as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Fold filter_check_discard() into its only userSteven Rostedt (Red Hat)
The function filter_check_discard() is small and only called by one user, its code can be folded into that one caller and make the code a bit less comlplex. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-27tracing: Make filter_check_discard() localSteven Rostedt (Red Hat)
Nothing outside of the tracing directory calls filter_check_discard() or check_filter_check_discard(). They should not be called by modules. Move their prototypes into the local tracing header and remove their EXPORT_SYMBOL() macros. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-26tracing: Don't use the address of the buffer array name in copy_from_userWang Xiaoqiang
With the following code snippet: ... char buf[64]; ... if (copy_from_user(&buf, ubuf, cnt)) ... Even though the value of "&buf" equals "buf", but there is no need to get the address of the "buf" again. Use "buf" instead of "&buf". Link: http://lkml.kernel.org/r/20160418152329.18b72bea@debian Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-25tracing: Do not inherit event-fork option for instancesSteven Rostedt (Red Hat)
As the event-fork option requires doing work when enabled and disabled, it can not be passed down to created instances. The instance must clear this flag when it is created, and must clear it when its removed. As more options may be created with this need, a macro ZEROED_TRACE_FLAGS is created that holds the flags that must not be inherited by the top level instance, and must be cleared on removal of instances. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'log2' modifierNamhyung Kim
Allow users to have numeric fields displayed as log2 values in case value range is very wide by appending '.log2' to field names. For example, # echo 'hist:key=bytes_req' > kmalloc/trigger # cat kmalloc/hist { bytes_req: 504 } hitcount: 1 { bytes_req: 11 } hitcount: 1 { bytes_req: 104 } hitcount: 1 { bytes_req: 48 } hitcount: 1 { bytes_req: 2048 } hitcount: 1 { bytes_req: 4096 } hitcount: 1 { bytes_req: 240 } hitcount: 1 { bytes_req: 392 } hitcount: 1 { bytes_req: 13 } hitcount: 1 { bytes_req: 28 } hitcount: 1 { bytes_req: 12 } hitcount: 1 { bytes_req: 64 } hitcount: 2 { bytes_req: 128 } hitcount: 2 { bytes_req: 32 } hitcount: 2 { bytes_req: 8 } hitcount: 11 { bytes_req: 10 } hitcount: 13 { bytes_req: 24 } hitcount: 25 { bytes_req: 160 } hitcount: 29 { bytes_req: 16 } hitcount: 33 { bytes_req: 80 } hitcount: 36 When using '.log2' modifier, the output looks like: # echo 'hist:key=bytes_req.log2' > kmalloc/trigger # cat kmalloc/hist { bytes_req: ~ 2^12 } hitcount: 1 { bytes_req: ~ 2^11 } hitcount: 1 { bytes_req: ~ 2^9 } hitcount: 2 { bytes_req: ~ 2^6 } hitcount: 3 { bytes_req: ~ 2^3 } hitcount: 13 { bytes_req: ~ 2^5 } hitcount: 19 { bytes_req: ~ 2^8 } hitcount: 49 { bytes_req: ~ 2^7 } hitcount: 57 { bytes_req: ~ 2^4 } hitcount: 74 Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for named hist triggersTom Zanussi
Allow users to define 'named' hist triggers. All triggers created with the same 'name=xxx' option will update the same shared histogram data. This expands the hist trigger syntax from this: # echo hist:keys=xxx ... [ if filter] > event/trigger to this: # echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger Named histograms must use a 'compatible' set of keys and values, which means each event added to a set of named triggers must have the same names and types. Reading the 'hist' file of any of the participating events will produce the same output as any other participating event, which is to be expected since they share the same data. Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for multiple hist triggers per eventTom Zanussi
Allow users to define any number of hist triggers per trace event. Any number of hist triggers may be added for a given event, which may differ by key, value, or filter. Reading the event's 'hist' file will display the output of all the hist triggers defined on an event concatenated in the order they were defined. Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add enable_hist/disable_hist triggersTom Zanussi
Similar to enable_event/disable_event triggers, these triggers enable and disable the aggregation of events into maps rather than enabling and disabling their writing into the trace buffer. They can be used to automatically start and stop hist triggers based on a matching filter condition. If there's a paused hist trigger on system:event, the following would start it when the filter condition was hit: # echo enable_hist:system:event [ if filter] > event/trigger And the following would disable a running system:event hist trigger: # echo disable_hist:system:event [ if filter] > event/trigger See Documentation/trace/events.txt for real examples. Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for stacktraces as keysTom Zanussi
It's often useful to be able to use a stacktrace as a hash key, for keeping a count of the number of times a particular call path resulted in a trace event, for instance. Add a special key named 'stacktrace' which can be used as key in a 'keys=' param for this purpose: # echo hist:keys=stacktrace ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87515e90b3785232a874a12156174635a348edb1.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'syscall' modifierTom Zanussi
Allow users to have syscall id fields displayed as syscall names in the output by appending '.syscall' to field names: # echo hist:keys=aaa.syscall ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2bab1e59933d76a14b545bd2e02f80b8b08ac4d3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'execname' modifierTom Zanussi
Allow users to have common_pid field values displayed as program names in the output by appending '.execname' to a common_pid field name: # echo hist:keys=common_pid.execname ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/e172e81f10f5b8d1f08450e3763c850f39fbf698.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'sym' and 'sym-offset' modifiersTom Zanussi
Allow users to have address fields displayed as symbols in the output by appending '.sym' or 'sym-offset' to field names: # echo hist:keys=aaa.sym,bbb.sym-offset ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87d4935821491c0275513f0fbfb9bab8d3d3f079.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'hex' modifier for displaying numeric fieldsTom Zanussi
Allow users to have numeric fields displayed as hex values in the output by appending '.hex' to field names: # echo hist:keys=aaa,bbb.hex:vals=ccc.hex ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/67bd431edda2af5798d7694818f7e8d71b6b3463.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for clearing a traceTom Zanussi
Allow users to append 'clear' to an existing trigger in order to have the hash table cleared. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont/clear \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/ae15dd0d9b2f7af07a37c1ff682063e2dbcdf160.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for pausing and continuing a traceTom Zanussi
Allow users to append 'pause' or 'continue' to an existing trigger in order to have it paused or to have a paused trace continue. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/b672a92c14702cb924cdf6fc27ea1809bed04907.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for user-defined sorting ('sort=' param)Tom Zanussi
Allow users to specify keys and/or values to sort on. With this addition, keys and values specified using the 'keys=' and 'vals=' keywords can be used to sort the hist trigger output via a new 'sort=' keyword. If multiple sort keys are specified, the output will be sorted using the second key as a secondary sort key, etc. The default sort order is ascending; if the user wants a different sort order, '.descending' can be appended to the specific sort key. Before this addition, output was always sorted by 'hitcount' in ascending order. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy \ [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/b30a41db66ba486979c4f987aff5fab500ea53b3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for compound keysTom Zanussi
Allow users to specify multiple trace event fields to use in keys by allowing multiple fields in the 'keys=' keyword. With this addition, any unique combination of any of the fields named in the 'keys' keyword will result in a new entry being added to the hash table. Link: http://lkml.kernel.org/r/0cfa24e6ac3b0dcece7737d94aa1f322ae3afc4b.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for multiple values ('vals=' param)Tom Zanussi
Allow users to specify trace event fields to use in aggregated sums via a new 'vals=' keyword. Before this addition, the only aggregated sum supported was the implied value 'hitcount'. With this addition, 'hitcount' is also supported as an explicit value field, as is any numeric trace event field. This expands the hist trigger syntax from this: # echo hist:keys=xxx [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2a5d1adb5ba6c65d7bb2148e379f2fed47f29a68.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add 'hist' event trigger commandTom Zanussi
'hist' triggers allow users to continually aggregate trace events, which can then be viewed afterwards by simply reading a 'hist' file containing the aggregation in a human-readable format. The basic idea is very simple and boils down to a mechanism whereby trace events, rather than being exhaustively dumped in raw form and viewed directly, are automatically 'compressed' into meaningful tables completely defined by the user. This is done strictly via single-line command-line commands and without the aid of any kind of programming language or interpreter. A surprising number of typical use cases can be accomplished by users via this simple mechanism. In fact, a large number of the tasks that users typically do using the more complicated script-based tracing tools, at least during the initial stages of an investigation, can be accomplished by simply specifying a set of keys and values to be used in the creation of a hash table. The Linux kernel trace event subsystem happens to provide an extensive list of keys and values ready-made for such a purpose in the form of the event format files associated with each trace event. By simply consulting the format file for field names of interest and by plugging them into the hist trigger command, users can create an endless number of useful aggregations to help with investigating various properties of the system. See Documentation/trace/events.txt for examples. hist triggers are implemented on top of the existing event trigger infrastructure, and as such are consistent with the existing triggers from a user's perspective as well. The basic syntax follows the existing trigger syntax. Users start an aggregation by writing a 'hist' trigger to the event of interest's trigger file: # echo hist:keys=xxx [ if filter] > event/trigger Once a hist trigger has been set up, by default it continually aggregates every matching event into a hash table using the event key and a value field named 'hitcount'. To view the aggregation at any point in time, simply read the 'hist' file in the same directory as the 'trigger' file: # cat event/hist The detailed syntax provides additional options for user control, and is described exhaustively in Documentation/trace/events.txt and in the virtual tracing/README file in the tracing subsystem. Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add infrastructure to allow set_event_pid to follow childrenSteven Rostedt
Add the infrastructure needed to have the PIDs in set_event_pid to automatically add PIDs of the children of the tasks that have their PIDs in set_event_pid. This will also remove PIDs from set_event_pid when a task exits This is implemented by adding hooks into the fork and exit tracepoints. On fork, the PIDs are added to the list, and on exit, they are removed. Add a new option called event_fork that when set, PIDs in set_event_pid will automatically get their children PIDs added when they fork, as well as any task that exits will have its PID removed from set_event_pid. This works for instances as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-24Merge tag 'trace-v4.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Nothing major this round. Mostly small clean ups and fixes. Some visible changes: - A new flag was added to distinguish traces done in NMI context. - Preempt tracer now shows functions where preemption is disabled but interrupts are still enabled. Other notes: - Updates were done to function tracing to allow better performance with perf. - Infrastructure code has been added to allow for a new histogram feature for recording live trace event histograms that can be configured by simple user commands. The feature itself was just finished, but needs a round in linux-next before being pulled. This only includes some infrastructure changes that will be needed" * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits) tracing: Record and show NMI state tracing: Fix trace_printk() to print when not using bprintk() tracing: Remove redundant reset per-CPU buff in irqsoff tracer x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c tracing: Fix crash from reading trace_pipe with sendfile tracing: Have preempt(irqs)off trace preempt disabled functions tracing: Fix return while holding a lock in register_tracer() ftrace: Use kasprintf() in ftrace_profile_tracefs() ftrace: Update dynamic ftrace calls only if necessary ftrace: Make ftrace_hash_rec_enable return update bool tracing: Fix typoes in code comment and printk in trace_nop.c tracing, writeback: Replace cgroup path to cgroup ino tracing: Use flags instead of bool in trigger structure tracing: Add an unreg_all() callback to trigger commands tracing: Add needs_rec flag to event triggers tracing: Add a per-event-trigger 'paused' field tracing: Add get_syscall_name() tracing: Add event record param to trigger_ops.func() tracing: Make event trigger functions available tracing: Make ftrace_event_field checking functions available ...
2016-03-22kernel/...: convert pr_warning to pr_warnJoe Perches
Use the more common logging method with the eventual goal of removing pr_warning altogether. Miscellanea: - Realign arguments - Coalesce formats - Add missing space between a few coalesced formats Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [kernel/power/suspend.c] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22tracing: Record and show NMI statePeter Zijlstra
The latency tracer format has a nice column to indicate IRQ state, but this is not able to tell us about NMI state. When tracing perf interrupt handlers (which often run in NMI context) it is very useful to see how the events nest. Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-18tracing: Fix crash from reading trace_pipe with sendfileSteven Rostedt (Red Hat)
If tracing contains data and the trace_pipe file is read with sendfile(), then it can trigger a NULL pointer dereference and various BUG_ON within the VM code. There's a patch to fix this in the splice_to_pipe() code, but it's also a good idea to not let that happen from trace_pipe either. Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Rabin Vincent <rabin.vincent@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>