diff options
Diffstat (limited to 'Documentation/trace/events.rst')
-rw-r--r-- | Documentation/trace/events.rst | 139 |
1 files changed, 92 insertions, 47 deletions
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst index 2a5aa48eff6c..759907c20e75 100644 --- a/Documentation/trace/events.rst +++ b/Documentation/trace/events.rst @@ -24,27 +24,27 @@ tracing information should be printed. --------------------------------- The events which are available for tracing can be found in the file -/sys/kernel/debug/tracing/available_events. +/sys/kernel/tracing/available_events. To enable a particular event, such as 'sched_wakeup', simply echo it -to /sys/kernel/debug/tracing/set_event. For example:: +to /sys/kernel/tracing/set_event. For example:: - # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event + # echo sched_wakeup >> /sys/kernel/tracing/set_event .. Note:: '>>' is necessary, otherwise it will firstly disable all the events. To disable an event, echo the event name to the set_event file prefixed with an exclamation point:: - # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event + # echo '!sched_wakeup' >> /sys/kernel/tracing/set_event To disable all events, echo an empty line to the set_event file:: - # echo > /sys/kernel/debug/tracing/set_event + # echo > /sys/kernel/tracing/set_event To enable all events, echo ``*:*`` or ``*:`` to the set_event file:: - # echo *:* > /sys/kernel/debug/tracing/set_event + # echo *:* > /sys/kernel/tracing/set_event The events are organized into subsystems, such as ext4, irq, sched, etc., and a full event name looks like this: <subsystem>:<event>. The @@ -53,29 +53,29 @@ file. All of the events in a subsystem can be specified via the syntax ``<subsystem>:*``; for example, to enable all irq events, you can use the command:: - # echo 'irq:*' > /sys/kernel/debug/tracing/set_event + # echo 'irq:*' > /sys/kernel/tracing/set_event 2.2 Via the 'enable' toggle --------------------------- -The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy +The events available are also listed in /sys/kernel/tracing/events/ hierarchy of directories. To enable event 'sched_wakeup':: - # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable + # echo 1 > /sys/kernel/tracing/events/sched/sched_wakeup/enable To disable it:: - # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable + # echo 0 > /sys/kernel/tracing/events/sched/sched_wakeup/enable To enable all events in sched subsystem:: - # echo 1 > /sys/kernel/debug/tracing/events/sched/enable + # echo 1 > /sys/kernel/tracing/events/sched/enable To enable all events:: - # echo 1 > /sys/kernel/debug/tracing/events/enable + # echo 1 > /sys/kernel/tracing/events/enable When reading one of these enable files, there are four results: @@ -126,7 +126,7 @@ is the size of the data item, in bytes. For example, here's the information displayed for the 'sched_wakeup' event:: - # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format + # cat /sys/kernel/tracing/events/sched/sched_wakeup/format name: sched_wakeup ID: 60 @@ -198,6 +198,41 @@ The glob (~) accepts a wild card character (\*,?) and character classes prev_comm ~ "*sh*" prev_comm ~ "ba*sh" +If the field is a pointer that points into user space (for example +"filename" from sys_enter_openat), then you have to append ".ustring" to the +field name:: + + filename.ustring ~ "password" + +As the kernel will have to know how to retrieve the memory that the pointer +is at from user space. + +You can convert any long type to a function address and search by function name:: + + call_site.function == security_prepare_creds + +The above will filter when the field "call_site" falls on the address within +"security_prepare_creds". That is, it will compare the value of "call_site" and +the filter will return true if it is greater than or equal to the start of +the function "security_prepare_creds" and less than the end of that function. + +The ".function" postfix can only be attached to values of size long, and can only +be compared with "==" or "!=". + +Cpumask fields or scalar fields that encode a CPU number can be filtered using +a user-provided cpumask in cpulist format. The format is as follows:: + + CPUS{$cpulist} + +Operators available to cpumask filtering are: + +& (intersection), ==, != + +For example, this will filter events that have their .target_cpu field present +in the given cpumask:: + + target_cpu & CPUS{17-42} + 5.2 Setting filters ------------------- @@ -206,19 +241,19 @@ to the 'filter' file for the given event. For example:: - # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup + # cd /sys/kernel/tracing/events/sched/sched_wakeup # echo "common_preempt_count > 4" > filter A slightly more involved example:: - # cd /sys/kernel/debug/tracing/events/signal/signal_generate + # cd /sys/kernel/tracing/events/signal/signal_generate # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter If there is an error in the expression, you'll get an 'Invalid argument' error when setting it, and the erroneous string along with an error message can be seen by looking at the filter e.g.:: - # cd /sys/kernel/debug/tracing/events/signal/signal_generate + # cd /sys/kernel/tracing/events/signal/signal_generate # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter -bash: echo: write error: Invalid argument # cat filter @@ -230,6 +265,16 @@ Currently the caret ('^') for an error always appears at the beginning of the filter string; the error message should still be useful though even without more accurate position info. +5.2.1 Filter limitations +------------------------ + +If a filter is placed on a string pointer ``(char *)`` that does not point +to a string on the ring buffer, but instead points to kernel or user space +memory, then, for safety reasons, at most 1024 bytes of the content is +copied onto a temporary buffer to do the compare. If the copy of the memory +faults (the pointer points to memory that should not be accessed), then the +string compare will be treated as not matching. + 5.3 Clearing filters -------------------- @@ -239,7 +284,7 @@ file. To clear the filters for all events in a subsystem, write a '0' to the subsystem's filter file. -5.3 Subsystem filters +5.4 Subsystem filters --------------------- For convenience, filters for every event in a subsystem can be set or @@ -258,7 +303,7 @@ above points: Clear the filters on all events in the sched subsystem:: - # cd /sys/kernel/debug/tracing/events/sched + # cd /sys/kernel/tracing/events/sched # echo 0 > filter # cat sched_switch/filter none @@ -268,7 +313,7 @@ Clear the filters on all events in the sched subsystem:: Set a filter using only common fields for all events in the sched subsystem (all events end up with the same filter):: - # cd /sys/kernel/debug/tracing/events/sched + # cd /sys/kernel/tracing/events/sched # echo common_pid == 0 > filter # cat sched_switch/filter common_pid == 0 @@ -279,14 +324,14 @@ Attempt to set a filter using a non-common field for all events in the sched subsystem (all events but those that have a prev_pid field retain their old filters):: - # cd /sys/kernel/debug/tracing/events/sched + # cd /sys/kernel/tracing/events/sched # echo prev_pid == 0 > filter # cat sched_switch/filter prev_pid == 0 # cat sched_wakeup/filter common_pid == 0 -5.4 PID filtering +5.5 PID filtering ----------------- The set_event_pid file in the same directory as the top events directory @@ -294,7 +339,7 @@ exists, will filter all events from tracing any task that does not have the PID listed in the set_event_pid file. :: - # cd /sys/kernel/debug/tracing + # cd /sys/kernel/tracing # echo $$ > set_event_pid # echo 1 > events/enable @@ -390,14 +435,14 @@ The following commands are supported: specifies that this enablement happens only once:: # echo 'enable_event:kmem:kmalloc:1' > \ - /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger The following trigger causes kmalloc events to stop being traced when a read system call exits. This disablement happens on every read system call exit:: # echo 'disable_event:kmem:kmalloc' > \ - /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger + /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger The format is:: @@ -407,10 +452,10 @@ The following commands are supported: To remove the above commands:: # echo '!enable_event:kmem:kmalloc:1' > \ - /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger # echo '!disable_event:kmem:kmalloc' > \ - /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger + /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger Note that there can be any number of enable/disable_event triggers per triggering event, but there can only be one trigger per @@ -429,13 +474,13 @@ The following commands are supported: kmalloc tracepoint is hit:: # echo 'stacktrace' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger The following trigger dumps a stacktrace the first 5 times a kmalloc request happens with a size >= 64K:: # echo 'stacktrace:5 if bytes_req >= 65536' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger The format is:: @@ -444,16 +489,16 @@ The following commands are supported: To remove the above commands:: # echo '!stacktrace' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger # echo '!stacktrace:5 if bytes_req >= 65536' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger The latter can also be removed more simply by the following (without the filter):: # echo '!stacktrace:5' > \ - /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + /sys/kernel/tracing/events/kmem/kmalloc/trigger Note that there can be only one stacktrace trigger per triggering event. @@ -469,20 +514,20 @@ The following commands are supported: capture those events when the trigger event occurred:: # echo 'snapshot if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger To only snapshot once:: # echo 'snapshot:1 if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger To remove the above commands:: # echo '!snapshot if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger # echo '!snapshot:1 if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger Note that there can be only one snapshot trigger per triggering event. @@ -500,20 +545,20 @@ The following commands are supported: trigger event:: # echo 'traceoff:1 if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger To always disable tracing when nr_rq > 1:: # echo 'traceoff if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger To remove the above commands:: # echo '!traceoff:1 if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger # echo '!traceoff if nr_rq > 1' > \ - /sys/kernel/debug/tracing/events/block/block_unplug/trigger + /sys/kernel/tracing/events/block/block_unplug/trigger Note that there can be only one traceon or traceoff trigger per triggering event. @@ -798,13 +843,13 @@ To trace a synthetic using the piecewise method described above, the synth_event_trace_start() function is used to 'open' the synthetic event trace:: - struct synth_trace_state trace_state; + struct synth_event_trace_state trace_state; ret = synth_event_trace_start(schedtest_event_file, &trace_state); It's passed the trace_event_file representing the synthetic event using the same methods as described above, along with a pointer to a -struct synth_trace_state object, which will be zeroed before use and +struct synth_event_trace_state object, which will be zeroed before use and used to maintain state between this and following calls. Once the event has been opened, which means space for it has been @@ -816,7 +861,7 @@ lookup per field. To assign the values one after the other without lookups, synth_event_add_next_val() should be used. Each call is passed the -same synth_trace_state object used in the synth_event_trace_start(), +same synth_event_trace_state object used in the synth_event_trace_start(), along with the value to set the next field in the event. After each field is set, the 'cursor' points to the next field, which will be set by the subsequent call, continuing until all the fields have been set @@ -845,7 +890,7 @@ this method would be (without error-handling code):: ret = synth_event_add_next_val(395, &trace_state); To assign the values in any order, synth_event_add_val() should be -used. Each call is passed the same synth_trace_state object used in +used. Each call is passed the same synth_event_trace_state object used in the synth_event_trace_start(), along with the field name of the field to set and the value to set it to. The same sequence of calls as in the above examples using this method would be (without error-handling @@ -867,7 +912,7 @@ can be used but not both at the same time. Finally, the event won't be actually traced until it's 'closed', which is done using synth_event_trace_end(), which takes only the -struct synth_trace_state object used in the previous calls:: +struct synth_event_trace_state object used in the previous calls:: ret = synth_event_trace_end(&trace_state); @@ -884,7 +929,7 @@ functions can be used. To create a kprobe event, an empty or partially empty kprobe event should first be created using kprobe_event_gen_cmd_start(). The name -of the event and the probe location should be specfied along with one +of the event and the probe location should be specified along with one or args each representing a probe field should be supplied to this function. Before calling kprobe_event_gen_cmd_start(), the user should create and initialize a dynevent_cmd object using @@ -964,7 +1009,7 @@ The basic idea is simple and amounts to providing a general-purpose layer that can be used to generate trace event commands. The generated command strings can then be passed to the command-parsing and event creation code that already exists in the trace event -subystem for creating the corresponding trace events. +subsystem for creating the corresponding trace events. In a nutshell, the way it works is that the higher-level interface code creates a struct dynevent_cmd object, then uses a couple @@ -1037,7 +1082,7 @@ to add an operator between the pair (here none) and a separator to be appended onto the end of the arg pair (here ';'). There's also a dynevent_str_add() function that can be used to simply -add a string as-is, with no spaces, delimeters, or arg check. +add a string as-is, with no spaces, delimiters, or arg check. Any number of dynevent_*_add() calls can be made to build up the string (until its length surpasses cmd->maxlen). When all the arguments have |