Debugging the Linux kernel with Ftrace, part 2nd

Source: Internet
Author: User
Tags echo command stack trace

Ftrace Operations overview

Using the tracker provided by Ftrace to debug or analyze the kernel requires the following actions:

    • Switch to directory/sys/kernel/debug/tracing/
    • View the Available_tracers file for a list of trackers supported by the current kernel
    • Close the Ftrace trace and write 0 to the file tracing_enabled
    • Activate ftrace_enabled, otherwise the function tracker behaves like NOP; In addition, activating this option allows some trackers, such as Irqsoff, to get richer information. It is recommended to use Ftrace when it is activated. To activate ftrace_enabled, you can set it through the proc file system interface:
    Echo 1 >/proc/sys/kernel/ftrace_enabled
    • Writes the name of the selected tracker to the file Current_tracer
    • Write the function to be traced to the file Set_ftrace_filter, and write the function that you do not want to trace to the file set_ftrace_notrace. Usually directly manipulate the file Set_ftrace_filter can be
    • Activates the Ftrace trace and writes the file tracing_enabled to 1. Also make sure that the file tracing_on value is also 1, which can control the pause of the trace
    • If the application is parsed, the execution of the application is initiated, and ftrace tracks the operation of the kernel during application operation.
    • Pausing the record of the trace information by writing 0 to the file tracing_on, when the tracker is still tracking the kernel's operation, but no longer writes trace information to the file trace, or writes 0 to the file tracing_enabled to turn off tracing
    • View file trace Get trace information, analyze and debug the run of the kernel

The use of the tracker and the format of the tracking information are then explained by examples.

Back to top of page

Fucntion Tracking Device

The function tracker can track the invocation of kernel functions, can be used to debug or analyze bugs, and can be used to understand and observe the execution of the Linux kernel. Listing 1 shows an example of using the function tracker.

Listing 1. function Tracker Use Example
 [[email protected] tracing]# pwd/sys/kernel/debug/tracing [[email protected] tracing]# echo 0 > Tracing_e nabled [[email protected] tracing]# echo 1 >/proc/sys/kernel/ftrace_enabled [[email protected] tracing]# echo function > Current_tracer [[email protected] tracing]# echo 1 > tracing_on [[email protected] Traci ng]# echo 1 > tracing_enabled # Let the kernel run for a while so ftrace can collect some trace information and then stop tracking [[email protected] tracing]# echo 0 > t racing_enabled [[email protected] tracing]# cat Trace |       HEAD-10 # tracer:function # # Task-pid cpu# TIMESTAMP function # | |          |         |           |  <idle>-0 [20654.426521]: _raw_spin_lock <-scheduler_tick <idle>-0 [000] 20654.426522:           Task_tick_idle <-scheduler_tick <idle>-0 [20654.426522:cpumask_weight <-scheduler_tick] <idle>-0 [20654.426523:cpumask_weight] <-Scheduler_tick <idle>-0 [20654.426523:run_posix_cpu_timers] <-update_process_times &LT;IDLE&G t;-0 [20654.426524:hrtimer_forward <-tick_sched_timer]

The information format given by the trace file is clear. First, the field "Tracer:" gives the name of the tracker currently in use, which is the function tracker. Then is the tracking information record format, the task field corresponds to the name of the job, the PID field gives the process ID of the task, the field cpu# represents the CPU number of the running tracked function, here you can see the idle process running on the No. 0 CPU, its process ID is 0; field Timesta The MP is a timestamp, which is in the form of "<secs>.<usecs> , which indicates the time stamp corresponding to the function being executed, and the function column gives the functions being traced, and the caller of the function is marked by the symbol" <-". This allows you to observe the invocation relationship of the function.

Back to top of page

Function_graph Tracking Device

In the information given by the function tracker, it is possible to see the functions call relationship through the symbol "<-" in the functionality column, but because the middle will be mixed with different functions of the call, resulting in a very confusing, very inconvenient. The Function_graph Tracker can provide function call relationship information similar to C code. By writing the file set_graph_function you can display the function that specifies the call relationship to generate, and the missing capital generates a function call graph for all traceable kernel functions. Listing 2 shows an example of using the Function_grapch Tracker, which __do_fault the kernel function as an observation object that is frequently called during kernel operation.

Listing 2. Function_graph Tracker Use Example
 [[email protected] tracing]# pwd/sys/kernel/debug/tracing [[email protected] tracing]# echo 0 > Tracing_e nabled [[email protected] tracing]# echo 1 >/proc/sys/kernel/ftrace_enabled [[email protected] tracing]# echo function_graph > Current_tracer [[email protected] tracing]# echo __do_fault > set_graph_function [[Emai L protected] tracing]# echo 1 > tracing_on [[email protected] tracing]# echo 1 > tracing_enabled # let the kernel run For a while, so ftrace can collect some trace information and then stop tracking [[email protected] tracing]# echo 0 > tracing_enabled [[email protected] tracing]# Cat Trace |     head-20 # tracer:function_graph # # CPU DURATION function CALLS # |   |                     |   |   |   |  |    1) 9.936 US |    } 1) 0.714 US |  Put_prev_task_fair ();    1) |      Pick_next_task_fair () {1) |        Set_next_entity () {1) 0.647 US |  Update_stats_wait_end (); 1) 0.699 US |       __dequeue_entity ();      1) 3.322 US |      } 1) 0.865 US |  Hrtick_start_fair ();    1) 6.313 US |    } 1) |      __switch_to_xtra () {1) 1.601 US |  memcpy ();    1) 3.938 US | } [[email protected] tracing]# echo > Set_graph_function

In the output information of the file trace, the first given is also the name of the current tracker, here is function_graph. Next is the detailed trace information, as you can see, the call relationship of the function is organized in the form of C code.

The CPU field gives the CPU number of the executing function, which in this case is number 1th CPU. The DURATION field gives the length of time the function executes, in us. function CALLS gives the functions called and shows the invocation process. Note that for functions that do not call other functions, their corresponding rows are ";" End, and the corresponding DURATION field gives its run-time, and for functions that call other functions, the run-time length is given in its "}" corresponding line, which is an accumulative value, including the execution time of the function it calls internally. The length of time given by the DURATION field is not precise, it also contains the time spent executing the code of the Ftrace itself, and the result of the internal function's duration in the example is inconsistent with the execution time of the corresponding perimeter call function However, this field still gives you a general idea of how much the function spends on time. The file set_graph_function was finally reset in Listing 2 with the echo command.

Back to top of page

Sched_switch Tracking Device

The Sched_switch Tracker can track the scheduling of a process and the wake-up operation between them, as shown in Listing 3.

Listing 3. Sched_switch Tracker Use Example
 [[email protected] tracing]# pwd/sys/kernel/debug/tracing [[email protected] tracing]# echo 0 > Tracing_e nabled [[email protected] tracing]# echo 1 >/proc/sys/kernel/ftrace_enabled [[email protected] tracing]# echo sched_switch > Current_tracer [[email protected] tracing]# echo 1 > tracing_on [[email protected] t racing]# echo 1 > tracing_enabled # Let the kernel run for a while so ftrace can collect some trace information and then stop tracking [[email protected] tracing]# Echo 0 &G T tracing_enabled [[email protected] tracing]# cat Trace |       HEAD-10 # tracer:sched_switch # # Task-pid cpu# TIMESTAMP FUNCTION # | |          |         |       |   bash-1408 [26208.816058:1408:120:s] 1408:120:s bash bash-1408 [26208.816070:1408:120:s]   + [1408:120:s bash] bash-1408 [26208.816921:1408:120:r +] 9:120:r events/0 bash-1408 [26208.816939:1408:120:r ==>] 9:120:r events/0 Events/0-9 [26208.817081:9:120:r] 1377:120:r gnome-terminal events/0-9 [000] 26208.817088:9: 120:s ==> [1377:120:r] Gnome-terminal

In the tracking information obtained by the Sched_swich tracker, the wake-up operation and the scheduling switching information are recorded, which can be distinguished by the symbol ' + ' and ' ==> ', and the wake Operation record shows the process of the current process wake-up operation, and the process scheduling switch record indicates the successor process that takes over the current process.

The format for describing the status of the process is "task-pid:priority:task-state". As an example of the first trace in the tracking information, you can see that the process bash PID is 1408, its corresponding kernel state priority is 120, the current state is S (can interrupt the sleep state), the current bash does not wake other processes; from the 3rd record you can see that the process bash Events/0 wakes up while the process is scheduled in the 4th record, the process bash switches to process events/0 execution.

In the Linux kernel, the status of the process is defined in the kernel header file Include/linux/sched.h, including the operational status task_running (corresponding to the symbol ' R ' in the trace information), the interruptible blocking state task_ Interruptible (corresponding to the symbol ' S ' in the tracking information), etc. The header also defines the range of priorities used by the user-state process, with a minimum value of Max_user_rt_prio (a value of 100), a maximum value of max_prio-1 (corresponding to 139), and a default of Default_prio (a value of 120); In this case, the process takes precedence Level is the default value of 120.

Back to top of page

Irqsoff Tracking Device

When the interrupt is turned off, the CPU delays the response to changes in the state of the device, and sometimes this can have a significant impact on system performance. The Irqsoff tracker tracks the state of the interrupt being turned off, helping to find code that causes a large delay, and when the maximum delay occurs, the tracker records the trace information that caused the delay, and the file Tracing_max_latency records the maximum delay when the interrupt is closed.

Listing 4. Irqsoff Tracker Use Example
 [[email protected] tracing]# pwd/sys/kernel/debug/tracing [[email protected] tracing]# echo 0 > Tracing_e nabled [[email protected] tracing]# echo 1 >/proc/sys/kernel/ftrace_enabled [[email protected] tracing]# echo Irqsoff > Current_tracer [[email protected] tracing]# echo 1 > tracing_on [[email protected] Tracin g]# echo 1 > tracing_enabled # Let the kernel run for a while so ftrace can collect some trace information and then stop tracking [[email protected] tracing]# echo 0 > TR acing_enabled [[email protected] tracing]# cat Trace | HEAD-35 # tracer:irqsoff # # Irqsoff Latency Trace v1.1.5 on #-------------------------------------------- ------------------------# latency:34380 US, #6/6, cpu#1 | (M:desktop vp:0, kp:0, sp:0 hp:0 #P: 2) #-----------------# | Task:-0 (uid:0 nice:0 policy:0 rt_prio:0) #-----------------# = started At:reschedule_interrupt # = en   Ded At:restore_all_notrace # # # _------= cpu#          #/_-----= Irqs-off # | /_----= need-resched # | | /_---= Hardirq/softirq # | | | /_--=> preempt-depth # | | | | /_--=> lock-depth # | | | | | /delay # cmd pid | | | | | |   Time |   Caller # \ | | | | | | |   \   | /<idle>-0 1dN ... 4285us!: Trace_hardirqs_off_thunk <-reschedule_interrupt <idle>-0 1dN ... 34373us+: Smp_reschedule_interrupt <-reschedule_interrupt <idle>-0 1dN ... 34375us+: Native_apic_mem_write <-smp_reschedule_interrupt <idle>-0 1dN ... 34380us+: Trace_hardirqs_on_thunk <-restore_all_notrace <idle>-0 1dN ... 34384us:trace_hardirqs_on_caller <-restore_all_notrace <idle>-0 1dN ... 34384us: <stack trace> = Trace_hardirqs_on_thunk [[email protected] tracing]# cat tracing_max_latency 34380 

From the output information in Listing 4, you can see the version information for the current Irqsoff delay tracker. The next is the maximum delay time, in us, in this case 34380 us, and viewing the file Tracing_max_latency can also get the value. You can tell from the "task:" field that the process that is running when the delay occurs is idle (its PID is 0). The closed operation of the interrupt is performed in the function Reschedule_interrupt by the "= = started at:" identifier, the function Restore_all_ontrace reactivated the interrupt, identified by "= = ended at:" The maximum delay for interrupt shutdown occurs in the function Trace_hardirqs_on_thunk, which can be seen from the record entry where the maximum delay time is located, or from the record line that corresponds to the last "= =" identifier in the deferred record information.

In the output information, fields such as Irqs-off, need_resched, etc. correspond to the fields or status flags of the process structure struct task_struct, which can be arch/<platform>/include/asm/from the scratch file. Thread_info.h to see the status flags supported by the process, include/linux/sched.h defines the struct struct task_struct. Where the Irqs-off field shows whether the interrupt is disabled, the ' d ' indicates that the interrupt is forbidden, and the Need_resched field is displayed as ' N ' indicating that the process status flag is set tif_need_resched. The Hardirq/softirq field indicates whether a hardware interrupt/soft interrupt is currently occurring, preempt-depth indicates whether a process preemption is prohibited, for example, if the process cannot be preempted in the case of a spin lock, in this case the process idle can be preempted by other processes. The lock_depth field in struct struct task_struct is associated with a large kernel lock, and the most recent kernel development work is related to removing a large kernel lock, which is no longer described.

In addition, there are tracker preemptoff for tracking forbidden process preemption and tracking interrupt/preemption disabled tracker Preemptirqsoff, which are used in the same way as the output information format, similar to the Irqsoff tracker, and no longer repeat.

Back to top of page

Trace functions in a specified module

As mentioned earlier, through file set_ftrace_filter you can specify the function to trace, the default target is all traceable kernel functions, you can write the function of interest to the file through ECHO. For ease of use, the Set_ftrace_filter file also supports simple format wildcard characters.

    • begin* Select all functions whose names begin with the Begin string
    • *middle* Select all functions that contain the middle string in the name
    • * end Selects all functions whose names end with the end string

It is important to note that these three forms cannot be combined, such as "begin*middle*end" the actual effect is the same as "begin". In addition, when you use a wildcard expression, you enclose it in single quotation marks, and if you use double quotes, the shell may extend the character ' * ' so that the object that is eventually tracked may not be the same as the target function.

The file also allows you to specify functions that belong to a specific module, which uses the MoD directive. The format of the specified module is:

        Echo ': Mod:[module_name] ' > Set_ftrace_filter

An example of specifying a function in the trace module IPv6 is given below. As you can see, specifying a function in the trace module IPv6 sets the contents of the file Set_ftrace_filter to include only the functions in that module.

Listing 5. Specifying functions in the trace IPv6 module
[[email protected] tracing]# pwd  /sys/kernel/debug/tracing  [[email protected] tracing]# echo ': Mod:ipv6 ' > Set_ftrace_filter  [[email protected] tracing]# Cat Set_ftrace_filter | head-5  ipv6_opt_accepted  inet6_ Net_exit  ipv6_gro_complete  inet6_create  ipv6_addr_copy

Back to top of page


This series of articles describes the configuration and use of Ftrace. This article is the second part, introduced the use of the tracker provided by Ftrace, focused on the different formats of tracking information described, and how to track the function in the specified module is introduced. The current Linux kernel Ftrace supports multiple trackers, which cannot be described in its entirety due to space limitations. Readers can learn more about the Ftrace tracker in the light of the information given in the resources. The next article describes how to interact with ftrace using the tool functions provided by Ftrace in your code.

Debugging the Linux kernel with Ftrace, part 2nd

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.