# Identifying system performance bottlenecks: strace & ltrace
strace and ltrace trace system calls and library function calls, respectively. A system call is a low-level call into the kernel. In Linux programming, it is the interface through which a program asks the kernel to work with the hardware on its behalf. Library function calls are intended for application development and act as the application's API; common libraries include OpenSSL and libxml. System calls belong to the kernel layer, and library functions belong to the user layer. The corresponding layers are shown below.
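To make the two layers concrete, here is a minimal C sketch; the function name put_message is hypothetical. strlen() is a library function that runs entirely in user space. write() is the C library's thin wrapper around the write system call. If you call put_message() from a small program, ltrace would typically list both strlen and write, while strace lists only the write system call.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send a NUL-terminated message to a file descriptor. */
ssize_t put_message(int fd, const char *msg) {
    size_t len = strlen(msg);    /* library function: pure user-space work,
                                    never visible to strace */
    return write(fd, msg, len);  /* system call (via its thin libc wrapper):
                                    traps into the kernel, visible to strace */
}
```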
The difference between these two concepts is explained in Advanced Programming in the UNIX Environment. Another example of the difference between a system call and a library function is the interface the UNIX System provides for determining the current time and date. Some operating systems provide one system call to return the time and another to return the date. Any special handling, such as the switch to or from daylight saving time, is done by the kernel or requires human intervention. The UNIX System, on the other hand, provides a single system call that returns the number of seconds since the Epoch: 00:00:00 January 1, 1970, Coordinated Universal Time. Any interpretation of this value, such as converting it to a human-readable time and date in the local time zone, is left to the user process. The standard C library provides routines to handle most cases. These library functions take care of details such as the various algorithms for daylight saving time.
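The split described above can be sketched in C, assuming a POSIX system; the helper names are hypothetical. time() returns the single number the kernel provides. gmtime_r(), localtime_r() and strftime() do the calendar, time-zone and formatting work in user space. On x86-64 Linux, time() is usually serviced through the vDSO without entering the kernel, so it may not appear in strace output at all.

```c
#include <stddef.h>
#include <time.h>

/* Shared formatter: turn a time_t (seconds since the Epoch) into
   "YYYY-MM-DD HH:MM:SS", either in UTC or in the local time zone. */
static int format_time(time_t t, int use_local, char *buf, size_t len) {
    struct tm tm;
    struct tm *ok = use_local ? localtime_r(&t, &tm)  /* applies TZ and DST rules */
                              : gmtime_r(&t, &tm);    /* plain UTC, no zone rules */
    if (ok == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm) > 0 ? 0 : -1;
}

/* Format an arbitrary Epoch value in UTC. */
int format_utc(time_t t, char *buf, size_t len) {
    return format_time(t, 0, buf, len);
}

/* Ask for the current "seconds since the Epoch" value, then let the
   library do the calendar and time-zone work. */
int format_now_local(char *buf, size_t len) {
    time_t now = time(NULL);
    if (now == (time_t)-1)
        return -1;
    return format_time(now, 1, buf, len);
}
```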
Applications can call either system calls or library functions, and many library functions in turn invoke system calls. This is shown in Figure 1-2.
Another difference between system calls and library functions is that system calls usually provide a minimal interface, whereas library functions often provide more elaborate functionality. We can see this in the difference between the sbrk system call and the malloc library function. We will see this difference again later, when we compare the unbuffered I/O functions (Chapter 3) with the standard I/O functions (Chapter 5).
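Here is a minimal sketch of the "minimal interface" side, assuming Linux with glibc; raw_heap_grow is a hypothetical name. On Linux, sbrk() is itself a thin C library wrapper around the brk system call, which appears as brk(0) near the top of the strace output of ls later in this article. sbrk() only moves the end of the heap (the program break) and returns the old end. It has no free() and no block reuse. malloc() builds free lists and block reuse on top of such primitives, and it decides on its own when to use brk or mmap. sbrk() is a legacy interface and is shown here only for contrast.

```c
#define _DEFAULT_SOURCE   /* sbrk() is a legacy BSD interface, hidden in strict modes */
#include <stdint.h>
#include <unistd.h>

/* Grow the heap by `bytes` and return the start of the new region
   (the previous program break), or NULL if the kernel refused. */
void *raw_heap_grow(intptr_t bytes) {
    void *old_break = sbrk(bytes);
    return old_break == (void *)-1 ? NULL : old_break;
}
```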
The process control system calls (fork, exec, and wait) are usually invoked directly by the application (recall the basic shell in Program 1-5). To simplify some common cases, however, the UNIX System also provides library functions such as system and popen. A later section describes an implementation of the system function that uses the basic process control system calls, and another section enhances this example to handle signals correctly.
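Here is a minimal sketch of both approaches; the names run_fork_exec and run_system are hypothetical. The first function uses fork(), execvp() (a libc front end to the execve system call) and waitpid() directly. The second hands the same job to the system() library function, which runs the command through /bin/sh -c and does the fork/exec/wait itself. On modern glibc, fork() typically shows up in strace output as a clone system call.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Direct route: fork, exec, wait. Returns the child's exit status,
   or -1 if anything failed or the child did not exit normally. */
int run_fork_exec(char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                     /* child: replace ourselves */
        execvp(argv[0], argv);
        _exit(127);                     /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) /* parent: wait for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Library route: system() runs `command` via /bin/sh -c and performs
   the fork/exec/wait (plus signal handling) internally. */
int run_system(const char *command) {
    int status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```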
To give readers an understanding of the UNIX system interface that most programmers use, we have to describe both system calls and some library functions. If we described only the sbrk system call, for example, we would skip the malloc library function that many applications use. Except where the distinction matters, this book uses the term function to refer to both system calls and library functions.
Normally, a process cannot access the kernel: it cannot access the memory the kernel occupies, and it cannot call kernel functions. The CPU hardware enforces this, which is why it is called "protected mode". System calls are the exception to these rules. The process first fills the registers with the appropriate values. It then executes a special instruction that jumps to a predefined location in the kernel (of course, this location is readable by user processes but not writable by them). On 32-bit Intel x86 Linux, this is traditionally done with interrupt 0x80; x86-64 uses the dedicated syscall instruction. The hardware knows that once you jump to this location, you are no longer running as a user in restricted mode but as the operating system kernel, so you are allowed to do whatever you want.
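The register-and-trap mechanism is easiest to see through the C library's generic syscall() helper, which invokes a system call by number. Here is a minimal sketch, assuming Linux with glibc; raw_getpid is a hypothetical name. It should return the same value as the ordinary getpid() wrapper.

```c
#define _GNU_SOURCE          /* make sure syscall() is declared by <unistd.h> */
#include <sys/syscall.h>     /* SYS_getpid and the other system call numbers */
#include <sys/types.h>
#include <unistd.h>

/* Invoke getpid by its system call number. The generic syscall() helper in
   the C library places the number and any arguments in the registers the
   kernel's calling convention expects and then executes the architecture's
   system-call entry instruction (syscall on x86-64, for example). */
pid_t raw_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}
```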
The location in the kernel that a process can jump to is called system_call. The procedure at that location checks the system call number, which tells the kernel which service the process is requesting. It then looks up the system call table (sys_call_table) to find the entry address of the kernel function to call, and calls that function. After the function returns, the procedure performs a few system checks and then returns to the process, or to a different process if this process's time slice has run out. If you want to read this code, it is in <kernel source code directory>/kernel/entry.S, after the line ENTRY(system_call).
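The table lookup itself is simple. The following user-space toy model illustrates the idea and is not kernel code; the names toy_dispatch, toy_call_table and toy_sys_* are hypothetical. It checks the call number, indexes a table of function pointers, and calls the entry. Like Linux, it returns -ENOSYS for an unknown number. The C library wrapper is what turns such a negative kernel return value into -1 plus errno.

```c
#include <errno.h>
#include <stddef.h>

/* Every toy "system call" takes one argument and returns a long. */
typedef long (*toy_syscall_fn)(long arg);

static long toy_sys_double(long arg) { return 2 * arg; }
static long toy_sys_negate(long arg) { return -arg; }

/* Index = system call number; value = entry address of the handler. */
static const toy_syscall_fn toy_call_table[] = {
    toy_sys_double,   /* number 0 */
    toy_sys_negate,   /* number 1 */
};

#define TOY_NR_CALLS (sizeof toy_call_table / sizeof toy_call_table[0])

/* Check the number, look up the handler, call it. */
long toy_dispatch(unsigned long nr, long arg) {
    if (nr >= TOY_NR_CALLS)
        return -ENOSYS;               /* unknown call number */
    return toy_call_table[nr](arg);
}
```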
So that error codes are not confused with normal return values, a failing system call wrapper does not return the error code directly. Instead it returns -1 (for most calls) and stores the error code in a global variable named errno. If a system call fails, read errno to find out what went wrong.
The errno values are defined as symbolic constants in errno.h, and you can view their meanings by running the "man 3 errno" command.
Note that errno is set only when a function fails. If the function succeeds, errno is left unchanged (its value is meaningless) and is not reset to 0. In addition, it is best to copy errno into another variable before handling the error, because functions called during error handling, even printf(), may change errno.
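Here is a minimal sketch of that advice; open_or_report is a hypothetical name. errno is copied the moment open() fails, and the error report is built from the copy.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open `path` read-only. On failure, copy errno into *saved_errno first,
   and only then report, because fprintf() itself may change errno. */
int open_or_report(const char *path, int *saved_errno) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        *saved_errno = errno;                      /* save immediately */
        fprintf(stderr, "open(\"%s\") failed: %s\n",
                path, strerror(*saved_errno));     /* report from the copy */
        return -1;
    }
    *saved_errno = 0;  /* success: errno itself is NOT reset, so report 0 here */
    return fd;
}
```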
That's about it for the theory. Let's look at how to install strace and ltrace and use them directly:
```
yum install strace ltrace
```
I won't describe every option in detail, because the man pages already do this very well. Reading English may slow you down, but it is better to read the official man pages than third- or fourth-hand copies found on the internet. I quote the man page here not to pad the article, but to push readers toward the official English documentation.
```
man strace
```
DESCRIPTION
In the simplest case strace runs the specified command until it exits. It intercepts and records the system calls which are called by a process and the signals which are received by a process. The name of each system call, its arguments and its return value are printed on standard error or to the file specified with the -o option.
strace is a useful diagnostic, instructional, and debugging tool. System administrators, diagnosticians and trouble-shooters will find it invaluable for solving problems with programs for which the source is not readily available since they do not need to be recompiled in order to trace them. Students, hackers and the overly-curious will find that a great deal can be learned about a system and its system calls by tracing even ordinary programs. And programmers will find that since system calls and signals are events that happen at the user/kernel interface, a close examination of this boundary is very useful for bug isolation, sanity checking and attempting to capture race conditions.
Each line in the trace contains the system call name, followed by its arguments in parentheses and its return value. An example from stracing the command "cat /dev/null" is:
```
open("/dev/null", O_RDONLY) = 3
```
Errors (typically a return value of -1) have the errno symbol and error string appended.
```
open("/foo/bar", O_RDONLY) = -1 ENOENT (No such file or directory)
```
Signals are printed as a signal symbol and a signal string. An excerpt from stracing and interrupting the command "sleep 666" is:
```
sigsuspend([] <unfinished ...>
--- SIGINT (Interrupt) ---
+++ killed by SIGINT +++
```
If a system call is being executed and meanwhile another one is being called from a different thread/process then strace will try to preserve the order of those events and mark the ongoing call as being unfinished. When the call returns it will be marked as resumed.
```
[pid 28772] select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 28779] clock_gettime(CLOCK_REALTIME, {1130322148, 939977000}) = 0
[pid 28772] <... select resumed> ) = 1 (in [3])
```
Interruption of a (restartable) system call by a signal delivery is processed differently as kernel terminates the system call and also arranges its immediate reexecution after the signal handler completes.
```
read(0, 0x7ffff72cf5cf, 1) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = 0
read(0, ""..., 1) = 0
```
Arguments are printed in symbolic form with a passion. This example shows the shell performing ">>xyz" output redirection:
```
open("xyz", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
```
Here the three argument form of open is decoded by breaking down the flag argument into its three bitwise-OR constituents and printing the mode value in octal by tradition. Where traditional or native usage differs from ANSI or POSIX, the latter forms are preferred. In some cases, strace output has proven to be more readable than the source.
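The flag decoding described above can be imitated with a small table of bit names. This hypothetical helper is not part of strace (decode_open_flags is a made-up name) and covers only a few common flags. It treats the access mode specially, because O_RDONLY, O_WRONLY and O_RDWR are values stored in the low bits, not independent bits. Given the flags from the open() line above, it should produce O_WRONLY|O_APPEND|O_CREAT.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not strace's own code: render open(2) flags the way
   the trace does. The access mode is decoded first; each remaining known
   bit is then appended by name. Unknown leftovers are printed in octal. */
struct flag_name { int bit; const char *name; };

static const struct flag_name open_bits[] = {
    { O_APPEND, "O_APPEND" }, { O_CREAT, "O_CREAT" },   { O_EXCL, "O_EXCL" },
    { O_TRUNC,  "O_TRUNC" },  { O_NONBLOCK, "O_NONBLOCK" },
};

void decode_open_flags(int flags, char *buf, size_t len) {
    int acc = flags & O_ACCMODE;          /* a value, not a bit-set */
    if (acc == O_RDONLY)      snprintf(buf, len, "O_RDONLY");
    else if (acc == O_WRONLY) snprintf(buf, len, "O_WRONLY");
    else if (acc == O_RDWR)   snprintf(buf, len, "O_RDWR");
    else                      snprintf(buf, len, "%#o", (unsigned)acc);

    int rest = flags & ~O_ACCMODE;
    for (size_t i = 0; i < sizeof open_bits / sizeof open_bits[0]; i++) {
        if (rest & open_bits[i].bit) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "|%s", open_bits[i].name);
            rest &= ~open_bits[i].bit;
        }
    }
    if (rest != 0) {                      /* bits this table does not know */
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "|%#o", (unsigned)rest);
    }
}
```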
Structure pointers are dereferenced and the members are displayed as appropriate. In all cases arguments are formatted in the most C-like fashion possible. For example, the essence of the command "ls -l /dev/null" is captured as:
```
lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
```
Notice how the 'struct stat' argument is dereferenced and how each member is displayed symbolically. In particular, observe how the st_mode member is carefully decoded into a bitwise-OR of symbolic and numeric values. Also notice in this example that the first argument to lstat is an input to the system call and the second argument is an output. Since output arguments are not modified if the system call fails, arguments may not always be dereferenced. For example, retrying the "ls -l" example with a non-existent file produces the following line:
```
lstat("/foo/bar", 0xb004) = -1 ENOENT (No such file or directory)
```
In this case the porch light is on but nobody is home.
Character pointers are dereferenced and printed as C strings. Non-printing characters in strings are normally represented by ordinary C escape codes. Only the first strsize (32 by default) bytes of strings are printed; longer strings have an ellipsis appended following the closing quote. Here is a line from "ls -l" where the getpwuid library routine is reading the password file:
```
read(3, "root:0:0:System Administrator:/"..., 1024) = 422
```
While structures are annotated using curly braces, simple pointers and arrays are printed using square brackets with commas separating elements. Here is an example from the command "id" on a system with supplementary group ids:
```
getgroups(32, [100, 0]) = 2
```
On the other hand, bit-sets are also shown using square brackets but set elements are separated only by a space. Here is the shell preparing to execute an external command:
```
sigprocmask(SIG_BLOCK, [CHLD TTOU], []) = 0
```
Here the second argument is a bit-set of two signals, SIGCHLD and SIGTTOU. In some cases the bit-set is so full that printing out the unset elements is more valuable. In that case, the bit-set is prefixed by a tilde like this:
```
sigprocmask(SIG_UNBLOCK, ~[], NULL) = 0
```
Here the second argument represents the full set of all signals.
Now that we have read the man page and have a rough idea of what strace does and how to use it, let's look at some examples.

Example 1
```
[root@localhost ~]# strace ls
execve("/bin/ls", ["ls"], [/* 41 vars */]) = 0
brk(0) = 0xe67000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1176dfa000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=38574, ...}) = 0
mmap(NULL, 38574, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f1176df0000
close(3) = 0
open("/lib64/libselinux.so.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0PX`\312=\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=124624, ...}) = 0
mmap(0x3dca600000, 2221912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3dca600000
mprotect(0x3dca61d000, 2093056, PROT_NONE) = 0
mmap(0x3dca81c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c000) = 0x3dca81c000
mmap(0x3dca81e000, 1880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3dca81e000
close(3) = 0
open("/lib64/librt.so.1", O_RDONLY) = 3
```
Since the output is long, I won't copy all of it. Even a simple ls command makes a whole series of system calls, such as execve, brk, mmap, access, open, close, read, and fstat. For the details of each call, take a good look at Advanced Programming in the UNIX Environment.
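The open, fstat, mmap, close sequence for /etc/ld.so.cache in the trace is a common way to read a whole file: open it, ask for its size, map it into memory, and close the descriptor (the mapping stays valid). Here is a minimal sketch of that pattern; map_file and unmap_file are hypothetical names. Mapping an ELF file such as /bin/sh with it should give a buffer that starts with the same "\177ELF" magic seen in the read() line above. On newer systems the trace shows openat rather than open, because glibc implements open() with the openat system call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only using open/fstat/mmap/close.
   Returns the mapping (or NULL) and stores its size in *size. */
const void *map_file(const char *path, size_t *size) {
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {  /* mmap of 0 bytes fails */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return p;
}

void unmap_file(const void *p, size_t size) {
    munmap((void *)p, size);
}
```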
Example 2

The -e option of strace displays only specific system calls (for example, open or write), and the -p option attaches to the running process with the given PID:
```
strace -e 'select' -p 18846
Process 18846 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 165000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}
Process 18846 detached
```
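The example above shows the select calls made by process 18846. Filtering like this makes it easy to see what a running process is doing. Here the process calls select() over and over with no file descriptors and a timeout (one second after the first call), so it is simply sleeping in a loop.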
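To reproduce a trace like this yourself, a process only needs to call select() with no descriptors and a timeout. The sketch below is hypothetical (select_sleep is a made-up name). If you call it in a loop from a small program and attach with strace -e select -p <pid>, you should see similar lines. Depending on the architecture and C library version, the call may appear as pselect6 instead of select.

```c
#include <stddef.h>
#include <sys/select.h>

/* Use select(2) purely as a timer: no descriptors to watch, so every
   call ends with a return value of 0, shown by strace as "= 0 (Timeout)". */
int select_sleep(long seconds, long microseconds) {
    struct timeval tv = { .tv_sec = seconds, .tv_usec = microseconds };
    return select(0, NULL, NULL, NULL, &tv);
}
```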
Example 3
```
[root@localhost ~]# strace -o /tmp/a.txt -c hostname
localhost
[root@localhost ~]# cat /tmp/a.txt
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  -nan    0.000000           0         5           read
  -nan    0.000000           0         1           write
  -nan    0.000000           0         6           open
  -nan    0.000000           0         6           close
  -nan    0.000000           0         7           fstat
  -nan    0.000000           0        15           mmap
  -nan    0.000000           0         7           mprotect
  -nan    0.000000           0         2           munmap
  -nan    0.000000           0         3           brk
  -nan    0.000000           0         1         1 access
  -nan    0.000000           0         1           execve
  -nan    0.000000           0         1           uname
  -nan    0.000000           0         1           statfs
  -nan    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    57         1 total
```
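The -o option writes the output to a file, and the -c option counts the time, calls, and errors for each system call and prints a summary. This is also a very practical feature. The -nan values in the % time column appear because every call finished too quickly to register any time, so the total is zero and each percentage works out to 0/0.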
In addition, the -t option prefixes each line of the trace with the time of day. The examples above cover the most practical usage. The basic method for locating a system bottleneck is to use strace to trace the busiest processes. You can use -c to count the calls and time spent, or use -e to restrict the trace to a call such as open and see which file's I/O the program is blocked on.
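Here is an example of an inefficient I/O pattern that -c exposes. The sketch is hypothetical (count_reads is a made-up name): it reads a descriptor to end-of-file in fixed-size chunks and counts the read() calls. Reading one byte at a time costs one system call per byte, which strace -c would report as a large calls count for read. A larger buffer cuts that to a handful of calls.

```c
#include <stddef.h>
#include <unistd.h>

/* Read fd to EOF in chunks of `chunk` bytes (1..4096) and return how many
   read(2) calls that took, including the final one that returns 0,
   or -1 on error or an invalid chunk size. */
long count_reads(int fd, size_t chunk) {
    char buf[4096];
    if (chunk == 0 || chunk > sizeof buf)
        return -1;
    long calls = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        calls++;
        if (n == 0)
            return calls;         /* EOF */
        if (n < 0)
            return -1;
    }
}
```

If you run such a loop over a large file under strace -c, once with a chunk size of 1 and once with 4096, the difference should be obvious in the calls column.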
Next comes ltrace, which works much like strace. As mentioned above, ltrace traces library function calls. By convention, let's start with its man page:
DESCRIPTION

ltrace is a program that simply runs the specified command until it exits. It intercepts and records the dynamic library calls which are called by the executed process and the signals which are received by that process. It can also intercept and print the system calls executed by the program. Its use is very similar to strace(1).
Its man page is very short. Let's go straight to an example, just like Example 1 for strace:
```
[root@localhost ~]# ltrace ls
(0, 0, 0, 0x7fc84b52bac0, 88) = 0x3dc8c21160
__libc_start_main(0x408480, 1, 0x7fff73282908, 0x412110, 0x412100
strrchr("ls", '/') = NULL
setlocale(6, "") = "zh_CN.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale"
textdomain("coreutils") = "coreutils"
__cxa_atexit(0x40bb20, 0, 0, 0x736c6974756572, 0x3dc958fee8) = 0
isatty(1) = 1
getenv("QUOTING_STYLE") = NULL
getenv("LS_BLOCK_SIZE") = NULL
getenv("BLOCK_SIZE") = NULL
getenv("BLOCKSIZE") = NULL
getenv("POSIXLY_CORRECT") = NULL
getenv("BLOCK_SIZE") = NULL
getenv("COLUMNS") = NULL
ioctl(1, 21523, 0x7fff732827d0) = 0
getenv("TABSIZE") = NULL
getopt_long(1, 0x7fff73282908, "abcdfghiklmnopqrstuvw:xABCDFGHI:"..., 0x619040, 0x7fff732827e8) = -1
__errno_location() = 0x7fc84b5296a0
malloc(56) = 0x104d050
memcpy(0x104d050, "", 56) = 0x104d050
```
As with strace, we test with ls first. We can see that ls calls many library functions, such as strrchr, setlocale, and bindtextdomain. You can look up what each one does in its section 3 man page (for example, "man 3 setlocale").
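The first few library calls in that trace follow a common start-up pattern. The sketch below is a hypothetical reconstruction, not coreutils source; program_name, init_locale and terminal_columns are made-up names. Compiled into a program and run under ltrace, it would produce strrchr, setlocale and getenv lines much like those above.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* strrchr("ls", '/') = NULL in the trace: no directory part to strip. */
const char *program_name(const char *argv0) {
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* setlocale(6, "") in the trace: 6 is LC_ALL on glibc, and "" means
   "take the locale from LANG / LC_* in the environment". */
int init_locale(void) {
    return setlocale(LC_ALL, "") != NULL;
}

/* getenv("COLUMNS") = NULL in the trace: fall back to a default width. */
int terminal_columns(int fallback) {
    const char *s = getenv("COLUMNS");
    if (s == NULL || *s == '\0')
        return fallback;
    char *end;
    long v = strtol(s, &end, 10);
    return (*end == '\0' && v > 0 && v <= INT_MAX) ? (int)v : fallback;
}
```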
Other options, such as -e, -c, -p, and -t, work much the same way as in strace, so once you have learned strace you have largely learned ltrace too. That is why this article discusses the two tools together. The difference is in what they display: strace shows system calls, while ltrace shows library function calls. You can write a simple C program and run it under strace and ltrace separately; their output makes it easy to follow what the program is doing. With both tools at hand, debugging becomes much easier.
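Here is a starting point for that experiment: a small function that uses only stdio library calls (count_lines is a hypothetical name). Call it from a short main, for example one that prints count_lines(argv[1]), and run the program under both tools. ltrace should list fopen, fread and fclose. strace should show the system calls the C library issues on their behalf, such as open (or openat), read and close.

```c
#include <stdio.h>

/* Count '\n' characters in the file at `path`, or return -1 on error.
   Only stdio library functions are used; the C library performs the
   underlying open/read/close system calls on our behalf. */
long count_lines(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char buf[4096];
    long lines = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : lines;
}
```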