Brief Introduction
A process that will not start, software that suddenly runs slowly, a program that dies with a "Segmentation fault": these are problems that give every UNIX user a headache. Through three real cases, this article demonstrates how to use truss, strace, and ltrace, three commonly used debugging tools, to quickly diagnose such "difficult and incurable diseases" in software.
truss and strace trace the system calls a process makes and the signals it receives, while ltrace traces the library functions a process calls. truss is a debugging tool developed early on for System V R4, and most UNIX systems, including AIX and FreeBSD, ship with it. strace was originally written for SunOS, and ltrace first appeared in Debian GNU/Linux. Both strace and ltrace have since been ported to most UNIX systems; most Linux distributions include them, and on FreeBSD they can be installed through ports.
You can use these tools not only to debug a newly started program from the command line, but also to attach truss, strace, or ltrace to an existing PID and debug a program that is already running. The basic usage of the three tools is roughly the same. The following three command-line options are common to all of them:
-f: In addition to the current process, also trace its child processes.
-o file: Write the output to the file named file instead of standard error (stderr).
-p pid: Attach to the running process with the given PID. This option is commonly used to debug background processes.
With these three options you can handle most debugging tasks. Here are a few command-line examples:
truss -o ls.truss ls -al: Traces the execution of ls -al and writes the output to the file ls.truss.
strace -f -o vim.strace vim: Traces the execution of vim and its child processes and writes the output to the file vim.strace.
ltrace -p 234: Traces the already running process whose PID is 234.
The output format of the three debugging tools is also similar. Taking strace as an example:
brk(0)                                  = 0x8062aa8
brk(0x8063000)                          = 0x8063000
mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0x92f) = 0x40016000
Each line is one system call. To the left of the equal sign are the name of the system call and its arguments; to the right is the call's return value. truss, strace, and ltrace all work on the same principle: they use the ptrace system call to trace the process being debugged. The details are beyond the scope of this article; interested readers can refer to the tools' source code.
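As a rough illustration of the ptrace principle mentioned above, here is a minimal sketch, assuming Linux with glibc. The PTRACE_TRACEME, PTRACE_SYSCALL, and PTRACE_O_TRACESYSGOOD requests are the Linux names; other systems differ. The function name count_syscall_stops is made up for this example, and the code only counts system call stops. It does not decode call names or arguments the way strace does.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] under ptrace and count system call
 * stops (one on entry and one on exit for most calls).
 * Returns the number of stops, or -1 if the child could not be created.
 */
long count_syscall_stops(char *const argv[])
{
    int status;
    int sig = 0;
    long stops = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent, then run the program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: the traced child stops with SIGTRAP once exec succeeds. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return 0;               /* child exited before it could be traced */

    /* Mark system call stops as SIGTRAP|0x80 so they can be told apart. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        /* Resume until the next system call entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            break;
        }
        if (waitpid(pid, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            stops++;            /* a system call stop */
            sig = 0;
        } else {
            sig = WSTOPSIG(status);   /* pass other signals on to the child */
        }
    }
    return stops;
}
```

Calling count_syscall_stops with an argument vector such as { "true", NULL } runs that program and reports how many times it stopped at a system call boundary. A real tracer would read the registers at each stop to print the call name, arguments, and return value.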
Here are three cases that show how to use these debugging tools to diagnose software's "difficult and incurable diseases".
Case One: Running clint causes a segmentation fault
Operating system: FreeBSD 5.2.1-RELEASE
clint is a static source code analysis tool for C++. After installing it from ports, running it produces:
# clint foo.cpp
Segmentation fault (core dumped)
Running into a "Segmentation fault" on a UNIX system is as annoying as an "illegal operation" dialog box popping up in MS Windows. OK, let's use truss to take clint's "pulse":
# truss -f -o clint.truss clint
Segmentation fault (core dumped)
# tail clint.truss
  739: read(0x6,0x806f000,0x1000)               = 4096 (0x1000)
  739: fstat(6,0xbfbfe4d0)                      = 0 (0x0)
  739: fcntl(0x6,0x3,0x0)                       = 4 (0x4)
  739: fcntl(0x6,0x4,0x0)                       = 0 (0x0)
  739: close(6)                                 = 0 (0x0)
  739: stat("/root/.clint/plugins",0xbfbfe680)  ERR#2 'No such file or directory'
SIGNAL 11
SIGNAL 11
Process stopped because of:
process exit, rval = 139
We use truss to trace the system calls clint makes, write the results to the file clint.truss, and then use tail to view the last few lines. Note the last system call clint executed before the signal: stat("/root/.clint/plugins", 0xbfbfe680) ERR#2 'No such file or directory'. Here is the problem: clint cannot find the directory /root/.clint/plugins and evidently does not handle that failure, which leads to the segmentation fault. How do we fix it? Very simple: mkdir -p /root/.clint/plugins. However, running clint again still ends in a "Segmentation fault". Continuing to trace with truss shows that clint also needs the directory /root/.clint/plugins/python. Once this directory is created, clint finally runs normally.
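To make the ERR#2 in the trace concrete, here is a small sketch of the same stat() call made from C, with the failure checked instead of ignored. The helper names stat_errno and report_stat are invented for this illustration, and the output format only imitates truss. Error number 2 is ENOENT, whose message is 'No such file or directory'.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: return 0 if path exists, otherwise the errno
 * value that stat() reported. */
int stat_errno(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: print the result in a truss-like form. */
void report_stat(const char *path)
{
    int err = stat_errno(path);

    if (err == 0)
        printf("stat(\"%s\") = 0\n", path);
    else
        printf("stat(\"%s\") ERR#%d '%s'\n", path, err, strerror(err));
}
```

A program that checks the return value this way can create the missing directory or print a clear error message, rather than crashing later on data it never loaded.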
Case Two: Vim starts up noticeably slower
Operating system: FreeBSD 5.2.1-RELEASE
The Vim version is 6.2.154. After vim is started from the command line, it takes nearly half a minute to reach the editing interface, with no error output. A careful check of .vimrc and all Vim scripts turns up no misconfiguration, and no solution to a similar problem can be found on the web. Do we have to hack the source code? There is no need; truss can find the problem:
# truss -f -D -o vim.truss vim
Here the -D option adds a relative timestamp before each line of output, that is, the time spent executing each system call. We only need to look for system calls that take a long time. Reading through the output file vim.truss with less, a suspicious spot quickly turns up:
735: 0.000021511 socket(0x2,0x1,0x0)                                  = 4 (0x4)
735: 0.000014248 setsockopt(0x4,0x6,0x1,0xbfbfe3c8,0x4)               = 0 (0x0)
735: 0.000013688 setsockopt(0x4,0xffff,0x8,0xbfbfe2ec,0x4)            = 0 (0x0)
735: 0.000203657 connect(0x4,{ AF_INET 10.57.18.27:6000 },16)         ERR#61 'Connection refused'
735: 0.000017042 close(4)                                             = 0 (0x0)
735: 1.009366553 nanosleep(0xbfbfe468,0xbfbfe460)                     = 0 (0x0)
735: 0.000019556 socket(0x2,0x1,0x0)                                  = 4 (0x4)
735: 0.000013409 setsockopt(0x4,0x6,0x1,0xbfbfe3c8,0x4)               = 0 (0x0)
735: 0.000013130 setsockopt(0x4,0xffff,0x8,0xbfbfe2ec,0x4)            = 0 (0x0)
735: 0.000272102 connect(0x4,{ AF_INET 10.57.18.27:6000 },16)         ERR#61 'Connection refused'
735: 0.000015924 close(4)                                             = 0 (0x0)
735: 1.009338338 nanosleep(0xbfbfe468,0xbfbfe460)                     = 0 (0x0)
Vim tries to connect to port 6000 on host 10.57.18.27 (the connect() call). After the connection fails, it sleeps for one second and retries (the nanosleep() call). This fragment repeats more than ten times, each repetition costing a little over a second, and that is why Vim has become noticeably slower. But you may wonder: why would Vim connect to port 6000 on another computer for no apparent reason? Good question. Think back to which service listens on port 6000. Yes, it is the X server. It seems Vim is trying to direct its output to a remote X server, so the shell must define the DISPLAY variable. A look at .cshrc indeed shows this line: setenv DISPLAY ${REMOTEHOST}:0. Comment it out and log in again, and the problem is solved.
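The link between the DISPLAY setting and port 6000 can be sketched in a few lines of C. This is a simplified illustration rather than the parsing code of any X library: the function name x11_tcp_port is invented here, and it only covers the common host:display[.screen] form, where display number N corresponds to TCP port 6000 + N.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the TCP port an X client would use for a
 * DISPLAY value such as "10.57.18.27:0", or -1 when the value names a
 * local display (":0", "unix:0") or cannot be parsed.
 */
int x11_tcp_port(const char *display)
{
    const char *colon;
    char *end;
    long n;

    if (display == NULL)
        return -1;

    colon = strrchr(display, ':');
    if (colon == NULL)
        return -1;              /* no display number at all */

    /* An empty host or "unix" means a local socket, not TCP. */
    if (colon == display || strncmp(display, "unix:", 5) == 0)
        return -1;

    n = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || n < 0 || n > 59535)
        return -1;              /* no digits, or port would exceed 65535 */

    return 6000 + (int)n;       /* any ".screen" suffix is ignored */
}
```

With DISPLAY set to 10.57.18.27:0 this yields 6000, which matches the address seen in the connect() calls of the trace.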
Case Three: Using debugging tools to learn how software works
Operating system: Red Hat Linux 9.0
Using debugging tools to trace software in real time is not only an effective way to diagnose "difficult and incurable diseases"; it also helps us untangle the "threads" of a program, that is, quickly grasp how it runs and how it works, which makes it a useful aid to studying source code. The following case shows how tracing other software with strace can "spark inspiration" and solve a challenge in software development.
As you know, when a process opens a file, there is a unique file descriptor (fd) corresponding to that file. While developing a piece of software, I ran into this problem: given an fd, how do I obtain the full path of the corresponding file? Linux, FreeBSD, and other UNIX systems do not provide such an API, so what can be done? Let's think about it from another angle: is there any software under UNIX that can find out which files a process has opened? If you are experienced enough, lsof comes to mind easily; it can tell you both which files a process has opened and which processes have a given file open.
OK, let's write a small program to test lsof and see how it finds the files a process has opened.
/* testlsof.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void)
{
    /* Open /tmp/foo; O_CREAT requires a mode argument */
    open("/tmp/foo", O_CREAT|O_RDONLY, 0644);
    sleep(1200);    /* Sleep for 1200 seconds to allow the following steps */
    return 0;
}
Run testlsof in the background; its PID is 3125. The command lsof -p 3125 shows which files process 3125 has open. We use strace to trace the run of lsof and save the output in lsof.strace:
# gcc testlsof.c -o testlsof
# ./testlsof &
[1] 3125
# strace -o lsof.strace lsof -p 3125
Searching the output file lsof.strace for the keyword "/tmp/foo" gives exactly one match:
# grep '/tmp/foo' lsof.strace
readlink("/proc/3125/fd/3", "/tmp/foo", 4096) = 8
So lsof cleverly uses the /proc/nnnn/fd/ directory (nnnn is the PID): for every process, the Linux kernel creates a directory under /proc/ named after its PID to hold information about the process, and the fd subdirectory holds the file descriptors of all files the process has open. We are close to the goal. Let's go to /proc/3125/fd/ and take a look:
# cd /proc/3125/fd/
# ls -l
total 0
lrwx------ 1 root root 64 Nov 5 09:50 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 5 09:50 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 5 09:50 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 5 09:50 3 -> /tmp/foo
# readlink /proc/3125/fd/3
/tmp/foo
The answer is clear: each entry in the /proc/nnnn/fd/ directory is a symbolic link that points to a file opened by the process. We only need the readlink() system call to get the file corresponding to an fd. The code is as follows:
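#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Resolve fd to a path through /proc/<pid>/fd/<fd>.
 * Returns the path length, or -1 on error. */
int get_pathname_from_fd(int fd, char pathname[], int n)
{
    char buf[1024];
    pid_t pid;
    int len;

    memset(buf, 0, sizeof(buf));
    pid = getpid();
    snprintf(buf, sizeof(buf), "/proc/%i/fd/%i", (int)pid, fd);
    /* readlink() does not NUL-terminate, so leave room and terminate here */
    len = readlink(buf, pathname, n - 1);
    if (len >= 0)
        pathname[len] = '\0';
    return len;
}

int main(void)
{
    int fd;
    char pathname[4096];

    memset(pathname, 0, sizeof(pathname));
    fd = open("/tmp/foo", O_CREAT|O_RDONLY, 0644);
    get_pathname_from_fd(fd, pathname, sizeof(pathname));
    printf("fd=%d; pathname=%s\n", fd, pathname);
    return 0;
}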
Note: for security reasons, FreeBSD 5 and later no longer mount the proc file system automatically by default. To trace programs with truss or strace, you must therefore mount it manually: mount -t procfs proc /proc, or add this line to /etc/fstab:
proc /proc procfs rw 0 0
ltrace does not need procfs.
Resources
truss(1) manual page
strace(1) manual page
ltrace(1) manual page
ptrace(2) manual page
lsof(1) manual page
Debugging with strace: http://www.devchannel.org/devtoolschannel/03/10/24/2057246.shtml