Linux debugging techniques


Finding and fixing program errors on Linux

Steve Best (sbest@us.ibm.com)
JFS core team member, IBM

You can monitor a running user-space program in various ways. You can run a debugger on it and step through the program, add print statements, or add tools that analyze the program. This article describes several methods you can use to debug programs that run on Linux. We will review four debugging problems: segmentation faults, memory overruns, memory leaks, and hangs.

This article presents four cases of Linux program debugging. In case 1, we use two sample programs with memory allocation problems and debug them with the MEMWATCH and Yet Another Malloc Debugger (YAMD) tools. In case 2, we use the strace utility, which traces system calls and signals, to find where a program fails. In case 3, we use the Oops facility of the Linux kernel to solve a segmentation fault. We also show you how to set up the kernel source-level debugger (kgdb) to solve the same problem with the GNU debugger (gdb). kgdb is a Linux kernel patch that lets you run gdb against the kernel remotely over a serial connection. In case 4, we use the magic key sequence available on Linux to display information about the component that is causing a hang.

Common debugging methods
When your program contains a bug, it is likely that somewhere in the code a condition you believe to be true is actually false. Finding the bug means confirming what you believe to be true until you find something that is false.

The following examples are some types of conditions that you may be certain about:

Somewhere in the source code, a variable has a specific value.
A structure has been correctly set in a given place.
For a given if-then-else statement, the if part is the path that executes.
When a subroutine is called, it receives its parameters correctly.

Finding the bug means confirming each of these conditions. If you are sure a variable should have a specific value when a subroutine is called, check that it does. If you believe an if construct executes, check that it does. Usually your assumptions will turn out to be correct. Eventually, though, you will find one that does not hold, and that is where the bug is.
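One lightweight way to turn such beliefs into explicit checks is a macro that reports a failed assumption along with its file and line. The sketch below is illustrative only: CHECK and get_element are hypothetical names and are not part of any library. Unlike assert(), the macro lets the program continue, so a single run can reveal several broken assumptions.

```c
#include <stdio.h>

/* CHECK(cond): hypothetical helper. If cond is false, print the file, line,
 * and text of the failed assumption to stderr. The whole expression yields
 * 1 when the assumption holds and 0 when it does not, so the caller can
 * react instead of aborting as assert() would. */
#define CHECK(cond)                                                  \
    ((cond) ? 1                                                      \
            : (fprintf(stderr, "%s:%d: assumption failed: %s\n",     \
                       __FILE__, __LINE__, #cond), 0))

/* Example routine whose caller "knows" the pointer is valid and the index
 * is in range. Each belief is checked rather than trusted. */
static int get_element(const int *arr, int len, int idx)
{
    if (!CHECK(arr != NULL))
        return -1;
    if (!CHECK(idx >= 0 && idx < len))
        return -1;
    return arr[idx];
}
```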

Debugging is something you cannot avoid. You can approach it in many ways: print messages to the screen, use a debugger, or simply think through how the program executes and reason carefully about the problem.

Before you can fix a bug, you must find its source. For a segmentation fault, for example, you need to know which line of code the fault occurred on. Once you find the failing line, determine the values of the variables in that function, how the function was called, and exactly how the error occurs. A debugger makes all of this information easy to find. If no debugger is available, you can use other tools. (Note that a debugger may not be available in a production environment, and the Linux kernel has no built-in debugger.)

Useful memory and kernel tools
The debugging tools on Linux let you track down user-space and kernel problems in various ways. Build and debug your source code with the following tools and techniques:
User-space tools:

Memory tools: MEMWATCH and YAMD
strace
GNU debugger (gdb)
Magic key sequence

Kernel tools:

Kernel source-level debugger (kgdb)
Built-in kernel debugger (kdb)
Oops

This article discusses a class of problems that are hard to find by inspecting the code manually and that appear only in rare circumstances. Memory errors often appear only when several conditions occur together. Sometimes you discover them only after the program has been deployed.

Case 1: Memory debugging tools
C is the standard programming language on Linux systems, and it gives you a great deal of control over dynamic memory allocation. This freedom, however, can lead to serious memory management problems. Those problems can crash programs or degrade their performance over time.

Memory leaks and buffer overruns are common problems that can be difficult to detect. A memory leak occurs when memory returned by malloc() is never released with a matching free(). A buffer overrun occurs when a program writes past the end of memory it allocated, for example for an array. This section discusses several debugging tools that greatly simplify detecting and isolating memory problems.

MEMWATCH
MEMWATCH, written by Johan Lindh, is an open source memory error-detection tool for C. You can download it (see References later in this article). After you add a header file to your code and define MEMWATCH in your gcc command, you can track memory leaks and corruption in your program. MEMWATCH supports ANSI C and logs its results. It detects double frees, erroneous frees, unfreed memory, overflows, underflows, and so on.
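To see how a tool of this kind can name the offending source line, consider the toy tracker below. It is our own simplified sketch of the general technique, which wraps allocation calls in macros that record __FILE__ and __LINE__. It is not MEMWATCH's implementation. The names track_malloc, track_free, report_unfreed, MALLOC, and FREE are hypothetical, and so is the fixed-size table.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_BLOCKS 64   /* toy limit; a real tool would grow its table */

/* One record per allocation: where it came from and whether it is live. */
struct block {
    void       *ptr;
    size_t      size;
    const char *file;
    int         line;
    int         live;
};

static struct block blocks[MAX_BLOCKS];
static int nblocks = 0;
static int double_frees = 0;

static void *track_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p != NULL && nblocks < MAX_BLOCKS)
        blocks[nblocks++] = (struct block){ p, size, file, line, 1 };
    return p;
}

static void track_free(void *p, const char *file, int line)
{
    if (p == NULL)
        return;                  /* free(NULL) is legal and does nothing */
    /* Search newest first, because a freed address may be handed out again. */
    for (int i = nblocks - 1; i >= 0; i--) {
        if (blocks[i].ptr != p)
            continue;
        if (!blocks[i].live) {
            fprintf(stderr, "double-free: %s(%d), %p was already freed\n",
                    file, line, p);
            double_frees++;
            return;              /* never pass it to free() a second time */
        }
        blocks[i].live = 0;
        free(p);
        return;
    }
    fprintf(stderr, "erroneous free: %s(%d), %p\n", file, line, p);
}

/* Print every block that is still live; return the number of leaked bytes. */
static size_t report_unfreed(void)
{
    size_t total = 0;
    for (int i = 0; i < nblocks; i++) {
        if (blocks[i].live) {
            fprintf(stderr, "unfreed: %s(%d), %zu bytes\n",
                    blocks[i].file, blocks[i].line, blocks[i].size);
            total += blocks[i].size;
        }
    }
    return total;
}

/* Callers use these instead of malloc()/free() so each call site is recorded. */
#define MALLOC(n) track_malloc((n), __FILE__, __LINE__)
#define FREE(p)   track_free((p), __FILE__, __LINE__)
```

Applied to the pattern in Listing 1 (two MALLOC calls, ptr2 = ptr1, then two FREE calls), the tracker flags the second FREE as a double free and reports one 512-byte block as unfreed.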

Listing 1. Memory sample (test1.c)
#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
   char *ptr1;
   char *ptr2;

   ptr1 = malloc(512);
   ptr2 = malloc(512);

   ptr2 = ptr1;
   free(ptr2);
   free(ptr1);
}

The code in Listing 1 allocates two 512-byte blocks of memory and then assigns the pointer to the first block to ptr2. As a result, the address of the second block is lost, and that block leaks.

Now compile Listing 1 together with memwatch.c. Here is a sample makefile:

test1: test1.c memwatch.c
	gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1

When you run the test1 program, it produces a report of the leaked memory. Listing 2 shows the resulting memwatch.log file.

Listing 2. test1 memwatch.log file
MEMWATCH 2.67 Copyright (C) 1992-1999 Johan Lindh

...
double-free: <4> test1.c(15), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE ..............}

Memory usage statistics (global):
 N)umber of allocations made: 2
 L)argest memory usage: 1024
 T)otal of all alloc() calls: 1024
 U)nfreed bytes totals: 512

MEMWATCH shows you the lines that actually cause the problems. If you free a pointer that has already been freed, it tells you, and it does the same for unfreed memory. The statistics at the end of the log show how much memory was leaked, how much was used, and the total amount allocated.
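Once MEMWATCH has pointed at the problem lines, the fix for Listing 1 is to stop overwriting ptr2. The variant below is our own rewrite, and the function name fixed_test1 is hypothetical. It copies the contents of the first block rather than the pointer, so each block keeps exactly one owner and is freed exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Corrected variant of Listing 1 (our rewrite). Instead of overwriting ptr2
 * with ptr1, copy the contents of the first block into the second, so each
 * block keeps exactly one owner and is freed exactly once.
 * Returns 0 on success, -1 if an allocation fails or the copy is wrong. */
static int fixed_test1(void)
{
    char *ptr1 = malloc(512);
    char *ptr2 = malloc(512);
    if (ptr1 == NULL || ptr2 == NULL) {
        free(ptr1);               /* free(NULL) is a harmless no-op */
        free(ptr2);
        return -1;
    }
    memset(ptr1, 'x', 512);
    memcpy(ptr2, ptr1, 512);      /* copy the data, not the pointer */
    int ok = (ptr2[0] == 'x' && ptr2[511] == 'x');
    free(ptr1);                   /* each block is freed exactly once */
    free(ptr2);
    return ok ? 0 : -1;
}
```

Running this version under MEMWATCH or YAMD should report neither a double free nor an unfreed block, because every malloc() is matched by exactly one free().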

YAMD
The YAMD package, written by Nate Eldredge, finds dynamic memory allocation problems in C and C++. At the time of writing, the latest version of YAMD was 0.32. Download yamd-0.32.tar.gz (see References). Run make to build the program, and then run make install to install it and set up the tool.

Once you have downloaded YAMD, use it on test1.c. Remove #include "memwatch.h" and make the following small change to the makefile for use with YAMD:

test1: test1.c
	gcc -g test1.c -o test1

Listing 3 shows the output from YAMD on test1.

Listing 3. test1 output with YAMD
YAMD version 0.32
Executable: /usr/src/test/yamd-0.32/test1
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal deallocation of this block
Address 0x40025e00, size 512
...
ERROR: Multiple freeing
free of pointer already freed
Address 0x40025e00, size 512
...
WARNING: Memory leak
Address 0x40028e00, size 512
WARNING: Total memory leaks:
1 unfreed allocations totaling 512 bytes

*** Finished at Tue ... 10:07:15 2002
Allocated a grand total of 1024 bytes 2 allocations
Average of 512 bytes per allocation
Max bytes allocated at one time: 1024
24 K alloced internally / 12 K mapped now / 8 K max
Virtual program size is 1416 K
End.

YAMD shows that we freed memory that had already been freed and that a memory leak exists. Let's try YAMD on another sample program, shown in Listing 4.

Listing 4. Memory code (test2.c)
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
   char *ptr1;
   char *ptr2;
   char *chptr;
   int i = 1;
   ptr1 = malloc(512);
   ptr2 = malloc(512);
   chptr = (char *)malloc(512);
   for (; i <= 512; i++) {   /* bug kept on purpose: writes chptr[512], one past the end */
      chptr[i] = 'S';
   }
   ptr2 = ptr1;               /* same leak and double free as Listing 1 */
   free(ptr2);
   free(ptr1);
   free(chptr);
}

Start YAMD with the following command:

./run-yamd /usr/src/test/test2/test2
Listing 5 shows the output of the sample program test2 running under YAMD. YAMD tells us that the for loop has an out-of-bounds condition.

Listing 5. test2 output with YAMD
Running /usr/src/test/test2/test2
Temp output to /tmp/yamd-out.1243
*********
./run-yamd: line 101: 1248 Segmentation fault (core dumped)
YAMD version 0.32
Starting run: /usr/src/test/test2/test2
Executable: /usr/src/test/test2/test2
Virtual program size is 1380 K
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal allocation of this block
Address 0x4002be00, size 512
ERROR: Crash
...
Tried to write address 0x4002c000
Seems to be part of this block:
Address 0x4002be00, size 512
...
Address in question is at offset 512 (out of bounds)
Will dump core after checking heap.
Done.
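The out-of-bounds write comes from the loop bounds. A 512-byte block has valid indexes 0 through 511, but the loop in Listing 4 runs i from 1 to 512. It therefore never touches chptr[0] and writes chptr[512], which is exactly the offset 512 that YAMD reports (0x4002be00 + 0x200 = 0x4002c000). A corrected loop is sketched below; fill_block is a hypothetical helper name.

```c
#include <stdlib.h>

#define BLOCK_SIZE 512

/* Allocate a block and fill every byte with 'S'.
 * Valid indexes are 0 .. BLOCK_SIZE - 1, so the loop starts at 0 and uses
 * i < BLOCK_SIZE; the original "i <= 512" wrote one byte past the end. */
static char *fill_block(void)
{
    char *chptr = malloc(BLOCK_SIZE);
    if (chptr == NULL)
        return NULL;
    for (int i = 0; i < BLOCK_SIZE; i++)
        chptr[i] = 'S';
    return chptr;   /* the caller must free() the block exactly once */
}
```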

 

MEMWATCH and YAMD are both useful debugging tools, and they take different approaches. With MEMWATCH, you need to include the file memwatch.h and turn on two compile-time flags. YAMD requires only the -g option in the link statement.

Electric Fence
Most Linux distributions include an Electric Fence package, but you can also download it. Electric Fence is a malloc() debugging library written by Bruce Perens. It places protected memory immediately after the memory you allocate. If there is a fencepost error (running past the end of an array), the program immediately stops with a protection error. By combining Electric Fence with gdb, you can track down exactly which line tried to access the protected memory. Another feature of Electric Fence is its ability to detect memory leaks.

Case 2: Using strace
The strace command is a powerful tool that shows all the system calls a user-space program issues. strace displays each call's arguments and return value in symbolic form. It receives its information from the kernel, and the kernel does not need to be built in any special way. The trace output is useful to send to both application and kernel developers. In Listing 6, formatting a partition fails. The listing shows the start of the strace output for the make file system (mkfs) command, and strace pinpoints which call causes the problem.

Listing 6. Start of the strace output for mkfs
execve("/sbin/mkfs.jfs", ["mkfs.jfs", "-f", "/dev/test1"], &
...
open("/dev/test1", O_RDWR|O_LARGEFILE) = 4
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
ioctl(4, 0x40041271, 0xbfffe128) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: warning - cannot setb"..., 98mkfs.jfs: warning -
cannot set blocksize on block device /dev/test1: Invalid argument)
= 98
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
open("/dev/test1", O_RDONLY|O_LARGEFILE) = 5
ioctl(5, 0x80041272, 0xbfffe124) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: can\'t determine device"..., ..._exit(1)
= ?

Listing 6 shows that the mkfs program used to format the partition fails because of an ioctl call: the BLKGETSIZE64 ioctl fails. (BLKGETSIZE64 is defined in the source code that calls ioctl.) The BLKGETSIZE64 ioctl is being added to all devices in Linux, but the logical volume manager does not support it yet. The mkfs code was therefore changed to fall back to the older ioctl call when the BLKGETSIZE64 call fails. This lets mkfs work with the logical volume manager.

Case 3: Using gdb and Oops
You can use gdb, the Free Software Foundation's debugger, to find errors. You can run it from the command line or from one of several graphical tools, such as Data Display Debugger (DDD). gdb can debug user-space programs or the Linux kernel. This section covers only running gdb from the command line.

Start gdb with the command gdb programname. gdb loads the executable's symbols and then displays an input prompt so that you can start using the debugger. There are three ways to view a process with gdb:

Use the attach command to view a process that is already running. attach stops the process.

Use the run command to execute the program and debug it from the beginning.

Examine an existing core file to determine the state of the process when it terminated. To view a core file, start gdb with the following command:

gdb programname corefilename
To debug with a core file, you need the program's executable and source files as well as the core file itself. To start gdb with a core file, use the -c option:

gdb -c core programname

gdb shows which line of code caused the program to dump core.

Before you run a program or attach to a running one, list the source code where you believe the bug is, set breakpoints, and then start debugging. The help command gives you access to gdb's comprehensive online help and a detailed tutorial.
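A core file is written only if the process is allowed to write one. Many systems set the soft core-size limit to 0, so a program that you plan to inspect later with gdb programname corefilename may need to raise that limit first. The sketch below uses the standard getrlimit() and setrlimit() calls; enable_core_dumps is our own name. Where the core file is written, and whether a system service intercepts it, depends on system configuration that this sketch does not change.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_CORE limit to the hard limit so that a crash can
 * leave a core file behind. An unprivileged process may raise its soft
 * limit up to, but not beyond, the hard limit. Returns 0 on success. */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```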

kgdb
The kgdb program (a remote-host Linux kernel debugger that uses gdb) provides a way to debug the Linux kernel with gdb. kgdb is a kernel extension. It lets gdb, running on a remote host machine, connect to a machine that runs the kgdb-extended kernel. You can then break into the kernel, set breakpoints, examine data, and so on, much as you would use gdb on an application program. One of the main features of this patch is that the remote host running gdb connects to the target machine (the one running the kernel to be debugged) during the boot process. This lets you begin debugging as early as possible. Note that the patch adds functionality to the Linux kernel so that gdb can be used to debug it.

kgdb requires two machines: a development machine and a test machine. A serial line (null-modem cable) connects the two machines through their serial ports. The kernel you want to debug runs on the test machine. gdb runs on the development machine and uses the serial line to communicate with that kernel.

Follow these steps to set up a kgdb debugging environment:

Download the patch appropriate for your version of the Linux kernel.

Build your component into the kernel, because this is the easiest way to use kgdb. (Note that most kernel components can be built in two ways: as a module or directly into the kernel. For example, the Journaled File System (JFS) can be built either way. With the gdb patch in place, we build JFS directly into the kernel.)

Apply the kernel patch and rebuild the kernel.

Create a file named .gdbinit and place it in the top directory of your kernel source tree (in other words, /usr/src/linux). The .gdbinit file contains the following four lines:
set remotebaud 115200
symbol-file vmlinux
target remote /dev/ttyS0
set output-radix 16

Add an append="gdb ..." line to your LILO configuration. LILO is the boot loader that selects which kernel to boot.
image=/boot/bzImage-2.4.17
label=gdb2417
read-only
root=/dev/sda8
append="gdb gdbttyS=1 gdb-baud=115200 nmi_watchdog=0"

Listing 7 is an example script that copies the kernel and modules built on the development machine to the test machine. Change the following items to match your setup:

best@sfb: user ID and machine name.
/usr/src/linux-2.4.17: directory of the kernel source tree.
bzImage-2.4.17: name of the kernel that will be booted on the test machine.
rcp and rsync: must be allowed to run against the machine on which the kernel was built.

Listing 7. Script that pulls the kernel and modules to the test machine
set -x
rcp best@sfb:/usr/src/linux-2.4.17/arch/i386/boot/bzImage /boot/bzImage-2.4.17
rcp best@sfb:/usr/src/linux-2.4.17/System.map /boot/System.map-2.4.17
rm -rf /lib/modules/2.4.17
rsync -a best@sfb:/lib/modules/2.4.17 /lib/modules
chown -R root /lib/modules/2.4.17
lilo

Now you can start gdb on the development machine from the top of the kernel source tree. In this example, the kernel source tree is at /usr/src/linux-2.4.17. Change to that directory and type gdb to start the program.

If everything is working, the test machine stops during the boot process. Enter the gdb command cont to continue booting. One common problem is that the null-modem cable is connected to the wrong serial port. If gdb does not start, switch to the second serial port, and gdb should then start.

Using kgdb to debug the kernel
Listing 8 shows the code we modified in jfs_mount.c. We introduced a null pointer dereference so that the code produces a segmentation fault at line 109.

Listing 8. Modified jfs_mount.c code
int jfs_mount(struct super_block *sb)
{
...
int *ptr;                       /* line 1 added */
jFYI(1, ("\nMount JFS\n"));
/*
 * read/validate superblock
 * (initialize mount inode from the superblock)
 */
if ((rc = chkSuper(sb))) {
        goto errout20;
}
108 ptr = 0;                    /* line 2 added */
109 printk("%d\n", *ptr);       /* line 3 added */

Listing 9 shows the gdb exception after the mount command is issued to the file system. kgdb provides several commands. For example, they can display data structures and variable values and show the state of all tasks in the system: where they are sleeping, where they are spending CPU time, and so on. Listing 9 shows the information that the back trace provides for this problem. The where command performs the back trace, which shows the chain of calls that led to the point where the code stopped.

Listing 9. gdb exception and back trace
mount -t jfs /dev/sdb /jfs

Program received signal SIGSEGV, Segmentation fault.
jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
109 printk("%d\n",*ptr);
(gdb) where
#0 jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
#1 0xc01a0dbb in jfs_read_super ... at super.c:280
#2 0xc0149ff5 in get_sb_bdev ... at super.c:620
#3 0xc014a89f in do_kern_mount ... at super.c:849
#4 0xc0160e66 in do_add_mount ... at namespace.c:569
#5 0xc01610f4 in do_mount ... at namespace.c:683
#6 0xc01611ea in sys_mount ... at namespace.c:716
#7 0xc01074a7 in system_call () at af_packet.c:1891
#8 0x0 in ?? ()
(gdb)

The next section looks at the same JFS segmentation fault without a debugger. Instead, it uses the Oops message the kernel would produce if you ran the Listing 8 code in a kernel without kgdb.

Oops analysis
An Oops (also called a panic) message contains details of a system failure, such as the contents of the CPU registers. On Linux, the traditional way to debug a system crash is to analyze the Oops message sent to the system console when the crash occurs. Once you have captured the details, you can pass the message to the ksymoops utility. ksymoops tries to convert the code into instructions and to map stack values to kernel symbols. In many cases, this is enough information to determine a likely cause of the failure. Note that an Oops message does not include a core file.

Suppose your system has just produced an Oops message. If you wrote the code, you want to solve the problem and find out what caused the Oops. Otherwise, you want to give the developer of that code as much information about the problem as possible, so it can be solved promptly. The Oops message is one part of the equation, but it does not help unless you run it through the ksymoops program. The figure below shows the process of formatting an Oops message.

[Figure: Formatting Oops messages]

ksymoops needs several items: the Oops message output, the System.map file from the running kernel, /proc/ksyms, vmlinux, and /proc/modules. Complete instructions for using ksymoops are in the kernel source file /usr/src/linux/Documentation/oops-tracing.txt and on the ksymoops man page. ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.

First, save the Oops message in a file so that you can run it through the ksymoops utility. Listing 10 shows the Oops produced by the mount command that mounts the JFS file system. The three lines of code added to JFS in Listing 8 cause the problem.

Listing 10. Oops message after processing by ksymoops
ksymoops 2.4.0 on i686 2.4.17. Options used
... 15:59:37 sfb1 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 0000000
... 15:59:37 sfb1 kernel: c01588fc
... 15:59:37 sfb1 kernel: *pde = 0000000
... 15:59:37 sfb1 kernel: Oops: 0000
... 15:59:37 sfb1 kernel: CPU: 0
... 15:59:37 sfb1 kernel: EIP: 0010:[jfs_mount+60/704]

... 15:59:37 sfb1 kernel: Call Trace: [jfs_read_super+287/688]
[get_sb_bdev+563/736] [do_kern_mount+189/336] [do_add_mount+35/208]
[do_page_fault+0/1264]
... 15:59:37 sfb1 kernel: Call Trace: []...
... 15:59:37 sfb1 kernel: [
... 15:59:37 sfb1 kernel: Code: 8b 2d 00 00 00 00 55 ...

>EIP; c01588fc <====
...
Trace; c0106cf3
Code; c01588fc
00000000 <_EIP>:
Code; c01588fc <====
0: 8b 2d 00 00 00 00 mov 0x0,%ebp <====
Code; c0158902
6: 55 push %ebp

Next, you need to determine which line of code in jfs_mount caused the problem. The Oops message reports the fault at jfs_mount+60/704, that is, at offset 60 (0x3c) into a function 704 bytes long. One way to find the corresponding source line is to run the objdump utility on the jfs_mount.o file and look at offset 0x3c. objdump disassembles module functions so that you can see which assembler instructions your C source code produces. Listing 11 shows what you would see from objdump. Comparing it with the C code of jfs_mount shows that line 109 causes the null dereference. Offset 0x3c matters because it is the location that the Oops message identified as the cause of the problem.

Listing 11. jfs_mount assembler listing
109 printk("%d\n",*ptr);

objdump -d jfs_mount.o

jfs_mount.o: file format elf32-i386

Disassembly of section .text:

00000000:
0: 55 push %ebp
...
2c: e8 cf 03 00 00 call 400
31: 89 c3 mov %eax,%ebx
33: 58 pop %eax
34: 85 db test %ebx,%ebx
36: 0f 85 55 02 00 00 jne 291
3c: 8b 2d 00 00 00 00 mov 0x0,%ebp << problem line above
42: 55 push %ebp
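The offset arithmetic in this case is easy to check. The Oops reports jfs_mount+60/704 in decimal: 60 is 0x3c, the address objdump shows, and 704 is 0x2c0. Subtracting the offset from the EIP value c01588fc gives c01588c0, the address at which jfs_mount starts in this kernel. The hypothetical helper below splits such an entry into its parts so the conversion can be scripted. The name parse_oops_entry is ours and is not part of ksymoops.

```c
#include <stdio.h>

/* Parse a "[symbol+offset/length]" entry from an Oops line, such as
 * "[jfs_mount+60/704]". Offset and length are read as decimal, which is how
 * they appear in Listing 10. The name buffer must hold at least 64 bytes.
 * Returns 0 on success, -1 if the text does not match the pattern. */
static int parse_oops_entry(const char *s, char name[64],
                            unsigned *offset, unsigned *length)
{
    if (sscanf(s, "[%63[^+]+%u/%u]", name, offset, length) != 3)
        return -1;
    return 0;
}
```

Printing the parsed offset with "%#x" shows 0x3c, the line to look for in the objdump output.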

 

kdb
The Linux kernel debugger (kdb) is a patch for the Linux kernel. It lets you examine kernel memory and data structures while the system is running. Note that kdb does not require two machines, but it also does not support source-level debugging as kgdb does. You can add commands that format and display essential system data structures, given the structure's identifier or address. The current command set lets you control kernel operations, including the following:

Single-stepping the processor
Stopping when a specific instruction executes
Stopping when a specified virtual memory location is accessed (or modified)
Stopping when a register in the input/output address space is accessed
Back tracing the current active task as well as all other tasks (by process ID)
Disassembling instructions

Chasing memory overruns

You certainly don't want to end up with something like an allocation overrun that shows up only after thousands of calls.

Our team spent a long time tracking down an odd memory corruption problem. The application worked on our development workstation. On the new product workstation, however, it failed after two million calls to malloc(). The real problem was an overrun at around call one million. Every new system had the problem because the layout of the reserved malloc() area was different. The stray memory therefore landed in different places and clobbered different things when the overrun occurred.

We solved this problem with several techniques: one used a debugger, and another added tracing to the source code. It was at this point in my career that I began looking at memory debugging tools, hoping to solve these kinds of problems faster and more efficiently. One of the first things I do when starting a new project is run both MEMWATCH and YAMD to see whether they point out memory management problems.

Memory leaks are a common problem in applications, but you can use the tools described in this article to solve them.
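Adding tracing, as described above, can be as simple as numbering every allocation. An overrun first seen near call two million can then be traced back to the allocation around call one million on the next run. The wrapper below is a hypothetical sketch; counted_malloc, alloc_break_at, alloc_breakpoint, and alloc_last_hit are our names. Set alloc_break_at to the call number of interest, build with -g -O0 so the hook is not inlined, and use break alloc_breakpoint in gdb to stop there.

```c
#include <stdlib.h>

static unsigned long alloc_count = 0;     /* allocations made so far */
static unsigned long alloc_break_at = 0;  /* 0 means "never stop" */
static volatile unsigned long alloc_last_hit = 0;

/* Breakpoint target. The volatile store gives the function a visible side
 * effect, so the compiler cannot drop the call. */
static void alloc_breakpoint(unsigned long n)
{
    alloc_last_hit = n;
}

/* malloc() wrapper that numbers each call and invokes the hook on the
 * selected one. */
static void *counted_malloc(size_t size)
{
    alloc_count++;
    if (alloc_count == alloc_break_at)
        alloc_breakpoint(alloc_count);
    return malloc(size);
}
```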

Case 4: Back tracing with the magic key sequence
If your keyboard still works when Linux hangs, use the following method to help find the source of the hang. These steps use the magic key sequence to display a back trace of the currently running process and of all processes.

The kernel you are running must be built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode. Ctrl+Alt+F1 takes you to text mode, and Ctrl+Alt+F7 takes you back to X Windows.
While in text mode, press the magic key sequences. They display a stack trace of the currently running process and of all processes, respectively.
Look in /var/log/messages. If everything is set up correctly, the system will already have converted the kernel addresses into symbols for you. The back trace is written to the /var/log/messages file.
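On kernels that provide /proc/sysrq-trigger, you can request the same dumps without the keyboard, for example from a remote shell when the console is unusable. The sketch below writes a single command key to that file. The name sysrq_request is hypothetical, and the function needs root privileges. It deliberately accepts only two informational keys: 'p' dumps the current registers, and 't' dumps the task list. It rejects every other key, including 'b', which reboots the machine immediately.

```c
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to perform a magic SysRq action by writing its key to
 * /proc/sysrq-trigger (needs root and CONFIG_MAGIC_SYSRQ). The resulting
 * dump goes to the kernel log, which syslog on many systems copies to
 * /var/log/messages. Returns 0 on success, -1 otherwise. */
static int sysrq_request(char key)
{
    if (key != 'p' && key != 't')
        return -1;              /* refuse anything else, e.g. 'b' (reboot) */
    int fd = open("/proc/sysrq-trigger", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, &key, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```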

Conclusion
Many different tools are available to help debug programs on Linux, and the tools described in this article can help you solve many coding problems. Tools that show where memory leaks, overruns, and similar errors occur can solve memory management problems. I find MEMWATCH and YAMD very helpful.

Using a Linux kernel patch that lets gdb work on the Linux kernel has been very helpful in solving problems in the file system I work on. In addition, strace helped determine where a file system utility was failing during a system call. The next time you are trying to squash a bug in Linux, try one of these tools.

References

Download MEMWATCH.

Download YAMD.

Download Electric Fence.

See the Dynamic Probes debugging facility.

Read the article "Linux software debugging with GDB" (developerWorks, February 2001).

Visit the IBM Linux Technology Center.

Find more Linux articles in the developerWorks Linux zone.

About the author
Steve Best works in the IBM Linux Technology Center in Austin, Texas. He is currently working on the Journaled File System (JFS) for Linux project. Steve has extensive experience in operating systems, with a focus on file systems, internationalization, and security.
