Find and solve program errors in Linux

Last Update: 2018-12-03 Source: Internet

Author: User


Source: http://www.ddvip.net/OS/linux/index6/56.htm

Steve Best (sbest@us.ibm.com)
JFS core team member, IBM

There are various ways to watch a running user-space program: you can run a debugger on it and step through the program, add print statements, or add tools to analyze the program. This article describes several methods you can use to debug programs that run on Linux. We review four debugging scenarios: segmentation faults, memory overruns and leaks, and hangs.
This article presents four cases of debugging on Linux. In the first case, we use two sample programs with memory allocation problems and debug them with the MEMWATCH and Yet Another Malloc Debugger (YAMD) tools. In the second case, we use the Linux strace utility, which traces system calls and signals, to find out where a program goes wrong. In the third case, we use the Oops facility of the Linux kernel to solve a segmentation fault, and show you how to set up the kernel source-level debugger (kgdb) so that the GNU Debugger (GDB) can solve the same problem; kgdb lets GDB debug a Linux kernel over a serial connection. In the fourth case, we use the magic key sequence provided by Linux to display information about the components that are causing a hang.

Common debugging methods
When your program contains a bug, it is very likely that somewhere in the code there is a condition that you believe to be true but that is actually false. Finding the bug is the process of reconfirming what you believe to be true until you find something that is false.

The following examples are some types of conditions that you may be certain about:

Somewhere in the source code, a variable has a specific value.
A structure has been correctly set in a given place.
In a given if-then-else statement, the if part is the path that is taken.
When a subroutine is called, it receives its parameters correctly.

Finding the bug means confirming whether each of these conditions actually holds. If you are sure that a variable should have a specific value when a subroutine is called, check that it does. If you believe that the if branch will be executed, check that it is. Usually your assumptions will turn out to be correct, but eventually you will find one that does not hold. At that point you will have found the location of the bug.

Debugging is an unavoidable part of programming. There are many ways to debug, from printing messages to the screen, to using a debugger, to simply thinking through how the program executes and reasoning carefully about the problem.

Before you can fix a bug, you must find its source. For a segmentation fault, for example, you need to know which line of code caused it. Once you have found that line, determine the values of the variables in the function, how the function was called, and the details of how the error occurs. A debugger makes it easy to find all of this information. If no debugger is available, you can use other tools. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a built-in debugger.)

Practical memory and kernel tools
You can use the debugging tools on Linux to track down user-space and kernel problems in various ways. Build and debug your source code with the following tools and techniques:
User space tools:

Memory tools: MEMWATCH and YAMD
strace
GNU Debugger (GDB)
Magic key sequence

Kernel tools:

Kernel source code-level debugger (kgdb)
Built-in kernel debugger (KDB)
Oops

This article discusses a class of problems that are hard to find by inspecting the code by hand and that appear only in rare circumstances. Memory errors often show up in several places at once, and sometimes they are discovered only after the program has been deployed.

Case 1: Memory debugging tools
C, the standard programming language on Linux systems, gives you great control over dynamic memory allocation. This freedom, however, can lead to serious memory management problems that crash a program or degrade its performance over time.

Memory leaks (memory allocated with malloc() that is never released by a matching free()) and buffer overruns (for example, writing past the end of the memory allocated for an array) are common problems that can be difficult to detect. This section discusses several debugging tools that greatly simplify detecting and isolating memory problems.

MEMWATCH
MEMWATCH, written by Johan Lindh, is an open source memory error detection tool for C. You can download it yourself (see References at the end of this article). Simply add a header file to your code and define MEMWATCH in the gcc command, and you can track memory leaks and corruption in your program. MEMWATCH supports ANSI C, logs its results, and detects double frees, erroneous frees, unfreed memory, overflow and underflow, and so on.

Listing 1. Memory sample (test1.c)


Code:

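#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
  char *ptr1;
  char *ptr2;

  ptr1 = malloc(512);
  ptr2 = malloc(512);   /* line 11: this block is never freed */

  ptr2 = ptr1;          /* the second block's address is lost */
  free(ptr2);           /* line 14: frees the first block */
  free(ptr1);           /* line 15: frees the first block again */
  return 0;
}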

The code in Listing 1 allocates two 512-byte blocks of memory, and then the pointer to the second block is set to point to the first block. As a result, the address of the second block is lost and its memory is leaked, and both calls to free() release the first block.

Now compile memwatch.c together with Listing 1. Here is an example makefile:

test1: test1.c memwatch.c memwatch.h
	gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1

When you run the test1 program, it produces a report of the leaked memory. Listing 2 shows the resulting memwatch.log output file.

Listing 2. test1 memwatch.log file
MEMWATCH 2.67 Copyright (C) 1992-1999 Johan Lindh

...
double-free: <4> test1.c(15), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE ..............}

Memory usage statistics (global):
 N)umber of allocations made: 2
 L)argest memory usage: 1024
 T)otal of all alloc() calls: 1024
 U)nfreed bytes totals: 512

MEMWATCH shows you the lines that actually caused the problems. If you free a pointer that has already been freed, it tells you so; the same goes for memory that is never freed. The end of the log displays statistics, including how much memory was leaked, how much was used, and the total amount allocated.

YAMD
The YAMD package, written by Nate Eldredge, finds dynamic memory allocation problems in C and C++. At the time of this writing, the latest version of YAMD was 0.32. Download yamd-0.32.tar.gz (see References). Run make to build the program, then run make install to install it and set up the tool.

Once you have downloaded YAMD, use it on test1.c. Remove #include "memwatch.h" and make the following small change to the makefile:

Using YAMD with test1:
test1: test1.c
	gcc -g test1.c -o test1

Listing 3 shows the output from YAMD on test1.

Listing 3. test1 output from YAMD
YAMD version 0.32
Executable: /usr/src/test/yamd-0.32/test1
...
Info: normal allocation of this block
Address 0x40025e00, size 512
...
Info: normal allocation of this block
Address 0x40028e00, size 512
...
Info: normal deallocation of this block
Address 0x40025e00, size 512
...
Error: Multiple freeing
Free of pointer already freed
Address 0x40025e00, size 512
...
Warning: Memory Leak
Address 0x40028e00, size 512
Warning: total memory leaks:
1 unfreed allocations totaling 512 bytes

*** Finished at Tue ... 10:07:15 2002
Allocated a grand total of 1024 bytes 2 allocations
Average of 512 bytes per allocation
Maxbytes allocated at one time: 1024
24K alloced internally / 12K mapped now / 8K max
Virtual program size is 1416 K
End.

YAMD shows that we freed the same memory twice and that there is a memory leak. Let's try YAMD on another sample program, shown in Listing 4.

Listing 4. Memory code (test2.c)


Code:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  char *ptr1;
  char *ptr2;
  char *chptr;
  int i = 1;
  ptr1 = malloc(512);
  ptr2 = malloc(512);
  chptr = (char *)malloc(512);
  for (; i <= 512; i++) {
    chptr[i] = 's';     /* bug: when i == 512 this writes one byte past the end */
  }
  ptr2 = ptr1;          /* same leak and double free as Listing 1 */
  free(ptr2);
  free(ptr1);
  free(chptr);
  return 0;
}

Start YAMD with the following command:

./run-yamd /usr/src/test/test2/test2
Listing 5 shows the output from running YAMD on the sample program test2. YAMD tells us that the for loop writes out of bounds.

Listing 5. test2 output from YAMD
Running /usr/src/test/test2/test2
Temp output to /tmp/yamd-out.1243
*********
./run-yamd: line 101: 1248 Segmentation fault (core dumped)
YAMD version 0.32
Starting run: /usr/src/test/test2/test2
Executable: /usr/src/test/test2/test2
Virtual program size is 1380 K
...
Info: normal allocation of this block
Address 0x40025e00, size 512
...
Info: normal allocation of this block
Address 0x40028e00, size 512
...
Info: normal allocation of this block
Address 0x4002be00, size 512
Error: Crash
...
Tried to write address 0x4002c000
Seems to be part of this block:
Address 0x4002be00, size 512
...
Address in question is at offset 512 (out of bounds)
Will dump core after checking heap.
Done.

MEMWATCH and YAMD are both useful debugging tools, but they are used differently. With MEMWATCH, you need to include memwatch.h and enable two compile-time flags. YAMD needs only the -g option on the link statement.

Electric Fence
Most Linux distributions include an Electric Fence package, but you can also download it. Electric Fence is a malloc() debugging library written by Bruce Perens. It allocates protected memory just after the memory you allocate. If there is a fencepost error (running past the end of an array), your program immediately exits with a protection fault. By combining Electric Fence with GDB, you can track down exactly which line tried to access the protected memory. Electric Fence can also detect accesses to memory that has already been released by free().

Case 2: Using strace
The strace command is a powerful tool that shows all of the system calls issued by a user-space program. strace displays the arguments to each call and its return value in symbolic form. strace receives its information from the kernel and does not require the kernel to be built in any special way. The trace information is useful to both application and kernel developers. In Listing 6, formatting a partition fails. The listing shows the beginning of the strace output for the command that makes the file system (mkfs); strace pinpoints which call causes the problem.

Listing 6. Beginning of the strace output for mkfs
execve("/sbin/mkfs.jfs", ["mkfs.jfs", "-f", "/dev/test1"], &
...
open("/dev/test1", O_RDWR|O_LARGEFILE) = 4
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
ioctl(4, 0x40041271, 0xbfffe12) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: warning - cannot setb"..., 98mkfs.jfs: warning -
cannot set blocksize on block device /dev/test1: Invalid argument)
 = 98
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
open("/dev/test1", O_RDONLY|O_LARGEFILE) = 5
ioctl(5, 0x80041272, 0xbfffe124) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: can\'t determine device"..., ..._exit(1)
 = ?

Listing 6 shows that the mkfs program, which formats the partition, fails because of an ioctl call: the BLKGETSIZE64 ioctl fails. (BLKGETSIZE64 is defined in the source code that calls the ioctl.) The BLKGETSIZE64 ioctl was being added to all devices in Linux, and the logical volume manager did not yet support it. Therefore, if the BLKGETSIZE64 ioctl call fails, the mkfs code calls the older ioctl instead, which lets mkfs work with the logical volume manager.

Case 3: Using GDB and Oops
You can use the gdb program (the Free Software Foundation's debugger) from the command line to find bugs, or use one of several graphical tools built on gdb, such as the Data Display Debugger (DDD). You can use GDB to debug user-space programs or the Linux kernel. This section covers only running GDB from the command line.

Start GDB with the command gdb programname. GDB loads the executable's symbols and displays an input prompt, and you can start using the debugger. There are three ways to examine a process with GDB:

Use the attach command to examine a process that is already running. attach stops the process.

Use the run command to start the program and debug it from the beginning.

Examine an existing core file to determine the state of the process when it terminated. To examine a core file, start GDB with the following command:
gdb programname corefilename
To debug with a core file, you need the program's executable and source files as well as the core file itself. To start GDB with a core file, use the -c option:

gdb -c core programname

GDB shows which line of code caused the program to dump core.

Before you run a program or attach to one that is already running, list the source code where you believe the bug is, set breakpoints, and start debugging. The help command gives you access to GDB's comprehensive online help and a detailed tutorial.

kgdb
The kgdb program (a remote-host Linux kernel debugger that works through GDB) provides a mechanism for debugging the Linux kernel with GDB. kgdb is a kernel extension that lets you connect to a machine running the kgdb-extended kernel from GDB on a remote host. You can then break into the kernel, set breakpoints, examine data, and perform other operations, much as you would when using GDB on an application program. One of the main features of this patch is that the remote host running GDB connects to the target machine (running the kernel to be debugged) during the boot process, which lets you begin debugging as early as possible. Note that the patch adds functionality to the Linux kernel so that GDB can be used to debug it.

kgdb requires two machines: a development machine and a test machine. A serial line (a null-modem cable) connects the two machines through their serial ports. The kernel you want to debug runs on the test machine; GDB runs on the development machine and uses the serial line to communicate with the kernel being debugged.

Follow these steps to set up a kgdb debugging environment:

Download the patch for your version of the Linux kernel.

Build the component you want to debug directly into the kernel, because this is the easiest way to use kgdb. (Note that most kernel components can be built in two ways: as a module or directly into the kernel. For example, the Journaled File System (JFS) can be built as a module or directly into the kernel. With the GDB patch, we build JFS directly into the kernel.)

Apply the kernel patch and rebuild the kernel.

Create a file named .gdbinit and place it in the top-level directory of your kernel source tree (in other words, /usr/src/linux). The .gdbinit file contains the following four lines:


Code:

set remotebaud 115200
symbol-file vmlinux
target remote /dev/ttyS0
set output-radix 16

Add the append="gdb ..." line to lilo.conf. LILO is the boot loader that selects which kernel is used when the machine boots.
image=/boot/bzImage-2.4.17
label=gdb2417
read-only
root=/dev/sda8
append="gdb gdbttyS=1 gdb-baud=115200 nmi_watchdog=0"

Listing 7 is an example script that pulls the kernel and modules built on the development machine over to the test machine. You need to change the following items:

best@sfb: the user ID and machine name.
/usr/src/linux-2.4.17: the directory of the kernel source tree.
bzImage-2.4.17: the name of the kernel that will be booted on the test machine.
rcp and rsync: must be allowed to run on the machine that builds the kernel.

Listing 7. Script to pull the kernel and modules to the test machine
set -x
rcp best@sfb:/usr/src/linux-2.4.17/arch/i386/boot/bzImage /boot/bzImage-2.4.17
rcp best@sfb:/usr/src/linux-2.4.17/System.map /boot/System.map-2.4.17
rm -rf /lib/modules/2.4.17
rsync -a best@sfb:/lib/modules/2.4.17 /lib/modules
chown -R root /lib/modules/2.4.17
lilo

Now you can start GDB on the development machine from the top-level directory of the kernel source tree. In this example, the kernel source tree is in /usr/src/linux-2.4.17. Type gdb to start the program.

If everything is working, the test machine stops during its boot process. Enter the GDB command cont to continue booting. A common problem is that the null-modem cable is connected to the wrong serial port. If GDB does not start, move the cable to the second serial port; that should let GDB start.

Using kgdb to debug the kernel
Listing 8 shows the modified code in the jfs_mount.c source file. We deliberately introduced a null-pointer dereference so that the code causes a segmentation fault at line 109.

Listing 8. Modified jfs_mount.c code


Code:

int jfs_mount(struct super_block *sb)
{
...
    int *ptr;                       /* line 1 added */
    jFYI(1, ("\nMount JFS\n"));
    /*
     * read/validate superblock
     * (initialize mount inode from the superblock)
     */
    if ((rc = chkSuper(sb))) {
        goto errout20;
    }
108 ptr = 0;                        /* line 2 added */
109 printk("%d\n", *ptr);           /* line 3 added: dereferences NULL */

Listing 9 shows the GDB exception that occurs after the mount command is issued for the file system. kgdb makes several commands available, such as displaying data structures and variable values and showing the state of all tasks in the system, where they are sleeping, where they are spending CPU time, and so on. Listing 9 shows the backtrace information for this problem. The where command performs a backtrace, which lists the calls that were executed up to the point where the code stopped.

Listing 9. GDB exception and backtrace
mount -t jfs /dev/sdb /jfs

Program received signal SIGSEGV, Segmentation fault.
jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
109     printk("%d\n", *ptr);
(gdb) where
#0  jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
#1  0xc01a0dbb in jfs_read_super ... at super.c:280
#2  0xc0149ff5 in get_sb_bdev ... at super.c:620
#3  0xc014a89f in do_kern_mount ... at super.c:849
#4  0xc0160e66 in do_add_mount ... at namespace.c:569
#5  0xc01610f4 in do_mount ... at namespace.c:683
#6  0xc01611ea in sys_mount ... at namespace.c:716
#7  0xc01074a7 in system_call () at af_packet.c:1891
#8  0x0 in ?? ()
(gdb)

The next section looks at the same JFS segmentation fault without a debugger set up. If you run the code in Listing 8 on a kernel without kgdb, the kernel generates an Oops message, which you can analyze instead.

Oops Analysis
An Oops (or panic) message contains details of a system failure, such as the contents of the CPU registers. With Linux, the traditional way to debug a system crash has been to analyze the Oops message sent to the system console at the time of the crash. Once you have captured the details, you can pass the message to the ksymoops utility, which attempts to convert the code into instructions and map stack values to kernel symbols. In many cases, this is enough information to determine the likely cause of the failure. Note that an Oops message does not include a core file.

Suppose your system has just generated an Oops message. As the author of the code, you want to solve the problem and determine what caused the Oops; or you want to give the developer of the code that produced the Oops as much information about your problem as possible, so that it can be solved promptly. The Oops message is one part of the equation, but it is of little help unless you run it through ksymoops. The following figure shows the process of formatting an Oops message.

Formatting an Oops message

ksymoops needs several items: the Oops message output, the System.map file from the running kernel, /proc/ksyms, vmlinux, and /proc/modules. For complete instructions on using ksymoops, see the kernel source file /usr/src/linux/Documentation/oops-tracing.txt or the ksymoops man page. ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.

First, save the Oops message in a file so that you can run it through ksymoops. Listing 10 shows the Oops message, as processed by ksymoops, that the mount command produced when mounting the JFS file system. The problem is caused by the three lines added to the JFS mount code in Listing 8.

Listing 10. Oops message processed by ksymoops
ksymoops 2.4.0 on i686 2.4.17. Options used
... 15:59:37 sfb1 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 0000000
... 15:59:37 sfb1 kernel: c01588fc
... 15:59:37 sfb1 kernel: * de = 0000000
... 15:59:37 sfb1 kernel: Oops: 0000
... 15:59:37 sfb1 kernel: CPU:    0
... 15:59:37 sfb1 kernel: EIP:    0010:[jfs_mount+60/704]

... 15:59:37 sfb1 kernel: Call Trace: [jfs_read_super+287/688]
[get_sb_bdev+563/736] [do_kern_mount+189/336] [do_add_mount+35/208]
[do_page_fault+0/1264]
... 15:59:37 sfb1 kernel: Call Trace: []...
... 15:59:37 sfb1 kernel: [
... 15:59:37 sfb1 kernel: Code: 8b 2d 00 00 00 00 55 ...

>>EIP; c01588fc   <===
...
Trace; c0106cf3
Code;  c01588fc
00000000 <_EIP>:
Code;  c01588fc   <====
   0:   8b 2d 00 00 00 00        mov    0x0,%ebp   <====
Code;  c0158902
   6:   55                       push   %ebp

Next, you need to determine which line of code in jfs_mount caused the problem. The Oops message tells you that the problem was caused by the instruction at offset 0x3c: the EIP line, [jfs_mount+60/704], places the faulting instruction 60 bytes (0x3c) into the 704-byte jfs_mount function. One way to find the corresponding source line is to run the objdump utility on the jfs_mount.o file and look at offset 0x3c. objdump disassembles a module's functions so that you can see which assembler instructions your C source code produced. Listing 11 shows what you see with objdump. Looking next at the C code for jfs_mount, you can see that the null pointer is dereferenced on line 109. Offset 0x3c matters because it is the location that the Oops message identifies as the cause of the problem.

Listing 11. jfs_mount assembler listing


Code:

109     printk("%d\n", *ptr);

objdump -d jfs_mount.o

jfs_mount.o:     file format elf32-i386

Disassembly of section .text:

00000000:
   0:   55                      push   %ebp
  ...
  2c:   e8 cf 03 00 00          call   400
  31:   89 c3                   mov    %eax,%ebx
  33:   58                      pop    %eax
  34:   85 db                   test   %ebx,%ebx
  36:   0f 85 55 02 00 00       jne    291
  3c:   8b 2d 00 00 00 00       mov    0x0,%ebp    < problem line above
  42:   55                      push   %ebp

KDB
The Linux kernel debugger (KDB) is a patch for the Linux kernel that provides a means of examining kernel memory and data structures while the system is operational. Note that KDB does not require two machines, but it does not let you do source-level debugging as kgdb does. You can add commands that format and display essential system data structures, given an identifier or address of the data structure. The current command set allows you to control kernel operations, including the following:

Single-stepping the processor
Stopping upon execution of a specific instruction
Stopping upon access (or modification) of a specific virtual memory location
Stopping upon access to a register in the input/output address space
Stack backtraces for the current active task and for all other tasks (by process ID)
Instruction disassembly

Chasing memory overruns

You do not want to find yourself in a situation where an allocation overrun occurs only after thousands of calls.

Our team spent a long time tracking down an odd memory corruption problem. The application worked on our development workstation, but on the new product workstation it failed after two million calls to malloc(). The real problem was an overrun that occurred after about one million calls. The problem showed up on all of the new systems because the layout of the reserved malloc() area differed, so the scattered pieces of memory were placed differently and the overrun clobbered different contents.

We used a variety of techniques to solve this problem: one was to use a debugger, and another was to add tracing to the source code. At about this point in my career, I started to focus on memory debugging tools, hoping to solve these kinds of problems faster and more efficiently. One of the first things I do when starting a new project is to run MEMWATCH and YAMD to see whether they point out memory management problems.

Memory leaks are a common problem in applications, but you can use the tools described in this article to track them down.

Case 4: Getting back traces with the magic key sequence
If your keyboard still works while Linux is hung, use the following steps to help find the root cause of the hang. With the magic key sequence, you can display a back trace of the currently running process and of all processes.

The kernel you are running must be built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode: Ctrl+Alt+F1 switches to text mode, and Ctrl+Alt+F7 switches back to X Windows.
In text mode, press the magic key sequences. They give a stack trace of the currently running process and of all processes, respectively.
Look in /var/log/messages. If everything is set up correctly, the system will already have converted the kernel addresses to symbols for you. The back trace is written to the /var/log/messages file.

Conclusion
Many different tools are available to help debug programs on Linux. The tools described in this article can help you solve many coding problems. Tools that show the location of memory leaks, overruns, and similar errors make memory management problems much easier to solve. I find MEMWATCH and YAMD very helpful.

Using the Linux kernel patch that lets GDB work on the Linux kernel has helped me solve problems in the Linux file system that I work on. In addition, the strace utility helped determine where a file system utility was failing during a system call. The next time you are trying to squash a bug in Linux, try one of these tools.

References

Download MEMWATCH.

Download YAMD.

Download Electric Fence.

See Dynamic Probes for additional debugging facilities.

Read the article "Linux software debugging with GDB" (developerWorks, February 2001).

Visit the IBM Linux Technology Center.

Find more Linux articles in the developerWorks Linux zone.

About the author
Steve Best works in the IBM Linux Technology Center in Austin, Texas. He is currently working on the Journaled File System (JFS) for Linux project. Steve has extensive experience with operating systems, with a focus on file systems, internationalization, and security.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
