Transferred from: https://www.ibm.com/developerworks/cn/linux/sdk/l-debug/index.html
You can monitor a running user-space program in several ways: you can run a debugger on it and step through the program, add print statements, or instrument the program. This article describes several methods you can use to debug programs that run on Linux. We review four debugging scenarios, including segmentation faults, memory overruns and leaks, and hangs.
Steve Best ([email protected]), JFS core team member, IBM
August 09, 2002
This article discusses four ways to debug Linux programs. In the first case, we use two sample programs with memory allocation problems and debug them with the Memwatch and Yet Another Malloc Debugger (YAMD) tools. In the second case, we use the strace utility in Linux, which tracks system calls and signals to find out where a program has gone wrong. In the third case, we use the Linux kernel's Oops feature to solve a program's segmentation fault and show you how to set up the kernel source-level debugger (kgdb) to solve the same problem using the GNU debugger (gdb); the kgdb program is a remote gdb for a Linux kernel connected over a serial line. In the fourth case, we use the magic key sequence available on Linux to display information about the component that caused a hang.
Common debugging methods
When your program contains an error, it is likely that somewhere in the code there is a condition you believe to be true that is actually false. Locating the error is a process of overturning, one after another, conditions you previously believed to be true until you find the one that is false.
The following are examples of the kinds of conditions you might be confident hold:
- Somewhere in the source code, a variable has a specific value.
- At a given place, a structure has been set up correctly.
- For a given if-then-else statement, the if part is the path that will be executed.
- When a subroutine is called, the routine receives its arguments correctly.
Finding an error is a matter of confirming that all of these conditions actually hold. If you are sure that a variable should have a specific value when a subroutine is called, check whether that is the case. If you believe that the if branch will be executed, check whether that is the case. Usually your assumptions will be correct, but eventually you will find one that does not match reality. As a result, you will find the place where the error occurred.
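For example, a quick way to test one of these assumptions is to encode it as an assertion, so the program stops at the exact call that violates it instead of failing somewhere far downstream. This is only an illustrative sketch; the function and variable names are invented:

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subroutine: we assume buf is never NULL and len is
 * positive whenever it is called. */
static void process_record(const char *buf, int len)
{
    assert(buf != NULL);
    assert(len > 0);
    /* ... real work would go here ... */
}

int main(void)
{
    const char *rec = "sample";
    process_record(rec, (int)strlen(rec));   /* assumptions hold */
    process_record(NULL, 1);                 /* assumption violated: assert() aborts here */
    return 0;
}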
Debugging is a task you cannot escape. There are many ways to debug: printing messages to the screen, using a debugger, or simply thinking through what the program is doing and carefully reasoning about where the problem lies.
Before you fix a problem, you must find its source. For a segmentation fault, for example, you need to know on which line of code the fault occurred. Once you find the faulting line, determine the values of the variables in that method, how the method was called, and the details of how the error happened. Using a debugger makes it easy to find all of this information. If no debugger is available, you can use other tools. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a built-in debugger.)
Useful memory and kernel tools
Debugging tools on Linux let you track down user-space and kernel problems in a variety of ways. Build and debug your source code with the tools and techniques below:
User-space tools:
- Memory tools: Memwatch and YAMD
- Strace
- GNU Debugger (GDB)
- Magic key sequence
Kernel tools:
- Kernel Source-level debugger (KGDB)
- Built-in kernel debugger (KDB)
- Oops
This article discusses a class of problems that are hard to find by manually inspecting code and that occur only under rare conditions. Memory errors often appear only when several circumstances combine, and you may sometimes discover them only after the program has been deployed.
Case 1: Memory debugging tools
C, the standard programming language on Linux systems, gives us a great deal of control over dynamic memory allocation. This freedom, however, can lead to serious memory management problems that cause programs to crash or degrade over time.
Memory leaks (memory allocated with malloc() that is never released with a corresponding free() call) and buffer overruns (for example, writing past memory that was allocated for an array) are common problems and can be difficult to detect. This section discusses several debugging tools that greatly simplify detecting and isolating memory problems.
Memwatch
Written by Johan Lindh, Memwatch is an open source memory error detection tool for C that you can download yourself (see Resources later in this article). Once you add a header file to your code and define MEMWATCH on your gcc command line, you can track memory leaks and corruptions in your program. Memwatch supports ANSI C; it provides results logging and detects double frees, erroneous frees, unfreed memory, overflows and underflows, and so on.
Listing 1. Memory sample code (test1.c)
#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
    char *ptr1;
    char *ptr2;

    ptr1 = malloc(512);
    ptr2 = malloc(512);

    ptr2 = ptr1;
    free(ptr2);
    free(ptr1);
}
The code in Listing 1 allocates two 512-byte blocks of memory, and then the pointer to the second block is set to point to the first block. As a result, the address of the second block is lost, producing a memory leak.
Now we compile test1.c from Listing 1, linking in memwatch.c. The following is a sample makefile:
test1: test1.c memwatch.c
	gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1
When you run the test1 program, it generates a report about the leaked memory. Listing 2 shows a sample memwatch.log output file.
Listing 2. memwatch.log file for test1
MEMWATCH 2.67 Copyright (C) 1992-1999 Johan Lindh
...
double-free: <4> test1.c(), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE FE FE FE FE FE FE FE ........ ........ ........}

Memory usage statistics (global):
 N)umber of allocations made: 2
 L)argest memory usage      : 1024
 T)otal of all alloc() calls: 1024
 U)nfreed bytes totals      : 512
Memwatch shows you the line that actually caused the problem. If you free an already-freed pointer, it tells you. The same goes for memory that is never freed. The end of the log displays statistics, including how much memory was leaked, how much was used, and the total amount allocated.
YAMD
The YAMD package, written by Nate Eldredge, finds dynamic memory allocation problems in C and C++. At the time of this writing, the latest version of YAMD was 0.32. Download yamd-0.32.tar.gz (see Resources). Execute the make command to build the program; then execute the make install command to install the program and set up the tool.
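As a minimal sketch of the download-and-build steps (assuming the tarball was saved to the current directory; adjust paths for your system):

tar -xzf yamd-0.32.tar.gz
cd yamd-0.32
make
make install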
Once you have downloaded YAMD, try it on test1.c. Delete the #include "memwatch.h" line and make the following small change to the makefile:
Using YAMD with test1
	gcc -g test1.c -o test1
Listing 3 shows the output from YAMD on test1.
Listing 3. Output from running test1 under YAMD
YAMD version 0.32
Executable: /usr/src/test/yamd-0.32/test1
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal deallocation of this block
Address 0x40025e00, size 512
...
ERROR: Multiple freeing at
free of pointer already freed
Address 0x40025e00, size 512
...
WARNING: Memory leak
Address 0x40028e00, size 512
WARNING: Total memory leaks:
1 unfreed allocations totaling 512 bytes
*** Finished at Tue ... 10:07:15 2002
Allocated a grand total of 1024 bytes
2 allocations
Average of 512 bytes per allocation
Max bytes allocated at one time: 1024
24 K alloced internally / 12 K mapped now / 8 K max
Virtual program size is 1416 K
End.
YAMD shows that we freed memory that was already freed and that there is a memory leak. Let's try YAMD on another sample program, shown in Listing 4.
Listing 4. Memory code (test2.c)
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char *ptr1;
    char *ptr2;
    char *chptr;
    int i = 1;

    ptr1 = malloc(512);
    ptr2 = malloc(512);
    chptr = (char *)malloc(512);

    for (i; i <= 512; i++) {
        chptr[i] = 'S';
    }

    ptr2 = ptr1;
    free(ptr2);
    free(ptr1);
    free(chptr);
}
You can use the following command to start YAMD:
./run-yamd /usr/src/test/test2/test2
Listing 5 shows the output from using YAMD on the sample program test2. YAMD tells us that there is an out-of-bounds condition in the for loop.
Listing 5. Output from running test2 under YAMD
Running /usr/src/test/test2/test2
Temp output to /tmp/yamd-out.1243
*********
./run-yamd: line 101: 1248 Segmentation fault (core dumped)
YAMD version 0.32
Starting run: /usr/src/test/test2/test2
Executable: /usr/src/test/test2/test2
Virtual program size is 1380 K
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal allocation of this block
Address 0x4002be00, size 512
ERROR: Crash
...
Tried to write address 0x4002c000
Seems to be part of this block:
Address 0x4002be00, size 512
...
Address in question is at offset 512 (out of bounds)
Will dump core after checking heap.
Done.
Memwatch and YAMD are both useful debugging tools, and they are used in different ways. With Memwatch, you need to include the header file memwatch.h and turn on two compile-time flags. YAMD only requires the -g option on the link statement.
Electric Fence
Most Linux distributions include an Electric Fence package, but you can also choose to download it. Electric Fence is a malloc() debugging library written by Bruce Perens. It allocates protected memory just after the memory you allocate. If there is a fencepost error (running past the end of an array), the program immediately terminates with a protection fault. By combining Electric Fence with gdb, you can track down exactly which line tried to access the protected memory. Electric Fence can also detect memory leaks.
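As an illustrative sketch (the file name and program are made up for this example, and Electric Fence's default alignment is assumed), the following program overruns a heap buffer by one byte; linked against Electric Fence, it aborts at the offending store instead of silently corrupting memory:

/* overrun.c - hypothetical example to demonstrate Electric Fence */
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);

    /* copies 16 characters plus the terminating '\0': one byte past the end */
    strcpy(buf, "0123456789abcdef");

    free(buf);
    return 0;
}

Compile with debugging symbols and link the library, for example gcc -g overrun.c -o overrun -lefence, then run the program under gdb; when Electric Fence triggers the protection fault, gdb stops on the line that performed the illegal write.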
Case 2: Using strace
The strace command is a powerful tool that shows all of the system calls issued by a user-space program. strace displays the arguments to these calls and their return values in symbolic form. strace receives its information from the kernel and does not require the kernel to be built in any special way. The trace information is useful to both application and kernel developers. In Listing 6, a partition format operation failed; the listing shows the beginning of the strace output for the call that creates the file system (mkfs). strace determines which call caused the problem.
Listing 6. Beginning of strace output for mkfs
execve("/sbin/mkfs.jfs", ["mkfs.jfs", "-f", "/dev/test1"], &
 ...
 open("/dev/test1", O_RDWR|O_LARGEFILE) = 4
 stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
 ioctl(4, 0x40041271, 0xbfffe128) = -1 EINVAL (Invalid argument)
 write(2, "mkfs.jfs: warning - cannot setb"..., 98mkfs.jfs: warning -
 cannot set blocksize on block device /dev/test1: Invalid argument) = 98
 stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
 open("/dev/test1", O_RDONLY|O_LARGEFILE) = 5
 ioctl(5, 0x80041272, 0xbfffe124) = -1 EINVAL (Invalid argument)
 write(2, "mkfs.jfs: can\'t determine device"..., ...
 _exit(1) = ?
Listing 6 shows that an ioctl call caused the mkfs program used to format the partition to fail. The BLKGETSIZE64 ioctl failed. (BLKGETSIZE64 is defined in the source code that issues the ioctl call.) The BLKGETSIZE64 ioctl is being added to all the devices in Linux, and in this case the logical volume manager does not support it yet. Therefore, the mkfs code was changed to fall back to an earlier ioctl call when the BLKGETSIZE64 ioctl fails; this makes mkfs work with the logical volume manager.
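To capture a trace like this yourself, a minimal sketch is to write the strace output to a file; the mkfs arguments below are simply the ones from this example, so substitute your own command or process ID:

strace -o /tmp/mkfs.trace /sbin/mkfs.jfs -f /dev/test1      # save the full trace to a file
strace -f -o /tmp/mkfs.trace /sbin/mkfs.jfs -f /dev/test1   # also follow any child processes
strace -p <pid>                                             # attach to an already-running process

The -o option keeps the (often very long) trace out of your terminal, and -p lets you trace a process that is already misbehaving without restarting it.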
Case 3: Using GDB and Oops
You can use the GDB program (the Free Software Foundation debugger) from the command line to find errors, or you can use it from one of several graphical front ends, such as Data Display Debugger (DDD). You can use GDB to debug user-space programs or the Linux kernel. This section discusses only running GDB from the command line.
Start GDB with the command gdb programname. GDB loads the executable's symbols and displays an input prompt so that you can start using the debugger. There are three ways to view a process with GDB:
- Use the attach command to start viewing a process that is already running; attach will stop the process.
- Use the Run command to execute the program and debug the program from the beginning.
- Review the existing core files to determine the status at the time the process terminates. To view the core files, start gdb with the following command.
gdb programname corefilename
To debug with a core file, you need the program's executable and source files as well as the core file itself. To start GDB with a core file, use the -c option: gdb -c core programname
GDB shows which line of code caused the program to dump core.
Before you run the program or attach to an already-running one, list the source code where you believe the error is, set breakpoints, and then start debugging. You can use the help command to view comprehensive GDB online help and a detailed tutorial.
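As a minimal sketch of a command-line session, using the test1 program from Listing 1 (compiled with -g, as in the YAMD makefile):

gcc -g test1.c -o test1
gdb test1
(gdb) break main
(gdb) run
(gdb) next
(gdb) print ptr1
(gdb) backtrace
(gdb) continue

Here break sets a breakpoint at main, run starts the program under the debugger, next steps over one source line, print displays the value of the variable ptr1, backtrace shows how execution reached the current point, and continue resumes execution until the next breakpoint or program exit.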
kgdb
The kgdb program (a remote-host Linux kernel debugger through gdb) provides a mechanism for debugging the Linux kernel using gdb. kgdb is a kernel extension that lets you, while running gdb on a remote host, connect to a target machine running the kgdb-extended kernel. You can then break into the kernel, set breakpoints, examine data, and so on (similar to the way you would use gdb on an application). One of the main features of this patch is that the host running gdb connects to the target machine (which runs the kernel to be debugged) during the boot process, so you can begin debugging as early as possible. Note that the patch adds functionality to the Linux kernel so that gdb can be used to debug it.
Using kgdb requires two machines: a development machine and a test machine. A serial line (null-modem cable) connects them through their serial ports. The kernel you want to debug runs on the test machine; gdb runs on the development machine. gdb uses the serial line to communicate with the kernel being debugged.
Follow these steps to set up the KGDB debugging environment:
- Download the applicable patch for your Linux kernel version.
- Build the component into the kernel, because this is the easiest way to use kgdb. (Note that there are two ways to build most kernel components: as a module or directly into the kernel. For example, the journaled file system (JFS) can be built as a module or directly into the kernel. With the gdb patch, we build JFS directly into the kernel.)
- Apply kernel patches and re-build the kernel.
- Create a file named .gdbinit and save it in the kernel source subdirectory (in other words, /usr/src/linux). The file .gdbinit contains the following four lines:
set remotebaud 115200
symbol-file vmlinux
target remote /dev/ttyS0
set output-radix 16
- Add the append="gdb" entry to the LILO configuration; LILO is the boot loader used to select which kernel to boot.
image=/boot/bzImage-2.4.17
label=gdb2417
read-only
root=/dev/sda8
append="gdb gdbttyS=1 gdb-baud=115200 nmi_watchdog=0"
Listing 7 is a sample script that moves the kernel and modules you built on the development machine over to the test machine. You need to modify the following items:
- [email protected]: user ID and machine name.
- /usr/src/linux-2.4.17: directory of the kernel source tree.
- bzImage-2.4.17: name of the kernel that will be booted on the test machine.
- rcp and rsync: must be allowed to run on the machine that built the kernel.
Listing 7. Script to move the kernel and modules to the test machine
set -x
rcp [email protected]:/usr/src/linux-2.4.17/arch/i386/boot/bzImage /boot/bzImage-2.4.17
rcp [email protected]:/usr/src/linux-2.4.17/System.map /boot/System.map-2.4.17
rm -rf /lib/modules/2.4.17
rsync -a [email protected]:/lib/modules/2.4.17 /lib/modules
chown -R root /lib/modules/2.4.17
lilo
Now we can start gdb on the development machine by changing to the directory where the kernel source tree begins. In this example, the kernel source tree is at /usr/src/linux-2.4.17. Enter gdb to start the program.
If everything works, the test machine will stop during the boot process. Enter the gdb command cont to continue booting. A common problem is that the null-modem cable may be connected to the wrong serial port. If gdb does not start, switch to the second serial port; this should allow gdb to start.
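As a minimal sketch of what the development-machine session can look like once the target stops (the jfs_mount breakpoint simply anticipates the example in the next section; any kernel function would do):

cd /usr/src/linux-2.4.17
gdb
(gdb) break jfs_mount
(gdb) cont

Because the .gdbinit file in this directory loads the vmlinux symbols and issues target remote /dev/ttyS0, gdb connects to the test machine automatically when started from here; when the test machine later executes jfs_mount (for example, while mounting a JFS file system), gdb stops at the breakpoint.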
Debugging kernel problems with kgdb
Listing 8 shows the modifications made to the source code of jfs_mount.c; we create a null pointer dereference so that the code produces a segmentation fault at line 109.
Listing 8. Modified jfs_mount.c code
int jfs_mount(struct super_block *sb)
{
...
	int *ptr;                       /* line 1 added */
	jFYI(1, ("\nMount JFS\n"));
	/*
	 * read/validate superblock
	 * (initialize mount inode from the superblock)
	 */
	if (rc = chkSuper(sb)) {
		goto errout20;
	}
108	ptr = 0;                        /* line 2 added */
109	printk("%d\n", *ptr);           /* line 3 added */
Listing 9 shows a gdb exception after the mount command is issued for the file system. kgdb provides several commands, such as displaying data structures and variable values and showing what state all tasks in the system are in, where they are sleeping, where they are using CPU, and so on. Listing 9 shows the backtrace provided for this problem; the where command performs the backtrace, telling where in the code the calls stopped.
Listing 9. gdb exception and backtrace
mount -t jfs /dev/sdb /jfs

Program received signal SIGSEGV, Segmentation fault.
jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
109     printk("%d\n", *ptr);
(gdb) where
#0  jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
#1  0xc01a0dbb in jfs_read_super ... at super.c:280
#2  0xc0149ff5 in get_sb_bdev ... at super.c:620
#3  0xc014a89f in do_kern_mount ... at super.c:849
#4  0xc0160e66 in do_add_mount ... at namespace.c:569
#5  0xc01610f4 in do_mount ... at namespace.c:683
#6  0xc01611ea in sys_mount ... at namespace.c:716
#7  0xc01074a7 in system_call () at af_packet.c:1891
#8  0x0 in ?? ()
(gdb)
The next section discusses this same JFS segmentation fault problem, but without a debugger set up; it uses the Oops message that the kernel generates if you execute the Listing 8 code in a non-kgdb kernel environment.
Oops Analysis
An Oops (also called a panic) message contains details of a system failure, such as the contents of the CPU registers. In Linux, the traditional way to debug a system crash has been to analyze the Oops message sent to the system console at the time of the crash. Once you have captured the details, the message can be passed to the ksymoops utility, which attempts to convert the code to instructions and map the stack values to kernel symbols. In many cases this is enough information to determine a probable cause of the failure. Note that an Oops message does not include a core file.
Let's assume the system has just produced an Oops message. As the person who wrote the code, you want to solve the problem and determine what caused the Oops, or you want to give the developer of the code that produced the Oops as much information about the problem as possible so it can be solved quickly. The Oops message is one part of the equation, but it is not much help unless you run it through the ksymoops program. The figure below shows the process of formatting an Oops message.
Formatting Oops messages
ksymoops needs several things: the Oops message output, the System.map file from the kernel that is running, and /proc/ksyms, vmlinux, and /proc/modules. Complete instructions on how to use ksymoops are in the kernel source at /usr/src/linux/Documentation/oops-tracing.txt or on the ksymoops man page. ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.
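As a rough sketch of a typical invocation (the file names here are assumptions, and the exact options can vary between ksymoops versions, so check the man page), you save the console output to a file and feed it to ksymoops along with the matching System.map:

ksymoops -m /boot/System.map-2.4.17 < oops.txt > oops-decoded.txt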
First, save the Oops message in a file so that it can be run through the ksymoops utility. Listing 10 shows the Oops message created by the mount command when mounting the JFS file system; the Oops was generated by the three lines of code added to the JFS mount code in Listing 8.
Listing 10. Oops message processed by ksymoops
ksymoops 2.4.0 on i686 2.4.17. Options used
...
15:59:37 sfb1 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 0000000
...
15:59:37 sfb1 kernel: c01588fc
...
15:59:37 sfb1 kernel: *pde = 0000000
...
15:59:37 sfb1 kernel: Oops: 0000
...
15:59:37 sfb1 kernel: CPU:    0
...
15:59:37 sfb1 kernel: EIP:    0010:[jfs_mount+60/704]
...
15:59:37 sfb1 kernel: Call Trace: [jfs_read_super+287/688] [get_sb_bdev+563/736]
[do_kern_mount+189/336] [do_add_mount+35/208] [do_page_fault+0/1264]
...
15:59:37 sfb1 kernel: Call Trace: [<c0155d4f>] ...
15:59:37 sfb1 kernel: [<c0106e04 ...
15:59:37 sfb1 kernel: Code: 8b 2d 00 00 ...

>>EIP; c01588fc <jfs_mount+3c/2c0>   <=====
...
Trace; c0106cf3 <system_call+33/40>
Code;  c01588fc <jfs_mount+3c/2c0>
00000000 <_EIP>:
Code;  c01588fc <jfs_mount+3c/2c0>   <=====
   0:   8b 2d 00 00 00 00       mov    0x0,%ebp   <=====
Code;  c0158902 <jfs_mount+42/2c0>
   6:   55                      push   %ebp
Next, you need to determine which line of code in jfs_mount caused the problem. The Oops message tells us the problem was caused by the instruction at offset 3c. One way to find this is to run the objdump utility on the jfs_mount.o file and look at offset 3c. objdump disassembles a module's functions so you can see what assembler instructions your C source produces. Listing 11 shows what you see from objdump; looking at the C code of jfs_mount, we can see that the null dereference came from line 109. Offset 3c matters because it is the location the Oops message identified as the cause of the problem.
Listing 11. Assembler listing of jfs_mount
109     printk("%d\n", *ptr);

objdump jfs_mount.o

jfs_mount.o:     file format elf32-i386

Disassembly of section .text:

00000000 <jfs_mount>:
   0:   55                      push   %ebp
  ...
  2c:   e8 cf ...               call   <chkSuper>
  31:   89 c3                   mov    %eax,%ebx
  33:   58                      pop    %eax
  34:   85 db                   test   %ebx,%ebx
  36:   0f 85 ...               jne    291 <jfs_mount+0x291>
  3c:   8b 2d 00 00 00 00       mov    0x0,%ebp   << problem line above
  42:   55                      push   %ebp
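Note that the disassembly shown here is produced by objdump's disassemble option; on most systems you would request it explicitly, for example:

objdump -d jfs_mount.o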
kdb
The Linux kernel debugger (kdb) is a patch for the Linux kernel that provides a means of examining kernel memory and data structures while the system is operational. Note that kdb does not require two machines, but it does not allow you to debug at the source level the way kgdb does. You can add additional commands that, given the identifier or address of a data structure, format and display essential system data structures. The current command set lets you control kernel operations, including the following:
- Processor Single Step execution
- Stop when executing to a specific instruction
- Stop when accessing (or modifying) a specific virtual memory location
- Stop when accessing registers in the input/output address space
- Stack backtracking on the currently active task and all other tasks (via process ID)
- Disassembly of instructions
Chasing a memory overrun
You do not want to find yourself in a situation like an allocation overrun that occurs only after thousands of calls.
Our team spent many long hours tracking down an odd memory corruption problem. The application worked on our development workstations, but on the new production workstation it failed after two million calls to malloc(). The real problem was an overrun that happened after about one million calls. The problem on the new systems was that the layout of the malloc() reserved region was different, so the stray memory landed in a different place and destroyed something different when the overrun occurred.
We solved this problem with a number of different techniques, one using a debugger and another adding tracing to the source code. It was at about this point in my career that I began to focus on memory debugging tools, hoping to solve these kinds of problems faster and more efficiently. One of the first things I do when starting a new project is to run Memwatch and YAMD to see whether they point to memory management problems.
Memory leaks are a common problem in applications, but you can use the tools described in this article to solve them.
Case 4: Backtrace using the magic key sequence
If your keyboard still works when Linux hangs, use the following method to help find the cause of the hang. With these steps you can display the currently running process, and a backtrace of all processes, using the magic key sequence.
- The kernel you are running must have been built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode. Ctrl+Alt+F1 puts you into text mode, and Ctrl+Alt+F7 takes you back to the X Window System.
- While in text mode, press <Alt+ScrollLock> and then <Ctrl+ScrollLock>. These magic keystrokes dump the currently running process and a stack trace of all processes, respectively.
- Look in /var/log/messages. If everything was set up correctly, the system should have converted the kernel's symbolic addresses for you. The backtrace is written to the /var/log/messages file. (A sketch of enabling and triggering the dumps from the shell follows this list.)
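On kernels built with CONFIG_MAGIC_SYSRQ, the facility can usually also be enabled and triggered from a root shell; this is a minimal sketch, and note that /proc/sysrq-trigger may not exist on older 2.4 kernels:

echo 1 > /proc/sys/kernel/sysrq     # enable the magic SysRq facility
echo t > /proc/sysrq-trigger        # dump all tasks and their stack traces to the kernel log
dmesg | tail -100                   # or check /var/log/messages for the backtrace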
Conclusion
There are many different tools available to help debug programs on Linux. The tools described in this article can help you solve many coding problems. Tools that show the location of memory leaks, overruns, and the like can solve memory management problems; I find Memwatch and YAMD very helpful.
Using the Linux kernel patches that let gdb work on the Linux kernel was helpful in solving problems in the file system I work on for Linux. In addition, the strace utility helped determine where a file system utility was failing during a system call. The next time you have to squash a bug on Linux, try one of these tools.