Mastering Linux Debugging Techniques


The main ways to identify and resolve program errors on Linux

You can monitor a running user-space program in a variety of ways: you can run a debugger on it and step through the program, add print statements, or instrument it with analysis tools. This article describes several methods you can use to debug programs that run on Linux. We will review four debugging problems, including segmentation faults, memory overruns and leaks, and hangs.

This article discusses four cases of debugging Linux programs. In the first case, we use two sample programs with memory allocation problems and debug them using the MEMWATCH and Yet Another Malloc Debugger (YAMD) tools. In the second case, we use the strace utility on Linux, which traces system calls and signals to find out where a program has gone wrong. In the third case, we use the Oops feature of the Linux kernel to solve a segmentation fault, and we show you how to set up the kernel source-level debugger (kgdb) to solve the same problem using the GNU debugger (gdb); the kgdb program is a remote gdb debugging a Linux kernel over a serial connection. In the fourth case, we use the magic key sequence available on Linux to display information about the component that raised a hang.

Common debugging methods

When there is a bug in your program, it is likely that somewhere in your code a condition you believe to be true is actually false. Finding the bug is a process of confirming the conditions you believe are true until you find the one that is false.

The following are examples of the types of conditions you might believe to be true:

    • Somewhere in the source code, a variable has a specific value.
    • At a given place, a structure has been set up correctly.
    • For a given if-then-else statement, the if part is the path to be executed.
    • When a subroutine is called, the routine receives its arguments correctly.

Finding the bug means confirming all of these conditions. If you believe that a variable should have a particular value when a subroutine is called, check whether that is the case. If you believe that the if branch will be executed, check whether it is. Usually your assumption will be correct, but eventually you will find a case that does not match an assumption. As a result, you will find the place where the error occurred.
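One lightweight way to make such assumptions checkable is the standard C assert() macro. The following is a minimal sketch; the subroutine and its arguments are hypothetical, not taken from this article's examples:

#include <assert.h>

/* Hypothetical subroutine: we assume the caller always passes a
 * non-NULL buffer and a positive length. If an assumption is false,
 * assert() aborts the program and prints the file and line number. */
static void fill_buffer(char *buf, int len)
{
    assert(buf != NULL);   /* assumption: pointer argument is valid */
    assert(len > 0);       /* assumption: length is positive */

    for (int i = 0; i < len; i++)
        buf[i] = 'x';
}

int main(void)
{
    char buf[16];
    fill_buffer(buf, 16);   /* assumptions hold */
    fill_buffer(NULL, 16);  /* assumption violated: assert fires here */
    return 0;
}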

Debugging is a task you can't escape. There are a number of ways to debug, such as printing messages to the screen, using the debugger, or just thinking about what the program is doing and carefully trying to figure out where the problem is.

Before you can fix a problem, you must find its source. For example, for a segmentation fault, you need to know which line of code the fault occurred on. Once you find the faulting line, determine the values of the variables in that method, how the method was called, and the details of how the error occurred. Using a debugger makes finding all of this information easy. If no debugger is available, there are other tools you can use. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a built-in debugger.)

Useful memory and kernel tools

There are various ways to track down user-space and kernel problems using debugging tools on Linux. Build and debug your source code with the tools and techniques below:
User-space tools:

    • Memory tools: MEMWATCH and YAMD
    • strace
    • GNU debugger (gdb)
    • Magic key sequence

Kernel tools:

    • Kernel Source-level debugger (KGDB)
    • Built-in kernel debugger (KDB)
    • Oops

This article discusses a class of problems that are not easy to find by manually inspecting the code, and that occur only in rare circumstances. Memory errors typically occur only under particular combinations of conditions, and sometimes you can discover them only after the program has been deployed.


Case 1: Memory debugging tools

C, the standard programming language on Linux systems, gives us great control over dynamic memory allocation. However, this freedom can lead to serious memory management problems that can cause a program to crash or degrade over time.

Memory leaks (that is, memory allocated with malloc() that is never released with a corresponding free()) and buffer overruns (for example, writing past the memory allocated for an array) are common problems that can be difficult to detect. This section discusses a few debugging tools that greatly simplify detecting and isolating memory problems.


MEMWATCH

MEMWATCH, written by Johan Lindh, is an open source memory error detection tool for C that you can download yourself (see Resources later in this article). Simply add a header file to your code and define MEMWATCH in your gcc command, and you can track memory leaks and corruption in your program. MEMWATCH supports ANSI C; it provides a results log and detects double frees, erroneous frees, unfreed memory, overflows, underflows, and so on.

Listing 1. Memory sample (test1.c)

#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
  char *ptr1;
  char *ptr2;

  ptr1 = malloc(512);
  ptr2 = malloc(512);

  ptr2 = ptr1;
  free(ptr2);
  free(ptr1);
}

The code in Listing 1 allocates two 512-byte blocks of memory, and then the pointer to the second block is overwritten with the pointer to the first block. As a result, the address of the second block is lost, producing a memory leak.

Now we compile test1.c from Listing 1 together with memwatch.c. The following is a makefile example:

test1

gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1

When you run the test1 program, it generates a report about the leaked memory. Listing 2 shows a sample memwatch.log output file.

Listing 2. memwatch.log file for test1

MEMWATCH 2.67 Copyright (C) 1992-1999 Johan Lindh
...
double-free: <4> test1.c(15), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE FE FE FE FE FE FE FE FE FE FE FE ..............}

Memory usage statistics (global):
  N)umber of allocations made: 2
  L)argest memory usage      : 1024
  T)otal of all alloc() calls: 1024
  U)nfreed bytes totals      : 512

MEMWATCH shows you the line that actually caused the problem. If you free an already-freed pointer, it tells you. The same goes for memory that is never freed. The end of the log displays statistics, including how much memory was leaked, how much was used, and how much was allocated in total.


YAMD

The YAMD package, written by Nate Eldredge, finds dynamic memory allocation problems in C and C++. At the time of this writing, the latest version of YAMD was 0.32. Download yamd-0.32.tar.gz (see Resources), run the make command to build the program, and then run the make install command to install the program and set up the tool. A typical build sequence is sketched below.
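A minimal sketch of those steps, assuming the tarball was downloaded to the current directory and that you have root privileges for the install:

tar -xzf yamd-0.32.tar.gz   # unpack the source
cd yamd-0.32
make                        # build the library and the run-yamd script
make install                # install (typically requires root)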

Once you have YAMD installed, use it on test1.c. Delete the #include "memwatch.h" line and make the following minor change to the makefile:

test1 using YAMD

gcc -g test1.c -o test1

Listing 3 shows the YAMD output for test1.

Listing 3. test1 output using YAMD

YAMD version 0.32
Executable: /usr/src/test/yamd-0.32/test1
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal deallocation of this block
Address 0x40025e00, size 512
...
ERROR: Multiple freeing at
free of pointer already freed
Address 0x40025e00, size 512
...
WARNING: Memory leak
Address 0x40028e00, size 512
WARNING: Total memory leaks:
1 unfreed allocations totaling 512 bytes
*** Finished at Tue ... 10:07:15 2002
Allocated a grand total of 1024 bytes
2 allocations
Average of 512 bytes per allocation
Max bytes allocated at one time: 1024
24 K alloced internally / 12 K mapped now / 8 K max
Virtual program size is 1416 K
End.

YAMD shows that we have already freed the memory once and that there is a memory leak. Let's try YAMD on another sample program, shown in Listing 4.

Listing 4. Memory code (test2.c)

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  char *ptr1;
  char *ptr2;
  char *chptr;
  int i = 1;

  ptr1 = malloc(512);
  ptr2 = malloc(512);
  chptr = (char *)malloc(512);

  for (i; i <= 512; i++) {
    chptr[i] = 'S';
  }

  ptr2 = ptr1;
  free(ptr2);
  free(ptr1);
  free(chptr);
}

You can use the following command to start YAMD:

./run-yamd /usr/src/test/test2/test2

Listing 5 shows the output from using YAMD on the sample program test2. YAMD tells us that there is an out-of-bounds condition in the for loop.

Listing 5. test2 output using YAMD

Running /usr/src/test/test2/test2
Temp output to /tmp/yamd-out.1243
*********
./run-yamd: line 101: 1248 Segmentation fault (core dumped)
YAMD version 0.32
Starting run: /usr/src/test/test2/test2
Executable: /usr/src/test/test2/test2
Virtual program size is 1380 K
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal allocation of this block
Address 0x4002be00, size 512
ERROR: Crash
...
Tried to write address 0x4002c000
Seems to be part of this block:
Address 0x4002be00, size 512
...
Address in question is at offset 512 (out of bounds)
Will dump core after checking heap.
Done.

MEMWATCH and YAMD are both useful debugging tools, and they are used in different ways. With MEMWATCH, you need to add the include file memwatch.h and turn on two compile-time flags. YAMD requires only the -g option on the link statement.


Electric Fence

Most Linux distributions include an Electric Fence package, but you can also choose to download it. Electric Fence is a malloc() debugging library written by Bruce Perens. It allocates protected memory just after the memory you allocate. If there is a fencepost error (running past the end of an array), the program immediately terminates with a protection fault. By combining Electric Fence with gdb, you can track down exactly which line tried to access the protected memory. Electric Fence can detect memory leaks as well. A sketch of a typical invocation follows.
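Because Electric Fence is a link-time library, no source changes are needed. A minimal sketch, assuming the library is installed as libefence and reusing test2.c from Listing 4:

gcc -g test2.c -o test2 -lefence    # link against Electric Fence
gdb ./test2
(gdb) run          # Electric Fence turns the overrun into an immediate SIGSEGV
(gdb) backtrace    # points at the exact line that touched protected memory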


Case 2: Using strace

The strace command is a powerful tool that shows all of the system calls issued by a user-space program. strace displays the arguments to those calls and their return values in symbolic form. strace receives its information from the kernel and does not require the kernel to be built in any special way. The trace information it produces is useful to both application and kernel developers. In Listing 6, formatting a partition fails; the listing shows the beginning of the strace output for the call that creates the file system (mkfs). strace determined which call caused the failure.
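To capture a trace like the one in Listing 6, run the command under strace. A typical invocation might look like the following (the output file name is illustrative; -o writes the trace to a file, and -p attaches to an already-running process):

strace -o /tmp/mkfs.trace mkfs.jfs -f /dev/test1
strace -p <pid>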

Listing 6. Beginning of strace output for mkfs

 Execve ("/sbin/mkfs.jfs", ["MKFS.JFS", "-F", "/dev/test1"], & ... open ("/dev/test1", O_ rdwr|  O_largefile) = 4 Stat64 ("/dev/test1", {st_mode=&, St_rdev=makedev (n. 255), ...}) = 0 ioctl (4, 0x40041271, 0xbfffe128)  =-1 EINVAL (Invalid argument) write (2, "Mkfs.jfs:warning-cannot Setb" ..., 98mkfs.jfs:warning-cannot set blocksize On block device/dev/test1:invalid argument) = 98 Stat64 ("/dev/test1", {st_mode=&, St_rdev=makedev (63, 255), ...} ) = 0 Open ("/dev/test1", o_rdonly| O_largefile) = 5 ioctl (5, 0x80041272, 0xbfffe124) =-1 EINVAL (Invalid argument) write (2, "mkfs.jfs:can\ ' t determine Devi Ce "..... _exit (1) =? 

Listing 6 shows that an ioctl call caused the mkfs program used to format the partition to fail. The failing ioctl is BLKGETSIZE64. (BLKGETSIZE64 is defined in the source code that makes the ioctl call.) The BLKGETSIZE64 ioctl was being added to all devices in Linux, and here the logical volume manager does not support it yet. Therefore, the mkfs code was changed to fall back to an earlier ioctl call when the BLKGETSIZE64 ioctl call fails; this makes mkfs work with the logical volume manager. A sketch of that fallback pattern follows.
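The fallback pattern described above can be sketched in C. This illustrates the idea only and is not the actual mkfs.jfs source; the helper name is hypothetical:

#include <sys/ioctl.h>
#include <linux/fs.h>    /* BLKGETSIZE64 and the older BLKGETSIZE */

/* Hypothetical helper: return the device size in bytes, or -1 on error. */
static long long device_size_bytes(int fd)
{
    unsigned long long bytes;
    unsigned long sectors;

    /* Preferred call: 64-bit size in bytes. */
    if (ioctl(fd, BLKGETSIZE64, &bytes) == 0)
        return (long long)bytes;

    /* Older fallback: size in 512-byte sectors, for drivers (such as
     * the logical volume manager here) that lack BLKGETSIZE64. */
    if (ioctl(fd, BLKGETSIZE, &sectors) == 0)
        return (long long)sectors * 512;

    return -1;   /* neither ioctl worked */
}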


Case 3: Using gdb and Oops

You can use gdb, the Free Software Foundation debugger, from the command line to find errors, or you can use it from one of several graphical tools, such as the Data Display Debugger (DDD). You can use gdb to debug user-space programs or the Linux kernel. This section discusses only how to run gdb from the command line.

Start gdb with the command gdb programname. gdb loads the executable's symbols and displays an input prompt so you can begin using the debugger. There are three ways to view a process with gdb:

    • Use the attach command to start viewing a process that is already running; attach will stop the process.
    • Use the Run command to execute the program and debug the program from the beginning.
    • Review an existing core file to determine the state the process was in when it terminated. To view a core file, start gdb with the following command: gdb programname corefilename

      To debug with a core file, you need the program's executable and source files as well as the core file itself. You can also start gdb on a core file with the -c option: gdb -c core programname

      gdb shows which line of code caused the program to dump core.

Before you run the program or attach to an already-running one, list the source code you believe is wrong, set breakpoints, and then start debugging. You can view comprehensive gdb online help and a detailed tutorial with the help command. A sample session follows.
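For example, a minimal command-line session with the test2 program from Listing 4 might look like this (the breakpoint line number is illustrative; all commands are standard gdb):

gcc -g test2.c -o test2    # compile with debugging symbols
gdb ./test2
(gdb) list main            # show the source around main
(gdb) break 14             # set a breakpoint at a suspect line
(gdb) run                  # execute until the breakpoint
(gdb) print chptr          # inspect a variable's value
(gdb) next                 # step over one line
(gdb) backtrace            # show how we got here
(gdb) continue             # resume; the overrun triggers SIGSEGV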


kgdb

The kgdb program (the remote-host Linux kernel debugger through gdb) provides a mechanism for debugging the Linux kernel using gdb. kgdb is an extension to the kernel that lets you connect, from gdb running on a remote host, to a machine running the kgdb-extended kernel. You can then go deep into the kernel, set breakpoints, examine data, and do other things (similar to the way you would use gdb on an application). One of the main features of this patch is that the host running gdb connects to the target machine (the one running the kernel to be debugged) during the boot process. This lets you begin debugging as early as possible. Note that the patch adds functionality to the Linux kernel so that gdb can be used to debug it.

Using kgdb requires two machines: a development machine and a test machine. A serial line (null-modem cable) connects them through their serial ports. The kernel you want to debug runs on the test machine; gdb runs on the development machine. gdb uses the serial line to communicate with the kernel being debugged. You can check the serial link before involving the kernel; a quick test is sketched below.
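A minimal sketch of a null-modem sanity check, assuming the cable is on /dev/ttyS0 on both machines (the device name and baud rate are assumptions to adjust):

# On both machines, set the serial port speed:
stty -F /dev/ttyS0 115200

# On the test machine:
cat /dev/ttyS0

# On the development machine:
echo hello > /dev/ttyS0    # "hello" should appear on the test machine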

Follow these steps to set up the KGDB debugging environment:

    1. Download the applicable patch for your Linux kernel version.
    2. Build the component into the kernel, since this is the simplest way to use kgdb. (Note that there are two ways to build most kernel components: as a module or directly into the kernel. For example, the journaled file system (JFS) can be built as a module or directly into the kernel. With the gdb patch, we can build JFS directly into the kernel.)
    3. Apply kernel patches and re-build the kernel.
    4. Create a file named .gdbinit and save it in the root of your kernel source tree (in other words, /usr/src/linux). The .gdbinit file contains the following four lines:
      • set remotebaud 115200
      • symbol-file vmlinux
      • target remote /dev/ttyS0
      • set output-radix 16
    5. Add the gdb options to the append line used by LILO, the boot loader that chooses which kernel to boot. For example:
      • image=/boot/bzImage-2.4.17
      • label=gdb2417
      • read-only
      • root=/dev/sda8
      • append="gdb gdbttyS=1 gdb-baud=115200 nmi_watchdog=0"

Listing 7 is a sample script that copies the kernel and modules you built on the development machine to the test machine. You need to modify the following items:

    • [email protected]: the user ID and machine name.
    • /usr/src/linux-2.4.17: The directory of the kernel source tree.
    • bzImage-2.4.17: The kernel name to be booted on the test machine.
    • rcp and rsync: both must be allowed to run on the machine on which the kernel was built.

Listing 7. Script to copy the kernel and modules to the test machine

set -x
rcp [email protected]:/usr/src/linux-2.4.17/arch/i386/boot/bzImage /boot/bzImage-2.4.17
rcp [email protected]:/usr/src/linux-2.4.17/System.map /boot/System.map-2.4.17
rm -rf /lib/modules/2.4.17
rsync -a [email protected]:/lib/modules/2.4.17 /lib/modules
chown -R root /lib/modules/2.4.17
lilo

Now we can start gdb on the development machine by changing to the directory where the kernel source tree begins. In this example, the kernel source tree is in /usr/src/linux-2.4.17. Enter gdb to start the program.

If everything is working, the test machine will stop during the boot process. Enter the gdb command cont to continue booting. A common problem is that the null-modem cable may be connected to the wrong serial port. If gdb does not start, switch to the second serial port; this should allow gdb to start.


Debugging kernel problems with kgdb

Listing 8 shows the modified source code of jfs_mount.c; we created a null pointer exception so that the code produces a segmentation fault at line 109.

Listing 8. Modified jfs_mount.c code

int jfs_mount(struct super_block *sb)
{
...
int *ptr;                   /* line 1 added */
jFYI(1, ("\nMount JFS\n"));
/*
 * read/validate superblock
 * (initialize mount inode from the superblock)
 */
if ((rc = chkSuper(sb))) {
    goto errout20;
}
108 ptr = 0;                /* line 2 added */
109 printk("%d\n", *ptr);   /* line 3 added */

Listing 9 shows the gdb exception after issuing the mount command for the file system. kgdb provides several commands, such as displaying data structures and variable values and showing the state of all the tasks in the system, where they are sleeping, where they are using CPU, and so on. Listing 9 shows the backtrace provided for the problem; the where command performs the backtrace, which tells where the executing call stopped in the code.

Listing 9. gdb exception and backtrace

mount -t jfs /dev/sdb /jfs

Program received signal SIGSEGV, Segmentation fault.
jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
109     printk("%d\n", *ptr);
(gdb) where
#0  jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
#1  0xc01a0dbb in jfs_read_super ... at super.c:280
#2  0xc0149ff5 in get_sb_bdev ... at super.c:620
#3  0xc014a89f in do_kern_mount ... at super.c:849
#4  0xc0160e66 in do_add_mount ... at namespace.c:569
#5  0xc01610f4 in do_mount ... at namespace.c:683
#6  0xc01611ea in sys_mount ... at namespace.c:716
#7  0xc01074a7 in system_call () at af_packet.c:1891
#8  0x0 in ?? ()
(gdb)

The next section discusses the same JFS segmentation fault problem, but without a debugger set up; it uses the Oops message the kernel generates if you execute the code in Listing 8 in a non-kgdb kernel environment.


Oops Analysis

An Oops (also known as a panic) message contains the details of a system failure, such as the contents of the CPU registers. On Linux, the traditional way to debug a system crash is to analyze the Oops message sent to the system console when the crash occurs. Once you have captured the details, you can send the message to the ksymoops utility, which attempts to convert the code to instructions and map the stack values to kernel symbols. In many cases, this is enough information to determine the likely cause of the failure. Note that an Oops message does not include a core file.

Let's assume the system has just produced an Oops message. As the person who wrote the code, you want to solve the problem and determine what caused the Oops; or, as a user, you want to give the developer of the offending code most of the information about the problem so it can be solved promptly. The Oops message is one part of the equation, but it is not helpful until you run it through the ksymoops program. The figure below shows the process of formatting an Oops message.

Formatting Oops messages

ksymoops needs several things: the Oops message output, the System.map file from the running kernel, and /proc/ksyms, vmlinux, and /proc/modules. Complete instructions on how to use ksymoops are in the kernel source at /usr/src/linux/Documentation/oops-tracing.txt or on the ksymoops man page. ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.

First, save the Oops message in a file so you can run it through the ksymoops utility. Listing 10 shows the Oops message created by the mount command while mounting the JFS file system, generated by the three lines of code added to the JFS mount code in Listing 8. A typical ksymoops invocation is sketched below.
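A minimal sketch of such a run, assuming the message was saved as oops.txt (the file names are illustrative; -v and -m point ksymoops at the kernel image and map file):

ksymoops -v /usr/src/linux/vmlinux \
         -m /boot/System.map-2.4.17 \
         < oops.txt > oops-decoded.txt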

Listing 10. Oops message processed by ksymoops

ksymoops 2.4.0 on i686 2.4.17. Options used ...
... 15:59:37 sfb1 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 00000000 ...
... 15:59:37 sfb1 kernel: c01588fc ...
... 15:59:37 sfb1 kernel: *pde = 00000000 ...
... 15:59:37 sfb1 kernel: Oops: 0000 ...
... 15:59:37 sfb1 kernel: CPU: 0 ...
... 15:59:37 sfb1 kernel: EIP: 0010:[jfs_mount+60/704] ...
... 15:59:37 sfb1 kernel: Call Trace: [jfs_read_super+287/688]
  [get_sb_bdev+563/736] [do_kern_mount+189/336] [do_add_mount+35/208]
  [do_page_fault+0/1264] ...
... 15:59:37 sfb1 kernel: Call Trace: [<c0155d4f>] ...
... 15:59:37 sfb1 kernel: [<c0106e04 ...
... 15:59:37 sfb1 kernel: Code: 8b 2d 00 00 00 00 55 ...
>>EIP; c01588fc <jfs_mount+3c/2c0>   <=====
...
Trace; c0106cf3 <system_call+33/40>
Code;  c01588fc <jfs_mount+3c/2c0>
00000000 <_EIP>:
Code;  c01588fc <jfs_mount+3c/2c0>   <=====
   0:   8b 2d 00 00 00 00   mov 0x0,%ebp   <=====
Code;  c0158902 <jfs_mount+42/2c0>
   6:   55                  push %ebp

Next, you want to determine which line of code in jfs_mount caused the problem. The Oops message tells us that the problem was caused by the instruction at offset 3c. One way to find it is to run the objdump utility on the jfs_mount.o file and look at offset 3c. objdump disassembles a module's functions so you can see what assembler instructions your C source code produced. Listing 11 shows what you get from objdump; comparing it with the C code of jfs_mount, we see that the null dereference came from line 109. Offset 3c matters because the Oops message identified it as the location of the problem.

Listing 11. Assembler listing of jfs_mount

109     printk("%d\n", *ptr);

objdump -d jfs_mount.o

jfs_mount.o:     file format elf32-i386

Disassembly of section .text:

00000000 <jfs_mount>:
   0:   55                  push %ebp
...
  2c:   e8 cf ...           call <chkSuper>
  31:   89 c3               mov  %eax,%ebx
  33:   58                  pop  %eax
  34:   85 db               test %ebx,%ebx
  36:   0f 85 ...           jne  291 <jfs_mount+0x291>
  3c:   8b 2d 00 00 00 00   mov  0x0,%ebp   << problem line above
  42:   55                  push %ebp


KDB

The Linux kernel debugger (KDB) is a patch for the Linux kernel that provides a way to examine kernel memory and data structures while the system is running. Note that KDB does not require two machines, but it does not let you debug at the source level the way kgdb does. Additional commands can be added that, given the identifier or address of a data structure, format and display essential system data structures. The current command set lets you control kernel operations, including the following (a sample session is sketched after this list):

    • Processor single step
    • Stop upon execution of a specific instruction
    • Stop upon access (or modification) of a specific virtual memory location
    • Stop upon access to a register in the input/output address space
    • Stack backtrace for the currently active task and for all other tasks (by process ID)
    • Instruction disassembly
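A hypothetical KDB session illustrating a few of these commands (the addresses are invented; ps, bt, md, id, and go are standard KDB commands):

Entering kdb (current=0xc03f6000, pid 0) on processor 0
kdb> ps                # list processes
kdb> bt                # backtrace of the current task
kdb> md c03f6000       # display memory at an address
kdb> id c01588fc       # disassemble instructions at an address
kdb> go                # resume normal kernel execution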
Chasing a memory overrun

You certainly don't want to find yourself in a situation where an allocation overrun occurs only after thousands of calls.

Our team once spent a lot of time tracking down a strange memory error. The application worked on our development workstation, but on the new production workstation it failed after two million calls to malloc(). The real problem was an overrun that occurred around call one million. The problem on the new system was that the reserved region for malloc() was laid out differently, so the scattered memory ended up in different locations and different content was destroyed when the overrun occurred.

We solved this problem with several different techniques: one was to use a debugger, another was to add tracing to the source code. At about this point in my career I began to focus on memory debugging tools, hoping to solve these types of problems faster and more efficiently. One of the first things I do when starting a new project is to run MEMWATCH and YAMD to see whether they point to memory management problems.

Memory leaks are a common problem in applications, but you can use the tools described in this article to resolve them.


Case 4: Getting a backtrace using the magic key sequence

If your keyboard still works when Linux hangs, you can use the following method to help determine the cause of the hang. With these steps, you can display the currently running process, and a backtrace of all processes, using the magic key sequence. (An alternative way to trigger it is sketched after the steps.)

    1. The kernel you are running must have been built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode: Ctrl+Alt+F1 switches you to text mode, and Ctrl+Alt+F7 brings you back to the X Window System.
    2. While in text mode, press Alt+ScrollLock, then Ctrl+ScrollLock. These magic keystrokes give a stack trace of the currently running process and of all processes, respectively.
    3. Look in /var/log/messages. If everything is set up correctly, the system should have converted the kernel's symbolic addresses for you. The backtrace will be written to the /var/log/messages file.
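On kernels that provide the /proc interface for SysRq, you can trigger the same actions without the console keyboard; a minimal sketch (requires root):

echo 1 > /proc/sys/kernel/sysrq     # make sure SysRq is enabled
echo t > /proc/sysrq-trigger        # 't' dumps a task list with stack traces
tail /var/log/messages              # the backtraces land in the log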


Conclusion

There are many different tools available to help debug programs on Linux. The tools described in this article can help you solve many coding problems. Tools that show the locations of memory leaks, overruns, and the like can solve memory management problems; I find MEMWATCH and YAMD helpful.

Using a Linux kernel patch to make gdb work on the Linux kernel was helpful in solving problems with the file system I work on for Linux. In addition, the strace utility helped determine where a file system utility failed during a system call. Next time you are faced with squashing a bug on Linux, try one of these tools.
