Introduction to Linux debugging technology (programming and development)

Last Update:2018-12-03 Source: Internet

Author: User


For anyone who writes kernel code, one of the most pressing questions is how to approach debugging. Because the kernel is a set of functions not tied to any one process, its code cannot easily be run under a debugger or traced. This section describes techniques you can use to monitor kernel code and track down errors.

Debugging by printing

The most common debugging technique is monitoring, which in application programming means calling printf at suitable points. When you debug kernel code, printk does the same job.

In the previous chapters we simply assumed that printk works like printf. It is time to introduce the differences. One of them is that printk lets you classify messages by severity, by attaching different "log levels", or priorities, to them. You indicate the log level with a macro. For example, KERN_INFO, which we have seen prepended to earlier print statements, is one of the possible log levels.
The log-level macro expands to a string that is concatenated with the message text at compile time; that is why there is no comma between the priority and the format string in the examples below. Here are two printk examples, a debug message and a critical message:

(CODE)

Eight log-level strings are defined in linux/kernel.h. A printk statement with no specified priority defaults to DEFAULT_MESSAGE_LOGLEVEL, an integer defined in kernel/printk.c. The default value has changed several times during Linux development, so I suggest you always specify a suitable log level.

Based on the log level, the kernel prints messages to the current text console: if the priority is less than the integer variable console_loglevel, the message is displayed on the console. If both klogd and syslogd are running, kernel messages are appended to /var/log/messages regardless of console_loglevel.

The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL and can be modified through the sys_syslog system call. As the klogd manual page explains, one way to change it is to pass the -c switch to klogd. Alternatively, you can write a program that changes the console log level; one such program is misc-progs/setlevel.c, in the source files on the O'Reilly site. The new level is an integer between 1 and 8. You may need to lower the log level after a kernel fault (see "Debugging system faults").
The reason for lowering the level after a fault is that the fault-handling code raises console_loglevel to 15, so every subsequent message appears on the console. Conversely, to see your debugging messages on a 2.0.x kernel you need to raise the log level: the 2.0 release lowered MINIMUM_CONSOLE_LOGLEVEL, while old versions of klogd suppress most console messages by default. If you happen to run such an old daemon, a 2.0 kernel prints far fewer messages than you expect unless you raise the log level. This is why hello.c uses the <1> marker, to make sure its messages appear on the console.

Since version 1.3.43, the kernel supports a flexible logging policy by letting you send messages to a specific virtual console. By default, the "console" is the current virtual terminal. To choose a different virtual terminal to receive messages, issue ioctl(TIOCLINUX) on the selected virtual terminal. The following program, setconsole, selects the virtual terminal that receives kernel messages; it must be run by the superuser. If you are not comfortable with ioctl, skip to the next section and come back to this code after reading the "ioctl" section of Chapter 5.

(CODE)

setconsole uses TIOCLINUX, a special ioctl command for Linux-specific functions. To use TIOCLINUX, you pass a pointer to a byte array. The first byte of the array is the code of the requested subcommand, and the following bytes depend on the subcommand. setconsole uses subcommand 11, and the next byte (stored in bytes[1]) identifies the virtual console. A complete description of TIOCLINUX can be found in drivers/char/tty_io.c in the kernel sources.

How messages get logged

The printk function writes messages into a circular buffer LOG_BUF_LEN bytes long. It then wakes any process waiting for messages, that is, any process sleeping in the syslog system call or reading /proc/kmsg. These two interfaces to the logging engine are equivalent.
The /proc/kmsg file looks more like a FIFO file, so it is easier to read data from it; a simple cat can read the messages. If the circular buffer fills up, printk wraps around and writes new data at the beginning of the buffer, overwriting the oldest data, which the logging process therefore loses. This problem is negligible compared with the benefits of a circular buffer. For example, a circular buffer lets the system run even without a logging process, without wasting memory. Another feature of the Linux approach to messaging is that printk can be called from anywhere, even from an interrupt handler, with no limit on the amount of data. The only drawback is that some data may be lost.

If klogd is running, it reads kernel messages and dispatches them to syslogd, which in turn checks /etc/syslog.conf to find out how to handle them. syslogd differentiates messages by "facility" and "priority"; the allowed values are defined in sys/syslog.h. Kernel messages are logged by the LOG_KERN facility, at the priority given in the corresponding printk. If klogd is not running, data stays in the circular buffer until some process reads it or the buffer overflows.

If you do not want your driver's monitoring messages to clutter the system log, you can give klogd the -f (file) option or modify /etc/syslog.conf to write the log to another file. A more brute-force approach is to kill klogd and print messages on an unused virtual terminal, or to run cat /proc/kmsg in an xterm.

Using the preprocessor to ease monitoring

In the early stages of driver development, printk is very helpful for debugging and testing new code. When you officially release the driver, however, you should remove or at least disable these print statements. Unfortunately, you are likely to find that as soon as you remove the messages, you add a new feature to the driver and want some of them back. There are two problems to solve: how to enable and disable messages globally, and how to enable and disable individual messages. The code I use for most of my messaging is shown below. It has the following features:
You can enable or disable each statement by adding a letter to the macro name or removing it. You can disable all messages at once by changing the CFLAGS variable before compiling. The same print statement can be used in kernel code (the driver) and in user-level code (demonstration or test programs). The following code fragment, taken directly from scull.h, implements these features.

(CODE)

The symbols PDEBUG and PDEBUGG depend on whether SCULL_DEBUG is defined, and they behave like a printf call. To simplify the process further, add the following lines to your Makefile.

(CODE)

The code shown in this section depends on a gcc extension to the ANSI C preprocessor: gcc supports macros with a variable number of arguments. This dependency on gcc is not a problem, because the kernel itself depends heavily on gcc features. The Makefile in turn depends on GNU gmake; for the same reason, this is not a problem either.

If you are familiar with the C preprocessor, you can extend the definitions above to support the concept of a "debug level", assigning each level an integer (or bitmask) value that determines how verbose it is. But every driver has its own features and monitoring needs. Good programming is a matter of finding the best trade-off between flexibility and efficiency, and I cannot tell which is best for you. Remember that preprocessor conditionals (and constant expressions in the code) are evaluated at compile time, so you must recompile to turn messages on or off. An alternative is to use C conditionals, which are evaluated at run time and let you turn messages on and off while the program runs. This is a nice feature, but it costs extra processing every time the code executes, which can affect performance even when the messages are disabled. Sometimes this performance loss is unacceptable. Personally, I am happy with the macros shown above, even though they force me to recompile and reload the module every time I want to add or remove a message.
Debugging by querying

The previous section described how printk works and how to use it, but not its disadvantages. Because syslogd keeps syncing its output files, every printed line causes a disk operation, so heavy use of printk can slow the system noticeably. From syslogd's point of view this is the right behavior: it tries to write everything to disk in case the system crashes right after the message is printed. You, however, do not want to slow the system down just for debugging messages. You can solve this by prefixing the name of your log file in /etc/syslog.conf with a minus sign, but sometimes you do not want to change your configuration files. Otherwise, you can run a program other than klogd (such as cat /proc/kmsg), but this does not provide a suitable environment for normal operation.

Often the best way to get relevant information is to query the system when you need it, instead of producing data continuously. In fact, every Unix system provides many tools for obtaining system information, such as ps, netstat, and vmstat. Two main techniques are available to driver developers for querying the system: creating a file in /proc and using the ioctl driver method.

Using the /proc filesystem

The /proc filesystem in Linux is not associated with any device: files in /proc are generated by the kernel when they are read. They are usually text files, so both people and utility programs can make sense of them. For example, most Linux implementations of ps get their process information by reading /proc. The idea of a /proc virtual filesystem has been adopted by several modern operating systems and works quite well.

The current implementation of /proc supports dynamic creation of inodes, which lets user modules create entry points for easy information retrieval. To create a full-featured file node in /proc (with read, write, seek, and so on), you need to define both a file_operations structure and an inode_operations structure; the latter has a role and size similar to the former. Creating such an inode is not very different from creating a whole character device, and we will not cover it here; if you are interested, look in fs/proc in the source tree for details.

If the file node is only going to be read, as is the case for most /proc files, there is an easier way to create it, which I describe here. Unfortunately, this technique is available only in Linux 2.0 and later. Here is the scull code that creates a file called /proc/scullmem, used to report the memory scull is using.

(CODE)
Filling a /proc file is easy. Your function receives a free page; it writes data into that buffer and returns the length of what it wrote. Everything else is handled by the /proc filesystem. The only limitation is that the data written cannot exceed PAGE_SIZE bytes (the PAGE_SIZE macro is defined in asm/page.h; it is architecture-dependent, but you can count on at least 4 KB). If you need to write more than one page of data, you must implement a full-featured file.

Note that if a process reading your /proc file issues several read calls, each retrieving new data, your driver rewrites the whole buffer every time, even though only a small amount of data is read. The extra work degrades performance, and if the file generates different data from one call to the next, later read calls stitch together unrelated parts, producing misaligned data. In practice, performance is not a problem, because every application that uses the C library reads data in large chunks. Misalignment, however, does show up from time to time and is worth thinking about. After retrieving the data, the library call issues at least one more read, since end-of-file is reported only when read returns 0. If the driver happens to produce more data than before, the extra bytes are returned to user space, misaligned with respect to the previous chunk. We will meet the misalignment problem again with /proc/jiq*, in the section "Task queues" of Chapter 6, "Flow of time".

cleanup_module should unregister the /proc node with the following statement:

(CODE)

The arguments passed to the function are the directory containing the file to be removed and the inode number of the file. Because the inode number is assigned automatically, it is unknown at compile time and must be read from the data structure.

The ioctl method

ioctl, described in detail in the next chapter, is a system call that acts on a file descriptor; it receives a "command" number and, optionally, another argument, usually a pointer. As an alternative to the /proc filesystem, you can implement a few ioctl commands for debugging. These commands copy the relevant data from the driver to the process's address space, where you can examine it.

Getting information through ioctl is somewhat harder than through /proc, because another program has to issue the ioctl and display the result. You must write and compile that program and keep it consistent with the module you are testing. Sometimes, however, this is the best approach, because it is much faster than reading /proc. If the data must be processed before it is written to the screen, retrieving it in binary form is much more efficient than reading a text file. In addition,
ioctl does not limit the size of the returned data.

An advantage of the ioctl approach is that the debugging commands can stay in the driver even after debugging is disabled. A /proc file is visible to anyone who looks in the directory, whereas undocumented ioctl commands usually go unnoticed. Moreover, they are still available for debugging if something odd happens to the driver. The only drawback is that the module is slightly larger.

Debugging by watching

Sometimes minor problems can be tracked down by watching how an application in user space interacts with the driver and the system, which also helps confirm that the driver works correctly. For example, I felt more confident about scull after seeing how its read implementation handled requests for different amounts of data.

There are several ways to watch a user-space program at work. You can step through its functions with a debugger, insert print statements, or run the program under strace. The last technique is the one of interest here.

The strace command is a powerful tool that shows all the system calls issued by a user-space program. It shows not only the calls but also their arguments and return values in symbolic form. When a system call fails, both the symbolic error value (such as ENOMEM) and the corresponding string (Out of memory) are displayed. strace has many command-line options; the most useful are -t, which displays the time at which each call is made, -T, which displays the time spent in the call, and -o, which redirects the output to a file. By default, strace prints all trace information to stderr.

strace receives its information from the kernel itself. This means a program can be traced whether or not it was compiled with debugging support (the -g option of gcc) and whether or not its symbol information has been stripped. Just as a debugger can attach to a running process and control it, strace can also trace a process that is already running.

Trace information is often used to support bug reports sent to application developers, but it is also very useful to kernel programmers. We have seen how driver code runs in response to system calls; strace lets us check the consistency of the input and output of each call. For example, the following screen dump shows the last few lines of running the command ls /dev > /dev/scull0:

(CODE)

The first write call shows that, after ls finished scanning the target directory, it tried to write 4 KB. Strangely, only 4000 bytes were written, and the operation was retried. However, we know that scull's write implementation writes one quantum at a time, so the partial writes are to be expected. After a few steps everything has been flushed, and the program exits normally.

As another example, let's read the scull device:

(CODE)

As expected, read can retrieve only 4000 bytes at a time, but the total amount of data is unchanged. Note how the retries are organized in this example, compared with the write trace above. wc is optimized for fast reading: it bypasses the standard library so it can read more data with each system call. You can see from the read lines that wc tries to read 16 KB at a time.

Unix experts can find a lot of useful information in strace output. If all the symbols put you off, you can limit yourself to watching the file methods (open, read, and so on).

Personally, I find the tracing tool most useful for pinpointing run-time errors in system calls. Often the perror call in an application or demo program is not detailed enough for debugging, and it helps to know exactly which arguments triggered the system call error.

Debugging system faults

Even if you have used all the monitoring and debugging techniques, bugs sometimes remain in the driver.
Running such a driver may cause a system fault. When this happens, it is vital to collect enough information to solve the problem. Note that "fault" does not mean "panic". Linux code is robust enough to respond gracefully to most errors: a fault usually results in the termination of the current process while the system keeps running. The system may panic if a fault occurs outside the context of a process or if a vital part of the system is compromised. But when the problem lies in a driver, it usually only causes the termination of the faulty process, that is, the process using the driver. The only unrecoverable damage when the process is terminated is that memory allocated in the process's context is lost; for example, dynamic lists allocated by the driver through kmalloc may be lost. However, since the kernel calls close for any device that is still open, your driver can release whatever the open method allocated.

As we have already said, useful information is printed on the console when kernel code misbehaves. The next section describes how to decode and use these messages. Although they look rather obscure to the novice, processor dumps are full of interesting information, often enough to pinpoint a program bug without additional testing.

Oops messages

Most bugs show up as NULL pointer dereferences or as the use of other incorrect pointer values. These bugs usually result in an oops message.

Every address used by the processor is a "virtual" address, mapped to a physical address through a complex structure of page tables (see "Page tables" in Chapter 13, "Mmap and DMA"). When an invalid pointer is dereferenced, the paging mechanism cannot map the address to a physical address, and the processor signals a "page fault" to the operating system. If the address is indeed invalid, the kernel cannot "page in" the missing address; if this happens while the processor is in supervisor mode, the kernel generates an "oops". It is worth noting that the way the kernel handles faults changed in version 2.1, which can handle references to invalid addresses in supervisor mode. The new implementation is described in "Handling kernel-space faults" in Chapter 17, "Recent developments".

An oops displays the processor status at the time of the fault, including the contents of the CPU registers, the location of the page descriptor tables, and other seemingly incomprehensible information. The message is generated by printk statements in the fault handler (arch/*/kernel/traps.c) and dispatched as described in the earlier discussion of printk. Let's look at one such message. Here is how an oops appears on a conventional PC (an x86 platform) running Linux.
This is the output of version 2.0 or later; the output of version 1.2 is slightly different.

(CODE)

The message above was produced by running cat on the faulty module, which deliberately contains a bug. faulty.c includes the following code:

(CODE)

Because read copies data to user space from its small buffer (faulty_buf), we can expect reading the file in small pieces to work. Reading more than 1 KB at a time, however, crosses a page boundary, and the read fails if it touches an invalid page. Indeed, the oops above occurred during a 4 KB read request.
The 4 KB request is shown by the line that precedes the oops message in /var/log/messages (the default file where syslogd stores kernel messages):

(CODE)

The same cat command does not produce an oops on the Alpha, because reading 4 KB from faulty_buf does not cross a page boundary (pages are 8 KB on the Alpha, and the buffer is near the beginning of the page). If reading faulty does not produce an oops on your system, try wc, or give dd an explicit block size.

Using ksymoops

The main problem with oops messages is that the hexadecimal values mean nothing to the programmer; they need to be resolved to symbols. The kernel sources help the developer by including the ksymoops tool, although it is missing from the version 1.2 sources. The tool resolves the numeric addresses in an oops message to kernel symbols, but only for oops messages generated by PCs. Because the message itself is processor-dependent, each architecture has its own message format. ksymoops reads the oops message from standard input and takes the name of the kernel symbol table on the command line. The symbol table is usually /usr/src/linux/System.map.
ksymoops prints the call trace and program code in a more readable form than the original oops message. The following fragment was produced by feeding ksymoops the oops message from the previous section:

(CODE)

The code disassembled by ksymoops shows the instruction that failed and the ones that follow it. For those who know a little assembly, it is apparent that the repz movsl instruction (repeat till cx is zero, move a string of longs) hit an unmapped page through the source index (esi, 0x202e000). The ksyms -m command, used to retrieve module information, shows that the module is mapped to a single page at 0x0202dxxx, which confirms that esi is out of range.

The decoded call trace still contains two numeric addresses, because the memory occupied by the faulty module is not described in the system map. These values can be resolved by hand, either from the output of the ksyms command or by looking for the module name in /proc/ksyms. In this case, however, the two addresses do not correspond to code addresses. If you read arch/i386/kernel/traps.c, you will find that the call trace is extracted from the whole stack dump using heuristics that distinguish data values (local variables and function arguments) from return addresses. Only addresses that refer to kernel code and addresses that might refer to modules are shown in the call trace. Because module pages contain both code and data, stray stack values can slip through the heuristics, which is what happened with the two 0x202xxxx addresses above.

If you prefer not to look up module addresses by hand, the following pipeline creates a symbol table that covers both the kernel and the modules. You must re-create the symbol table whenever you load a module.

(CODE)

The pipeline combines the complete system map with the public kernel symbols in /proc/ksyms, which, in addition to kernel symbols, lists the symbols of the modules in the current kernel. These addresses appear in /proc/ksyms after insmod relocates the code. Because the two files have different formats, sed and awk are used to convert every line to a suitable format. The table is then sorted and duplicates are removed, so that ksymoops can use it. If we rerun ksymoops, it extracts the following information from the new symbol table:

(CODE)

As you can see, building a revised system map is helpful when tracing oops messages related to modules:
ksymoops can now decode the instruction pointer and the complete call trace. Note also that the disassembled code is shown in the same format that objdump uses. objdump is a powerful tool as well; if you need to see the instructions before the one that failed, you can run objdump -d faulty.o. In the resulting assembly listing, the string faulty_read+45/60 marks the faulty line. For more information about objdump and its command-line options, see its manual page.

Even with your own revised symbol table, the call-trace problem described above remains: the 0x202xxxx pointers are now decoded, but they are still spurious.

Learning to decode oops messages takes some practice, but it is worth doing; the time spent is soon repaid. The only issue is where to find the relevant assembly language documentation, because the Unix syntax for machine instructions differs from the Intel syntax: even if you know PC assembly language, your experience was probably gained with Intel syntax. I list some useful books in the bibliography.

Using oops

Using ksymoops is cumbersome. You need a C++ compiler to build it, you must build your own symbol table to exploit its full capabilities, and you have to merge the original message with the ksymoops output to get all the information together.

If you do not want to go to that trouble, you can use the oops program, provided with the source files for this book on the O'Reilly FTP site. It is derived from the original ksymoops tool, which is no longer maintained by its author. oops is written in C and reads /proc/ksyms directly, so you do not need to build a new symbol table each time a module is loaded.

The program tries to decode all the processor registers and the stack trace to symbolic values. Its disadvantage is that it is more verbose than ksymoops, but usually the more information you have, the sooner you find the bug. Another advantage of oops is that it can decode oops messages from x86, Alpha, and Sparc. Like the kernel sources, the program is released under the GPL.

The output of oops is similar to that of ksymoops, but more complete. Here is the beginning of its output for the oops shown earlier. The stack holds nothing useful in this oops message, so I do not think it is worth showing the whole stack trace:

(CODE)

Having the registers and the stack decoded is very helpful when you debug "real" modules (faulty is too short for it to matter), and even more so if all the symbols of the module being debugged are exported. At the time of a fault, processor registers often point to symbols in the module, and you can identify them only if the symbol table is exported to /proc/ksyms.

We can make a more complete symbol table available with the following steps. First, we do not declare static symbols in the module, because insmod would not export them. Second, we can mask the call to register_symtab with #ifdef SCULL_DEBUG or a similar macro, as in the following code from scull's init_module function.

(CODE)

We saw in "Registering symbol tables" in Chapter 2, "Building and running modules", that if a module does not register a symbol table, all its global symbols are exported. Although this feature is used only when SCULL_DEBUG is active, all global symbols should carry a suitable prefix to avoid polluting the kernel namespace (see "Modules versus applications" in Chapter 2).

Using klogd

Recent versions of the klogd daemon can decode oops messages before they are stored in the log files. Decoding is performed only by version 1.3 or later of the daemon, and only if /usr/src/linux/System.map is passed to the daemon as an argument. (You can use another symbol table file in place of System.map.)
The oops of faulty given by klogd is as follows, which is written to the system record: (CODE) The klogd that I want to decode is good for debugging the general Linux installation core.
Tool. Messages decoded by klogd include a majority of ksymoops functions, and require the user to compile additional tools or merge to provide complete error reports when the system fails.
Two outputs. When oops occurs in the kernel, the daemon correctly decodes the command pointer. It does not disassemble the code, but this is not a problem. When the error report gives a message, the binary data still exists.
Offline disassembly code. Another feature of the daemon is that if the symbol table version does not match the current kernel, it rejects the parsing of the symbol. If a symbol is parsed in the system record, you can be sure that it is the correct solution.
. However, although it is very helpful for Linux users, this tool does not help in debugging modules. I personally did not use the decoding option on an open software computer. The problem with klogd is that it does not parse
Symbol in the module, because the daemon has run before the programmer loads the module, even if/proc/ksyms is read, it will not be helpful. The parsed symbols in the record file will make oops
Obfuscation with ksymoops makes further parsing difficult. If you need to use klogd to debug your module, the daemon in the latest version needs some new special support. I hope it will be completed,
You only need to patch the kernel. System suspension although most errors in kernel code only cause one oops message, sometimes they are difficult to completely suspend the system. If the system is suspended and there is no message
Can be printed. For example, if the code encounters an endless loop and the kernel stops the scheduling process, the system no longer responds to any action, including the magic key Ctrl-alt-DEL combination. Two pending processing systems
Select "D". One is prevention and foresight, and the other is to make up for it and debug the code after it is suspended. By inserting schedule on a policy point, the call can prevent endless loops. Schedule
The scheduler is called (as you guessed), so other processes are allowed to steal the CPU time of the process. If the process loops in the kernel space due to errors in your driver, you can track this
And then kill the process. Inserting schedule calls in the driver code brings new "problems" to the programmer: functions, and all the functions in the call track must be reentrant. In normal
In the environment, because different processes may access the device concurrently, the driver can be reentrant as a whole, but because the Linux kernel cannot be preemptible, every function does not have to be reentrant. But if the driver
The function allows the scheduler to interrupt the current process. Another different process may enter the same function. If schedule is called only during debugging, you can avoid two concurrent processes if you do not allow it.
Access the driver, so concurrency is not very important. When introducing blocking operations ("re-writing code" in Chapter 5th), we will introduce concurrency in detail. If you want to debug an endless loop, you can
use the special keys of the Linux keyboard. By default, pressing the PrScr key (keycode 70) together with a modifier key makes the system print useful information about the machine status on the current console. This feature is available on both x86 and Alpha systems. Linux on the Sparc has the same feature, but it uses the key labeled "Break/Scroll Lock" (keycode 30).

Each special function has a name and is bound to a key event, as shown below. The function name appears in parentheses after the key combination.

Shift-PrScr (Show_Memory): prints several lines of information about memory usage, in particular the use of the buffer cache.
Control-PrScr (Show_State): prints one line for each process in the system, together with information about the internal process tree. The current process is marked.
RightAlt-PrScr (Show_Registers): the most important key when the system hangs, because it prints the contents of the processor registers at the moment the key is pressed. If you have a system map for the current kernel, looking at the instruction counter and how it changes over time is very helpful in understanding where the code is looping.

To map these functions to different keys, pass each function name as an argument to loadkeys. The keyboard map can be modified at will (this is "policy free").

The messages printed by these functions appear on the console if console_loglevel is high enough. Unless you run an old klogd with a new kernel, the default level should be sufficient. If no message appears, you can raise the loglevel as described earlier. The exact value of "high enough" depends on the kernel version you use; for Linux 2.0 and later it is 5.

Because these messages are printed to the console while the system is hung, it is very important to make sure the loglevel is high enough. The messages are generated at interrupt time, so they are printed even if a faulty process is looping without releasing the CPU, unless interrupts are disabled, which is both unlikely and unlucky.

Sometimes the system appears to be hung, but it is not. This can happen, for example, if the keyboard remains locked for some strange reason. Such false hangs can be detected by watching the output of a program you keep running for this purpose. I have a program that constantly updates the clock on an LED display, and I found it very useful for verifying that the scheduler is still running. You can check the scheduler without relying on external hardware by writing a program that flashes the keyboard LEDs, turns the floppy motor on and off, or ticks the speaker; personally, I find the usual beep annoying and avoid it whenever possible. Take a look at the ioctl command KDMKTONE instead. One of the sample programs on the O'Reilly FTP site (misc-progs/heartbeat.c) keeps the keyboard LEDs flashing.

If the keyboard is not accepting input, the best approach is to log in to the system through the network, kill any offending processes, or reset the keyboard (using kbd_mode). Discovering that the hang is only a keyboard lockup is of little use, however, if you have no network available to recover. If this is the case, you should set up an alternative input device, at least to be able to reboot the system cleanly. A shutdown or reboot is easier on your computer than hitting the "big red button", and at the very least it spares you a lengthy fsck scan of the disks.

The alternative input device can be, for example, a joystick or the mouse. There is a joystick-reboot daemon on sunsite.edu.cn, and the gpm-1.10 or newer mouse server supports something similar through a command-line option. If the keyboard is not locked but has been put into "raw" mode by mistake, you can look at some of the tricks described in the documentation of the kbd package. I suggest you read that documentation before the problem occurs; otherwise it will be too late.
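The keyboard-LED heartbeat mentioned earlier in this section can be sketched in user space as follows. This is only a minimal illustration, not the O'Reilly heartbeat.c program itself: the function names heartbeat_toggle and heartbeat_step are hypothetical, and the sketch assumes a Linux console that accepts the KDGETLED and KDSETLED ioctl commands from <linux/kd.h>.

```c
#include <assert.h>
#include <linux/kd.h>    /* KDGETLED, KDSETLED, LED_SCR */
#include <sys/ioctl.h>

/*
 * Flip the Scroll Lock LED bit and leave the other LED bits untouched.
 * (Hypothetical helper; choosing Scroll Lock is an arbitrary assumption.)
 */
unsigned char heartbeat_toggle(unsigned char leds)
{
    return (unsigned char)(leds ^ LED_SCR);
}

/*
 * Perform one heartbeat step on an already opened console descriptor:
 * read the current LED state, toggle one LED, and write the state back.
 * Returns 0 on success, or -1 if either ioctl fails (errno is set by ioctl).
 */
int heartbeat_step(int console_fd)
{
    char leds;

    if (ioctl(console_fd, KDGETLED, &leds) < 0)
        return -1;
    if (ioctl(console_fd, KDSETLED,
              (unsigned long)heartbeat_toggle((unsigned char)leds)) < 0)
        return -1;
    return 0;
}
```

A caller would typically open a console device (for example /dev/tty0, which usually requires root privileges) and call heartbeat_step once a second in a loop. As long as the LED keeps blinking, the scheduler is still running the process.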
Another possibility is to configure the gpm-root menu and add a "reboot" or "reset keyboard" entry; gpm-root is a daemon that responds to control-mouse events by displaying a menu on the screen and executing the configured actions.

Finally, you can press the "Secure Attention Key" (SAK), a special key meant to bring the system back to a usable state. The default keyboard map of current Linux versions has no entry for this key, because its implementation does not always work. You can nevertheless map a key on your keyboard to SAK using loadkeys. You should look at the SAK implementation in the drivers/char directory: the comments in the code explain why the key does not always work with Linux 2.0, so I will not go into it here. If you run version 2.1.9 or later, however, you can rely on a very dependable Secure Attention Key. In addition, kernels 2.1.43 and later have a compile-time option that selects whether to enable the "magic SysRq key". I suggest you look at the code in drivers/char/sysrq.c and use this new technology.

If your driver really hangs the system and you do not know where to insert schedule calls, the best approach is to add some print messages and send them to the console (by changing the value of the console_loglevel variable).

When reproducing the hang, it is best to mount all the disks on the system read-only. If the disks are read-only or unmounted, there is no risk of damaging the filesystem or leaving it in an inconsistent state, and at the very least you avoid running fsck after resetting the system. Another option is to test the module on an NFS-root computer. In this case the server maintains filesystem consistency and is unaffected by your driver, so you avoid any filesystem corruption.

Using a debugger

The last resort in debugging modules is to step through the code with a debugger.
Stepping through code lets you watch the values of variables and machine registers. This approach is time-consuming and should be avoided whenever possible. In some cases, however, fine-grained analysis of the code with a debugger is very valuable. Here the code being debugged runs in kernel space, and it is impossible to step through the kernel unless you control it remotely, which makes many things harder. Because remote control is rarely used, that technique is introduced last. Fortunately, you can still view and modify variables in the current kernel version.

Using a debugger proficiently at this level requires mastery of the gdb commands, some understanding of assembly code, and the ability to match source code against optimized assembly code. Unfortunately, gdb is better suited to debugging the core kernel than modules, and debugging modular code requires an additional tool. That tool is the kdebug package, which uses gdb's "remote debugging" interface to control the local kernel. I will introduce kdebug after covering the plain debugger.

Using gdb

gdb is useful for exploring the internal behavior of the system. The debugger must be started as if the kernel were an application. In addition to the kernel filename, you should provide the name of a core file on the command line. A typical gdb invocation looks like this:

    gdb /usr/src/linux/vmlinux /proc/kcore

The first argument is the name of the uncompressed kernel executable (after you compile the kernel, this file is in the /usr/src/linux directory). Only the x86 architecture has a zImage file (sometimes called vmlinuz), a trick to work around the 640 KB limit of real mode on Intel processors; on every platform, vmlinux is the uncompressed kernel you compiled.

The second argument on the gdb command line is the name of the core file. Like any other file in /proc, /proc/kcore is generated when it is read. When the read system call executes in the /proc filesystem, it maps to a function that generates data rather than one that retrieves it; this feature was introduced in the section "Using the /proc Filesystem". kcore represents the kernel "executable" in the format of a core file. It is a huge file, because it represents the whole kernel address space, which corresponds to all of physical memory. From within gdb you can look at kernel variables using the standard gdb commands. For example, p jiffies prints the number of clock ticks from system boot to the current time.

When you print data from gdb, the kernel is still running, and the various data items have different values at different times. gdb, however, optimizes access to the core file by caching the data it has already read. If you look at the jiffies variable again, you get the same value as before. Caching values to avoid extra disk access is the right behavior for conventional core files, but it is inconvenient for a "dynamic" core image. The solution is to issue the command core-file /proc/kcore whenever you want to flush the gdb cache; the debugger then uses the new core file and discards the old information. You do not always need to issue core-file before reading new data, however: gdb reads the core file in 1 KB chunks and caches only the chunks it has already referenced.

You cannot use gdb to modify kernel data. The debugger expects to run the program being debugged before it accesses its memory image, so it does not modify the core file. When you debug a kernel image, issuing the run command causes a segmentation fault after a few instructions have executed. For this reason, /proc/kcore does not implement a write method.

If you compile the kernel with debugging support (-g), the resulting vmlinux is better suited to gdb than the same file compiled without -g. Note, however, that compiling the kernel with -g requires a large amount of disk space: a 2.0 kernel with support for networking and a handful of devices and filesystems takes about 11 MB on the PC. You can still build a zImage file and use it to boot other systems, because the debugging information added by -g is stripped when the bootable image is built. If I have enough disk space, I leave the -g option enabled.

Things are different on non-PC computers. On the Alpha, make boot strips the debugging information before creating the bootable image, so you end up with two files, vmlinux and vmlinux.gz. gdb can use the former, but you can boot only the latter.
On the Sparc, the kernel (at least the 2.0 kernel) is not stripped of debugging information by default, so you need to strip it before passing it to silo (the Sparc kernel loader). Because of size problems, neither milo (the Alpha kernel loader) nor silo can boot a kernel whose debugging information has not been removed.

When you compile the kernel with -g and run the debugger on vmlinux together with /proc/kcore, gdb can return a lot of information about kernel internals. For example, you can use commands such as p *module_list, p *module_list->next, and p *chrdevs[4]->fops. These probing commands are most useful when you have a kernel map and the source code at hand.

Another useful task gdb can perform is disassembling functions, with the disassemble command (which can be abbreviated) or the "examine instructions" (x/i) command. The disassemble command takes a function name or a memory range as its argument, while x/i takes a memory address, which can also be given as a symbol name. For example, you can use x/20i to disassemble 20 instructions. Note that you cannot disassemble a module function, because the debugger works on vmlinux and knows nothing about your module. If you try to disassemble code at a module address, gdb will most likely report "Cannot access memory at xxxx". For the same reason, you cannot look at data items belonging to a module. If you know the address of your variables, you can read their values from /dev/mem, but it is hard to make sense of raw data extracted from system memory.

If you need to disassemble a module function, you are better off running the objdump utility on the module's object file. Unfortunately, the tool works only on the disk copy of the file, not on the running module, so the addresses shown by objdump are unrelocated and unrelated to the module's execution environment.

As you can see, gdb is a very useful tool when you want to look at the running kernel, but it lacks some features, the most important being the ability to modify kernel items and to access module functions. These gaps are filled by the kdebug package.

Using kdebug
You can get kdebug from the pcmcia/extras directory of the usual FTP sites, but if you want to be sure of getting the latest version, you should look at ftp://hyper.stanford.edu/pub/pcmcia/extras. The tool has nothing to do with PCMCIA, but the two packages are written by the same author.

kdebug is a tool that uses gdb's "remote debugging" interface to communicate with the kernel. A module is first loaded into the kernel, and the debugger then accesses kernel data through /dev/kdebug. gdb treats this device as a serial port that communicates with the "application" being debugged, but it is only a communication channel used to access kernel space. Because the module itself runs in kernel space, it can see kernel-space addresses that a conventional debugger cannot access. As you may have guessed, the module is a character device driver and uses dynamic allocation of the major number.

The advantage of kdebug is that you need neither patch nor recompile anything: both the kernel and the debugger are left unchanged. All you need to do is compile and install the package and then invoke kgdb, a script that performs some setup and calls gdb to access kernel structures through the new interface.

Even kdebug, however, offers no way to single-step through kernel code or to set breakpoints. This is almost unavoidable, because the kernel must keep running to keep the system alive, and the only way to step through kernel code is to control the system from another computer over a serial line. The kgdb implementation does, however, let the user modify data items in the application being debugged (that is, the current kernel), call functions with any number of arguments, and access in read/write mode the memory areas belonging to modules.

The last feature consists of using gdb commands to add the module's symbol table to the debugger's internal symbol table.
This task is performed by kgdb. From then on, gdb knows the address of a symbol whenever the user asks to access it, and the actual access is carried out by kernel code in the module. Note, however, that the current version of kdebug (1.6) has some problems mapping addresses for modularized code. You should run some checks by printing a few symbols and comparing them with the values in /proc/ksyms. If the addresses do not match, you can still use numeric values, but you must cast them to the correct type.

Another advantage of kdebug over plain gdb is that it lets you read the current value of a data structure after it has been modified, without flushing the debugger's cache; the gdb command set remotecache 0 can be used to disable data caching.

Since kdebug is used just like gdb, I will not list examples of the tool here. Such examples would be trivial for anyone who knows how to use a debugger, but obscure for anyone who does not. Becoming skilled with a debugger takes time and experience, and I do not intend to take on the teacher's role here.

All in all, kdebug is a very good program. Modifying data structures online is a huge improvement for developers (and the easiest way to hang the system). There are many times when such a tool makes development easier; for example, when a module's usage count grew during the development of scull, I could use kdebug to reset it to 0. This spared me from rebooting the machine, logging in, starting my applications again, and so on.

Remote debugging

The last way to debug a kernel image is to use gdb's remote-debugging capability. Remote debugging requires two computers: one runs gdb, and the other runs the kernel you want to debug. The two computers are connected by a conventional serial line. As you might expect, the controlling gdb must be able to understand the binary format of the kernel it controls. If the two computers have different architectures, the debugger must be compiled to support the target platform.

As of version 2.0, the Intel port of the Linux kernel does not support remote debugging, but both the Alpha and the Sparc ports do. On the Alpha, you must include remote-debugging support at compile time and enable it at boot by passing the kernel the command-line argument kgdb=1, or just kgdb. On the Sparc, remote debugging is supported, and kgdb=ttyx selects the serial port used to control the kernel, where x can be a or b. If no kgdb= option is given, the kernel runs normally.
If remote debugging is enabled in the kernel, the system calls a special initialization function at boot time, sets up the debugged kernel to handle its own breakpoints, and jumps to a breakpoint compiled into the program. This suspends normal kernel execution and transfers control to the breakpoint service routine. The handler waits on the serial line for commands from gdb and executes the corresponding function whenever it receives one. With this setup, the programmer can single-step through kernel code, set breakpoints, and do everything else gdb allows.

On the controlling side, you need a copy of the target image (assume it is called linux.img) and a copy of the module you want to debug. The following commands must be issued to gdb:

file linux.img
The file command tells gdb which binary file is to be debugged. Alternatively, the image filename can be passed on the command line. The file must be identical to the kernel running on the other end.

target remote /dev/ttyS1
This command tells gdb to use a remote computer as the target of the debugging session. /dev/ttyS1 is the local serial port used for communication; you can specify any device. For example, the kgdb script in the kdebug package introduced earlier uses target remote /dev/kdebug.

add-symbol-file module.o address
If you want to debug a module that has been loaded into the controlled kernel, you need a copy of the module's object file on the controlling system. add-symbol-file tells gdb to process the module file, assuming that the module code is located at address.

Although remote debugging can be used to debug modules, you still need to load the module and trigger another breakpoint before you can insert breakpoints in the module itself, so debugging modules remains quite tricky. Personally, I do not use remote debugging to trace a module unless there is a problem with code that runs asynchronously, such as an interrupt handler.
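Whether you use kdebug or remote gdb, it is worth checking that the debugger's idea of a module symbol's address agrees with /proc/ksyms. The sketch below shows one way such a lookup could be written in user space. It assumes that each line of the listing has the layout "hex-address symbol-name", optionally followed by a bracketed module name; the function names ksyms_match_line and ksyms_lookup are hypothetical, and the exact file format should be verified on the kernel you are running.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in the ASSUMED layout "<hex address> <symbol> [module]".
 * Returns 1 and stores the address in *addr if the symbol equals `name`,
 * otherwise returns 0 and leaves *addr untouched.
 */
int ksyms_match_line(const char *line, const char *name, unsigned long *addr)
{
    unsigned long value;
    char symbol[128];

    assert(line != NULL && name != NULL && addr != NULL);

    /* Lines that do not start with a hex number and a word are skipped. */
    if (sscanf(line, "%lx %127s", &value, symbol) != 2)
        return 0;
    if (strcmp(symbol, name) != 0)
        return 0;

    *addr = value;
    return 1;
}

/*
 * Scan an open symbol listing (for example a stream opened on /proc/ksyms)
 * for `name`. Returns 0 and fills *addr on success, -1 if the symbol is absent.
 */
int ksyms_lookup(FILE *fp, const char *name, unsigned long *addr)
{
    char line[256];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (ksyms_match_line(line, name, addr))
            return 0;
    }
    return -1;
}
```

The address found this way can be compared with the value the debugger prints for the same symbol; a mismatch suggests that the module's symbols were loaded at the wrong address.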
