Reading notes on "Linux Kernel Design and Implementation", Chapter 18

Source: Internet
Author: User

Chapter 18 Debugging

18.1 Getting Started

What you need before you start:

    • A bug
    • A kernel version in which the bug is present
    • Knowledge of the relevant kernel code, and some luck

The key point: successful debugging depends on being able to reproduce the bug. If you cannot reproduce it, the only option is to reason about the problem in the abstract and look for clues in the code.

18.2 Bugs in the Kernel
    1. Possible causes of a bug:

      • Plainly incorrect code (for example, not storing the correct value in the proper place)
      • Synchronization errors (for example, not properly locking a shared variable)
      • Mismanaged hardware (for example, sending the wrong command to the wrong control register)
      • ......
    2. Possible effects of a kernel bug:

      • Degraded performance for every program on the system
      • Corrupted data
      • A deadlocked system
      • ......

    3. Kernel bugs are often less clear-cut than bugs in user-level programs, because the interactions between the kernel, user space, and the hardware are subtle.

    4. Between the error hidden in the source code and the bug the user actually witnesses, there is often a long chain of events that must occur to trigger it.
18.3 Debugging by Printing
18.3.1 Robustness
    1. Robustness is one of the most valued traits of printk(): it can be called from almost anywhere, at any time.

      • It can be called from interrupt context and from process context
      • It can be called while any lock is held
      • It can be called on multiple processors at the same time, and the caller does not need to hold a lock
    2. The variant early_printk() behaves like printk() except that it works earlier in the boot process, before the console has been initialized.

    3. In short, printk() is very resilient.
18.3.2 Log Levels
    • The main difference between printk() and printf(): printk() lets the caller specify a log level, which the kernel uses to decide whether to print the message on the console. The kernel displays on the console every message whose level is numerically lower (that is, more important) than the current console log level.
    • A message with no explicit level gets the default level, KERN_WARNING.
    • Levels run from KERN_EMERG, defined as <0> and the most important, to KERN_DEBUG, defined as <7> and the least important.
    • For debugging messages, there are two ways to assign a level:
      • Leave the console log level unchanged and give every debugging message KERN_CRIT or a similarly important level, so that it is always displayed.
      • Give every debugging message the level KERN_DEBUG and adjust the console log level instead.
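The filtering rule described above can be modeled in a few lines of user-space C. This is only a sketch of the comparison, not kernel code: the names `console_level`, `reaches_console()` and `log_msg()` are hypothetical, and the starting threshold is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Levels mirror the kernel's numbering: 0 is most important, 7 least. */
enum log_level {
    LVL_EMERG = 0, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/* Hypothetical stand-in for the console log level; not a kernel symbol. */
static int console_level = LVL_WARNING;

/* Returns 1 if a message at `level` would reach the console, else 0. */
static int reaches_console(int level)
{
    assert(level >= LVL_EMERG && level <= LVL_DEBUG);
    return level < console_level;   /* strictly lower number = shown */
}

/* Prints the message only if its level passes the console filter. */
static int log_msg(int level, const char *msg)
{
    if (!reaches_console(level))
        return 0;
    printf("<%d>%s\n", level, msg);
    return 1;
}
```

With the threshold at `LVL_WARNING`, a `LVL_CRIT` message is shown and a `LVL_DEBUG` message is dropped. Raising `console_level` above `LVL_DEBUG` corresponds to the second approach in the list: the messages stay at debug level and the console threshold moves instead.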
18.3.3 The Log Buffer
    1. Kernel messages are stored in a circular buffer (a ring queue) that is read and written as a queue. Its size can be adjusted at compile time through CONFIG_LOG_BUF_SHIFT.
    2. On a uniprocessor machine the buffer defaults to 16 KB. Once the buffer is full, each new message overwrites the oldest one.
    3. Advantages:

      • Synchronization between readers and writers is easy to handle
      • Maintaining the log is simple
    4. Disadvantage: messages may be lost.
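The overwrite behavior is easier to see in a toy version. The following user-space sketch assumes a 16-byte buffer (tiny on purpose) and stores raw bytes rather than whole messages; `struct log_ring` and its functions are hypothetical names, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_LEN 16   /* deliberately tiny; the notes give 16 KB as the default */

struct log_ring {
    char   buf[LOG_BUF_LEN];
    size_t head;             /* total number of bytes ever written */
};

static void ring_init(struct log_ring *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
}

/* Appends n bytes, wrapping around and overwriting the oldest data when full. */
static void ring_write(struct log_ring *r, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r->buf[r->head % LOG_BUF_LEN] = s[i];
        r->head++;
    }
}

/* Copies the surviving bytes, oldest first, into out (at least LOG_BUF_LEN
 * bytes long) and returns how many bytes were copied. */
static size_t ring_read_all(const struct log_ring *r, char *out)
{
    size_t len = r->head < LOG_BUF_LEN ? r->head : LOG_BUF_LEN;
    size_t start = r->head - len;

    for (size_t i = 0; i < len; i++)
        out[i] = r->buf[(start + i) % LOG_BUF_LEN];
    return len;
}
```

Writing 20 bytes into this 16-byte ring leaves only the last 16 readable; the first 4 are gone, which is the "messages may be lost" disadvantage in miniature.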

18.3.4 syslogd and klogd

These are two user-space daemons. klogd retrieves kernel messages from the log buffer and passes them to the syslogd daemon, which saves them in the system log file.

    • klogd
      • Reads the messages either from the /proc/kmsg file or through the syslog() system call; the /proc approach is the default.
      • In either case klogd blocks until new kernel messages are available. When woken, its default action is to pass the messages on to syslogd.
      • The console log level can be changed with the -c flag.
    • syslogd
      • Appends every message it receives to a file, /var/log/messages by default.
18.4 Oops
    • An oops is the usual way the kernel tells the user that something bad has happened. Because the kernel supervises the whole system, it cannot simply kill itself, and it is hard for it to repair itself.
    • After issuing an oops the kernel is usually in an unstable state.
    • Issuing an oops involves:

      • Printing an error message to the console
      • Dumping the contents of the registers
      • Printing a back trace
    • What happens next depends on where the oops occurred:

      1. In interrupt context: the kernel cannot continue, and it panics.
      2. In the idle task (pid 0) or the init task (pid 1): the kernel panics as well, because it cannot continue without these processes.
      3. In the context of any other process: the kernel kills that process and tries to continue.
    • Possible causes of an oops:

      • An out-of-bounds memory access
      • An illegal instruction
      • ......
    • Important information in an oops: the register context and the back trace. The back trace shows the chain of function calls that led to the error. The register context is also useful, for example for reconstructing the circumstances that caused the problem.

18.4.1 ksymoops

    • Converts the addresses in the back trace into meaningful symbol names:

           ksymoops saved_oops.txt
18.4.2 kallsyms

    • Enabled through the CONFIG_KALLSYMS configuration option. It stores the mapping from function addresses to symbol names in the kernel image, so the kernel can print a decoded back trace itself.

18.5 Kernel Debugging Options
    • These options are in the Kernel Hacking menu of the kernel configuration editor, and all of them depend on CONFIG_DEBUG_KERNEL. They include:

      • Slab layer debugging
      • High-memory debugging
      • I/O mapping debugging
      • Spin-lock debugging
      • Stack-overflow checking
      • Sleep-inside-spinlock checking
      • ......

    • Atomic operation: something that executes without being divided and cannot be interrupted while it runs. Examples are code that holds a spin lock or has disabled preemption. Sleeping while holding a lock is a recipe for deadlock.
18.6 Asserting Bugs and Dumping Information
    1. Use BUG() and BUG_ON(). Most architectures define them as some kind of illegal operation, which triggers an oops.

      • They are used like assertions, to mark a condition that should never occur
    2. Call panic() for a more critical error. It prints an error message and then halts the system.

      • panic("terrible thing: %ld\n", terrible_thing)
    3. Call dump_stack() to simply print the register context and a function back trace on the console, without halting anything.

18.7 Magic SysRq Key
    • The feature is enabled through the CONFIG_MAGIC_SYSRQ configuration option. SysRq (System Request) is a standard key on most keyboards.
    • When it is enabled, special key combinations let you communicate with the kernel regardless of what state it is in.
    • In addition to the configuration option, a sysctl toggles the feature at run time. To turn it on:

       echo 1 > /proc/sys/kernel/sysrq
    • Some SysRq commands:

       SysRq-s: sync dirty buffers to disk
       SysRq-u: unmount all filesystems
       SysRq-b: reboot the machine
18.8 The Saga of a Kernel Debugger
18.8.1 gdb
    • Start gdb on vmlinux, the uncompressed kernel image:

           gdb vmlinux
    • You can then use gdb's commands to read information. For example:

           Print the value of a variable:    p global_variable
           Disassemble a function:           disassemble function
    • Compiling the kernel with the -g flag gives gdb more information to work with.
    • Limitations:

      • It cannot modify kernel data
      • It cannot single-step through kernel code

18.8.2 kgdb
    • kgdb is a patch that makes the full feature set of gdb available for debugging the kernel from a remote host over a serial line.
    • It requires two computers: the first runs a kernel patched with kgdb, and the second uses gdb to debug the first over the serial line.
    • With kgdb, the full feature set of gdb is available, including:

      • Reading and modifying variable values
      • Setting breakpoints
      • Setting watchpoints
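18.9 Poking and Probing the System
18.9.1 Using UID as a Conditional
    • Only processes running as UID 7777 take the new code path; everything else on the system keeps using the old algorithm.

        if (current->uid != 7777) {
                /* old algorithm */
        } else {
                /* new algorithm */
        }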
18.9.2 Using Condition Variables
    • If the code in question is not in process context, or if you want a more global way to control a feature, you can use a condition variable.
    • This is even simpler than the UID approach: create a global variable and use it as the switch. If the variable is zero, one code path is taken; otherwise the other is taken.
    • Control: the variable can be set through some exported interface or from a debugger.
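As a rough user-space illustration of the switch described above, the sketch below routes one computation through either of two implementations depending on a global flag. Everything here is hypothetical: `use_new_algorithm`, the two `sum_*` functions, and `last_path` (which exists only so the chosen branch can be observed).

```c
#include <assert.h>

/* Hypothetical global switch: 0 selects the old path, nonzero the new one. */
static int use_new_algorithm = 0;

/* Records which path ran last (0 = old, 1 = new); for demonstration only. */
static int last_path = -1;

/* "Old" algorithm: add the numbers 1..n one by one. */
static long sum_old(int n)
{
    long s = 0;
    for (int i = 1; i <= n; i++)
        s += i;
    return s;
}

/* "New" algorithm: closed-form formula for the same sum. */
static long sum_new(int n)
{
    return (long)n * (n + 1) / 2;
}

/* Dispatches on the global switch, as in the zero / nonzero rule above. */
static long sum_to(int n)
{
    assert(n >= 0);
    if (!use_new_algorithm) {
        last_path = 0;
        return sum_old(n);
    }
    last_path = 1;
    return sum_new(n);
}
```

Flipping `use_new_algorithm` at run time changes which branch `sum_to()` takes without rebuilding anything, which is the appeal of this approach over a compile-time choice.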
18.9.3 Using Statistics
    • This technique is useful when you need to get a feel for how often a particular event occurs.
    • The approach is to collect statistics and provide some mechanism for exporting them:

      • Define global variables to hold the counts
      • Export them by creating a file in /proc, by adding a new system call, or by reading them directly with a debugger (the most direct way)
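18.9.4 Rate and Occurrence Limiting

    (1) Rate limiting: print a debugging message at most once per chosen time interval.
    (2) Occurrence limiting: print a debugging message only a fixed number of times, then stop.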
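Both limits can be sketched in user-space C with the current time passed in explicitly, so the behavior is deterministic. The struct and function names are hypothetical. In kernel code the time value would typically be jiffies, compared with a wrap-safe helper such as time_after(); this sketch ignores counter wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* Rate limiter state: allow at most one message per `interval` time units. */
struct rate_limiter {
    unsigned long interval;
    unsigned long next_allowed;   /* earliest time the next message may print */
};

/* Returns 1 if a message may be printed at time `now`, else 0. */
static int rate_limit_ok(struct rate_limiter *rl, unsigned long now)
{
    assert(rl != NULL);
    if (now < rl->next_allowed)
        return 0;                 /* too soon after the previous message */
    rl->next_allowed = now + rl->interval;
    return 1;
}

/* Occurrence limiter state: allow only `remaining` more messages in total. */
struct count_limiter {
    unsigned int remaining;
};

/* Returns 1 while the budget lasts, then 0 forever after. */
static int occurrence_limit_ok(struct count_limiter *cl)
{
    assert(cl != NULL);
    if (cl->remaining == 0)
        return 0;
    cl->remaining--;
    return 1;
}
```

A caller would guard each debugging print with one of these checks. Rate limiting suits events that keep recurring for as long as the system runs, while occurrence limiting suits cases where the first few reports already tell you everything.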

18.10 Binary Searching to Find the Culprit Change
18.11 Binary Searching with Git

Summary

This chapter covers kernel debugging. Debugging is the process of finding where the implementation deviates from the intended behavior. While working through the various techniques I ran into difficulties, but I believe that persistent effort will pay off.

Resources

The original book: "Linux Kernel Design and Implementation", third edition
