A Linux Kernel Source Code Analysis Method
I. A view of the kernel source code
The sheer size of the Linux kernel makes many people balk, so their understanding of Linux stays at a superficial level. If you want to dig into Linux and explore the essence of the operating system, reading the kernel source is the most effective way. We all know that becoming a good programmer takes a great deal of practice and a great deal of code. Writing code matters, but it is easy for a programmer to stay confined to his own circle of knowledge. To broaden that circle, we need to read code written by others, especially by people more skilled than we are. In this way we can step outside the limits of what we already know, enter other people's circles of knowledge, and pick up information we could not otherwise absorb in the short term. The Linux kernel is carefully maintained by the "great gods" of countless open-source communities, people who can fairly be counted among the top programmers. By reading the kernel code we learn not only about the kernel itself, but also about their programming technique and their understanding of computers.
I, too, came to analyze the Linux kernel source through a project, and I have benefited a great deal from the work. Besides the kernel knowledge itself, the experience changed several of my earlier assumptions about kernel code:
1. Kernel source analysis is not "unattainable". The difficulty lies not in the source itself, but in choosing appropriate methods and means of analysis. Because the kernel is so large, we cannot step through it from the main function the way we analyze a demo program; we need a way to cut in partway and break through the kernel source piece by piece. This "as-needed" approach lets us follow the main thread of the source instead of getting tangled in every detail.
2. The kernel's design is elegant. The kernel's special position demands execution efficiency high enough to meet the real-time requirements of modern computer applications, which is why Linux mixes C with assembly. But as we all know, execution efficiency and maintainability often pull in opposite directions. How the kernel stays maintainable while preserving efficiency comes down to the "beautiful" designs inside it.
3. The programming technique is astonishing. In ordinary application development, coding is often not especially prized: developers care more about good software design, and coding is merely the means of realizing it, like swinging an axe to chop firewood, requiring little thought. Not so in the kernel: good coding not only improves maintainability, it also improves performance.
Everyone's understanding of the kernel differs, and as our understanding deepens we will keep forming new thoughts about its design and implementation. So this article hopes, above all, to guide people still hovering outside the door of the Linux kernel into the Linux world, to experience the kernel's magic and greatness first-hand. I am not a kernel source expert; I only want to share my own experience of source analysis and offer some reference to those who need it, and, to put it grandly, to make a small contribution to the computing field, especially to operating system kernels. Enough preamble (it ran long, thanks for bearing with me~); next I will share my own method of analyzing the Linux kernel source.
II. How hard is the kernel source code?
Essentially, analyzing the Linux kernel code is no different from reading anyone else's code: in either case, it is not code you wrote yourself. A simple example: a stranger hands you a program and asks you to explain its functional design after reading the source. I imagine many people confident in their programming skills would think nothing of it: read his code patiently from start to finish and the answer will surely emerge, and indeed it would. Now suppose instead that the person is Linus and the code is a module from the Linux kernel. Would it still feel so easy? Many would hesitate. It is still a stranger's code (unless Linus happens to know you, of course, haha~), so why does it feel different? I think the reasons are these:
1. To outsiders the Linux kernel code seems somewhat mysterious, and it is enormous; dropped into it suddenly, you may not know where to begin. The trouble can start with something as small as not being able to find the main function. For a simple demo program we can analyze the meaning of the code from beginning to end, but for the kernel that approach fails completely, because nobody can read the Linux code front to back (nor is there any need to; consult it as you need it).
2. Many people have touched large codebases, but mostly application-level projects, where the form and meaning of the code relate to business logic they deal with every day. The kernel is different: most of what it handles is bound up with the low-level machine. A lack of background in operating systems, compilers, assembly, and computer architecture likewise throws up many obstacles to reading kernel code.
3. The method of analysis is unsound. Faced with a huge, complex body of kernel code, if you do not start from a global view it is easy to drown in details. Huge as it is, the kernel still has its design principles and its architecture; otherwise maintaining it would be anyone's nightmare! If we first clarify the overall design idea of a code module and then analyze its implementation, the source analysis may turn out to be easy.
I understand these problems personally. If you have never touched a large software project, analyzing the Linux kernel code is a great opportunity to accumulate large-project experience (Linux is certainly the largest project I have ever encountered!). If your grasp of the machine's lower layers is not thorough, you can learn and accumulate that low-level knowledge as you analyze. The analysis may go slowly at first, but as knowledge accumulates, the "business logic" of the Linux kernel will gradually come into focus. What follows is my attempt to share how to grasp the source code from a global perspective.
III. A kernel source code analysis method
Step 1: Collect information
Seen as encountering something new, exploring the essence of a thing must be preceded by a process of getting acquainted with it, of forming a preliminary concept. For example, to learn the piano we first learn the basics: music theory, notation, the staff; then the techniques and fingering of playing; and only at the end do we begin practicing pieces.
The same holds for kernel code analysis. First, we pin down what the code to be analyzed involves: is it the code for process synchronization and scheduling, for memory management, for device management, or for system startup? The kernel's size means we cannot analyze all of it at once, so we must divide the work sensibly. As algorithm design teaches us: to solve a big problem, first solve its subproblems.
Once we have scoped the code to be analyzed, we can use every resource at hand to gain a full picture of its overall structure and functions.
"Every resource at hand" means search engines like Baidu and Google, operating system textbooks, professional books, notes and materials shared by others, and even the documentation, comments, and identifiers of the Linux source itself (do not underestimate the identifier names in code; sometimes they carry key information). "Every resource" really does mean everything usable you can think of. Of course we are unlikely to collect everything we want this way; we only aim to be as comprehensive as possible, because the more complete the collected information, the more of it can be used during code analysis, and the less difficult the analysis becomes.
A simple example: suppose we want to analyze the code implementing Linux's CPU frequency scaling mechanism. At the start we know only the term. From its literal meaning we can roughly guess that it has to do with adjusting the CPU's frequency. Through information gathering we should be able to turn up the following:
1. The CPUFreq mechanism.
2. The performance, powersave, userspace, ondemand, and conservative frequency scaling governors.
3. drivers/cpufreq/.
4. Documentation/cpufreq/.
5. P-states and C-states.
......
If collecting information about a piece of kernel code turns up this much, we should count ourselves "lucky". Material on the Linux kernel is admittedly not as plentiful as for .NET or jQuery, but compared with a decade ago, when there were no powerful search engines and no body of prior research, this deserves to be called a "harvest" era! Through simple "searching" (perhaps a day or two of it) we have even found the source directory where the code lives; you have to admit that information is "worth the price"!
Step 2: Locate the source code
Through information gathering we were lucky enough to find the source directory related to the code. That does not mean, however, that everything we analyze lives under that directory. Sometimes the directories we find are scattered; sometimes they contain a great deal of machine-specific code, while what concerns us is the main mechanism of the code being analyzed rather than machine-specific special cases (this helps us better grasp the kernel's essence). So we must carefully select, from the material, the code files actually involved. Of course this step is unlikely to be completed in one pass, and nobody can guarantee picking out every relevant source file at once without omission. But there is no need to worry: as long as we capture the core source files of most of the modules, the rest will turn up naturally in later analysis.
Returning to the example: we carefully read the documentation under Documentation/cpufreq. Current Linux sources keep module-related documentation in the Documentation folder of the source tree; if the module being analyzed has no documentation, locating the key source files becomes harder, but not impossible. From the documentation we can at least fix our attention on the source file drivers/cpufreq/cpufreq.c. Combining the documentation with the governors found earlier, we can easily also focus on five more files: cpufreq_performance.c, cpufreq_powersave.c, cpufreq_userspace.c, cpufreq_ondemand.c, and cpufreq_conservative.c. Have we found every file involved? Don't worry: analysis will turn up the rest sooner or later. If you read the kernel source on Windows with Source Insight, its function-call and symbol-reference lookups, combined with the code analysis itself, will easily lead you to the remaining files: freq_table.c, cpufreq_stats.c, and include/linux/cpufreq.h.
Following the flow of information, we can thus locate the source files to analyze. In one sense source location is not critical, since we need not find every file at once and can defer some of the work into the analysis itself; in another sense it is critical, because the files we do find are the foundation of the source analysis.
Step 3: Simple comments
Within the located source files, work out the rough meaning and purpose of every code element: variables, macros, functions, structs, and so on. These are called simple comments not because the work is simple, but because the comments need not be exhaustive; a rough description of each element's meaning is enough. In fact, this part is the hardest step of the whole analysis, because it is our first deep plunge into the kernel code. For those analyzing kernel source for the first time, the mass of unfamiliar GNU C syntax and the pervasive macro definitions can feel hopeless. At this point, as long as you settle down and work through each key difficulty as it appears, you can be sure the same difficulty will not trap you again. Moreover, other kernel-related knowledge keeps branching out from it like a tree.
For example, near the top of cpufreq.c we meet the macro DEFINE_PER_CPU. We can find out its meaning and purpose by consulting resources, using essentially the same methods as in the information-gathering step. We can also use Source Insight's jump-to-definition feature to view its definition, consult the LKML (Linux Kernel Mailing List), or ask on www.stackoverflow.com (What are LKML and Stack Overflow? Go gather information!). In short, by every available means we can always arrive at the macro's meaning: it defines an independent copy of a variable for each CPU.
Nor do we need to get every comment exactly right in one pass (we need not even trace each function's implementation, as long as we grasp its general purpose). Combining the collected material with the analysis of the code that follows, we keep refining what the comments say (the original comments and identifiers in the source are very useful here). The comments converge on the right meaning through constant revision against the material.
After simply commenting all the source files involved, we will have achieved the following:
1. The meanings of the code elements in the source are basically clear.
2. All the key source files involved in the module have been found.
Against the information and materials collected earlier, we can compare our analysis results with the references, confirming and correcting our understanding of the code. In this way, through simple comments, we grasp the main structure of the source module as a whole, which is the essential purpose of this step.
Step 4: Detailed comments
With the simple comments done, we can consider the analysis of the module half finished; what remains is deep analysis and thorough understanding of the code. Simple comments cannot describe the specific meaning of each code element accurately, so detailed description becomes necessary. In this step we need to clarify the following:
1. When and where each variable is used.
2. When and where each macro is used.
3. The meaning of each function's parameters and return value.
4. Each function's execution flow and call relationships.
5. The specific meaning of each struct field and the conditions of its use.
We might even call this step detailed function commentary, since the code elements outside functions were basically clarified by the simple comments; the execution flow and algorithms of the functions themselves are the main object of this round of commenting and analysis.
For example: how is the cpufreq_ondemand governor's algorithm (in the function dbs_check_cpu) implemented? We need to analyze the variables the function uses and the functions it calls, and trace the algorithm from beginning to end. Ideally, for complex functions like this, we produce execution flowcharts and function-call diagrams; that is the most intuitive form of expression.
Through this step's comments we come to grasp the overall implementation mechanism of the code completely; the analysis can now be considered 80% done. This step is especially critical: the annotated information must be accurate enough, and we must understand the division of sub-functions within the code being analyzed. Although the Linux kernel uses the macros module_init and module_exit to declare a module file, the division of functionality inside the module rests on a full understanding of what the module does. Only by dividing the functionality correctly can we work out which functions and variables the module exposes externally (via symbols exported with EXPORT_SYMBOL_GPL or EXPORT_SYMBOL), which in turn prepares the next step: module identifier dependency analysis.
Step 5: Module identifier dependencies
Having divided the code into modules in step 4, we can easily analyze them one by one. Generally we start from the module's entry function at the bottom of the file (the functions declared by module_init and module_exit usually sit at the end). From the functions it calls (its own or other modules') and the key variables it uses (globals in this file or externals from other modules), we draw a "function-variable-function" dependency graph, which we call the identifier dependency graph.
Of course, the dependencies among a module's identifiers are not a simple tree; often they form a complex web. This is where our detailed comments on the code pay off: based on the functions' meanings, we divide the module into sub-functional units and extract an identifier dependency tree for each one.
Identifier dependency analysis shows clearly which functions a module-defined function calls and which variables it uses, as well as how the module's sub-units depend on one another: which functions and variables they share.
Step 6: Inter-module dependencies
Once the internal identifier dependency graphs of all the modules are sorted out, the dependencies between modules follow easily from the variables and functions each module uses from the others.
The module dependencies of the cpufreq code can be drawn as a diagram of this kind.
Step 7: Module architecture
From the dependencies between modules, we can clearly state each module's position and role within the code being analyzed. On that basis we can classify the modules and sort out the architectural relationships of the code.
The module dependency diagram of cpufreq shows clearly that all the governor modules depend on the core modules cpufreq, cpufreq_stats, and freq_table. If we abstract those three depended-upon modules as the core framework of the code, then the governor modules are built on top of that framework and handle interaction with user space, while the core module cpufreq provides the driver and related interfaces that interact with the underlying system. From this we can derive a module architecture diagram.
Of course, an architecture diagram is not a mechanical mosaic of modules; we also enrich its meaning with the materials we consulted. The details of the diagram may therefore differ from person to person, but its main outline means the same thing. At this point, the analysis of the kernel code in question is complete.
IV. Summary
As this article said at the beginning, we cannot analyze all of the kernel code. So the effective route to the kernel's essence is to collect information about the code to be analyzed and then analyze that code from start to finish following the process above. Analyzing kernel code according to concrete needs in this way makes rapid entry into the Linux kernel world possible. By analyzing module after module like this, we eventually arrive at a comprehensive understanding of the Linux kernel and achieve our goal of learning it.
Finally, two reference books for kernel study. One is Linux Kernel Development, which gives readers a concise introduction to the main functions and implementation of the Linux kernel without dragging them into the abyss of kernel code; it is a good reference for understanding the kernel's architecture and getting started with the kernel source, and it will raise the reader's interest in the code. The other is Understanding the Linux Kernel; little needs to be said about this classic, except one suggestion: read it alongside the kernel code. Because the book describes the kernel code in great detail, reading with the code in hand helps us understand it better, and during code analysis the book serves as a reference in turn. Finally, may you enter the kernel world soon and experience the surprises Linux brings!