Linux Kernel Source Code Analysis Method


I. My View of the Kernel Source Code

The sheer size of the Linux kernel makes many people feel daunted, and because of this, many people's knowledge of Linux stays at a general level. If you want to see through Linux and reach the essence of the operating system, reading the kernel source code is the most effective way. We all know that becoming a good programmer takes a great deal of practice and a great deal of code. Writing code matters, but it is easy for those who only write code to stay confined to their own field of knowledge. To broaden that knowledge, we need to read code written by others, especially by people more skilled than ourselves. In this way we can step outside our own circle of knowledge and into others', absorbing information we would otherwise not encounter for a long time. The Linux kernel is meticulously maintained by the "great gods" of countless open-source communities, who can fairly be called top-tier coders. By reading the Linux kernel code, we learn not only kernel-related knowledge; what seems even more valuable to me is learning their programming techniques and their understanding of the computer.

I first came into contact with Linux kernel source analysis through a project, and I benefited greatly from that work. Besides acquiring the relevant kernel knowledge, it also changed my previous assumptions about kernel code:

1. Kernel source analysis is not "unattainable". The difficulty lies not in the source code itself but in choosing an appropriate method and means of analyzing it. Given the kernel's size, we cannot follow it from a main function the way we analyze an ordinary demo program; we need a way to cut into the source from the middle and "conquer" it piece by piece. This "on demand" approach lets us grasp the main line of the source code rather than becoming overly tangled in specific details.

2. The kernel's design is elegant. The kernel's special role dictates that its execution efficiency must be high enough to meet the real-time demands of modern computer applications, which is why the Linux kernel mixes C with assembly. Yet we all know that execution efficiency and maintainability often pull in opposite directions. How the kernel stays maintainable while remaining efficient depends on its "elegant" design.

3. Remarkable programming skills. In ordinary application development, coding may not be especially prized; developers pay more attention to good software design, and coding is merely a means of implementation, like chopping wood with an axe, done without much thought. Not so in the kernel: good coding improves not only maintainability but even the performance of the code itself.

Everyone understands the kernel differently, and as our understanding deepens we develop further thoughts on and appreciation of its design and implementation. This article is therefore mainly meant to guide people still wandering outside the Linux kernel's door into the world of Linux, to experience the kernel's magic and greatness firsthand. I am no expert on the kernel source; I simply want to share my own experience and ideas about analyzing it, as a reference and a help for those who need it, and, to put it grandly, as a humble contribution to the computer industry and to operating-system kernels in particular. Enough small talk (I have rambled plenty already, sorry); below I share my own method of analyzing the Linux kernel source code.

II. Is the Kernel Source Code Really That Difficult?

In essence, analyzing Linux kernel code is no different from reading anyone else's code: what is in front of you is code you did not write yourself. Start with a simple example: a stranger hands you a program and asks you to work out its design after reading the source. I suspect many people confident in their programming ability would feel this is nothing; patiently read his code from start to finish and the answer will surely emerge, and that is in fact true. Now a hypothesis: if that person is Linus, and he hands you the code of one module of the Linux kernel, would it still feel so easy? Many people would hesitate. It is the same stranger's code (if you happen to know Linus personally, that of course doesn't count, hehe), so why does it feel different? I think the reasons are as follows:

1. From the "outside", Linux kernel code seems a bit mysterious, and it is very large; thrown in front of you all at once, it can feel impossible to start. The obstacle may even be something very small: you cannot find a main function. For a simple demo program we can work out the meaning of the code from beginning to end, but that approach is completely ineffective for kernel code, because nobody reads the Linux code from start to finish (there is simply no need to spend the time that way).

2. Many people have touched the code of large-scale software, but mostly in application projects, where the form and meaning of the code relate to business logic they deal with every day. Kernel code is different: the information it handles is mostly tied to the low levels of the computer. Gaps in knowledge of operating systems, compilers, assembly, architecture, and so on also hinder the reading of kernel code.

3. The method used to parse the kernel code is unreasonable. Facing a large amount of complex kernel code, if you do not start from a global perspective, it is easy to sink into the mire of code details. The kernel code is huge, but it too has its design principles and architecture; otherwise maintaining it would be a nightmare for anyone! If we first clarify the overall design of a code module and then analyze the code's implementation, analyzing the source may well turn out to be an easy, even happy task.

My own view of these problems is this. If you have never been involved in a large software project, analyzing the Linux kernel code is a great opportunity to accumulate large-project experience (indeed, the Linux code is the biggest project I have ever come across!). If you do not know enough about the low levels of the computer, you can pick up the underlying knowledge on the side as the analysis proceeds. Starting out may be a little slow, but as knowledge accumulates, the "business logic" of the Linux kernel becomes clearer. Finally, how to drive the source analysis from a global perspective is the experience I want to share with you.

III. A Method for Kernel Source Code Analysis

Step One: Data Collection

Looking at it as we would any new thing: before exploring its essence, there must be a process of getting acquainted with it, and that process gives us a preliminary concept of the new thing. For example, if we want to learn the piano, we first need to understand that playing it requires learning basic music theory, numbered and staff notation, and other fundamentals, then playing technique and fingering, before we can truly begin to practice.

The same is true of parsing kernel code. First we need to pin down which code we intend to analyze: is it process synchronization and scheduling, memory management, device management, system startup, and so on? The sheer size of the kernel means we cannot analyze all of it at once, so we must divide the work for ourselves sensibly. As algorithm design teaches us, to solve a big problem, first solve the sub-problems it involves.

Having fixed the scope of the code to be analyzed, we can use every resource at hand to understand, as fully as possible, the overall structure and approximate functionality of that code.

"Every resource" here means exactly that: large web search engines such as Baidu and Google, operating-system textbooks and professional references, experience and write-ups provided by others, and even the Linux source documentation, its comments, and the names of source identifiers (do not underestimate identifier names in code; sometimes they provide critical information). In short, every available resource you can think of. Of course, we are unlikely to obtain all the information we want through this kind of collection; we only ask that it be as comprehensive as possible. The more information we collect, the more of it comes into play while analyzing the code, and the less difficult the analysis becomes.

Here is a simple example. Suppose we want to analyze the code implementing Linux's frequency-scaling mechanism. So far we only know the term, and from its literal meaning we can guess roughly that it is related to adjusting the CPU's frequency. Through data collection, we should be able to obtain the following information:

1. The CPUFreq mechanism.

2. The performance, powersave, userspace, ondemand, and conservative scaling governors.

3. The drivers/cpufreq/ source directory.

4. The Documentation/cpu-freq/ directory.

5. P-states and C-states.

...

If your analysis of Linux kernel code turns up this much information, count yourself very "lucky". After all, material on the Linux kernel is not as plentiful as for .NET or jQuery; still, compared with a decade ago, when there were no powerful search engines and little related research material, this deserves to be called the "harvest" era! After a simple "search" (which may take a day or two), we may even have found the source directory where this code lives. Information like that is simply "priceless"!

Step Two: Source Location

Through data collection we have "fortunately" found the relevant source directory. But that does not mean we will actually analyze every file in it. Sometimes the directories we find are scattered, and sometimes a directory contains a lot of machine-specific code, while we care more about the main mechanism of the code under analysis than about machine-specific special cases (this focus helps us understand the essence of the kernel). So we need to select carefully among the code files the information points to. Of course, this step is unlikely to be finished in one pass; nobody can guarantee picking exactly the right set of source files with nothing left out. But there is no need to worry: as long as we capture most of the module's core source files, the subsequent analysis of the code will naturally lead us to all the rest.

Back to the example above: we carefully read the documents under Documentation/cpu-freq/. The current Linux source tree stores module-related documentation in the Documentation folder of the source directory; if the module to be analyzed has no documentation, locating the key source files becomes harder, but it will not stop us from finding the source we want to analyze. By reading the documentation, we can at least focus on the source file drivers/cpufreq/cpufreq.c. Combining the documentation with the governors collected earlier, it is easy to also focus on cpufreq_performance.c, cpufreq_powersave.c, cpufreq_userspace.c, cpufreq_ondemand.c, and cpufreq_conservative.c, five source files in all. Is that every file involved? Don't worry: start analyzing them and sooner or later you will find the others. Reading the kernel source under Windows with Source Insight, we can follow function calls and symbol references, and the analysis easily turns up the additional files freq_table.c, cpufreq_stats.c, and include/linux/cpufreq.h.
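
Each of those five governor files follows the same registration pattern against the cpufreq core. Below is a minimal, hypothetical governor modeled on that pattern; the field layout follows the roughly 3.x-era struct cpufreq_governor, and details vary between kernel versions, so treat it as a sketch rather than copy-ready code.

    /* A minimal, hypothetical cpufreq governor modeled on the pattern
     * shared by the five governor files above. Interface follows the
     * ~3.x-era kernels; details differ across versions. */
    #include <linux/cpufreq.h>
    #include <linux/module.h>

    static int example_governor(struct cpufreq_policy *policy,
                                unsigned int event)
    {
            switch (event) {
            case CPUFREQ_GOV_START:
            case CPUFREQ_GOV_LIMITS:
                    /* A real governor picks a target frequency here, e.g.
                     * __cpufreq_driver_target(policy, policy->max,
                     *                         CPUFREQ_RELATION_H); */
                    break;
            case CPUFREQ_GOV_STOP:
                    break;
            }
            return 0;
    }

    static struct cpufreq_governor cpufreq_gov_example = {
            .name     = "example",         /* name shown under sysfs */
            .governor = example_governor,
            .owner    = THIS_MODULE,
    };

    static int __init gov_example_init(void)
    {
            return cpufreq_register_governor(&cpufreq_gov_example);
    }

    static void __exit gov_example_exit(void)
    {
            cpufreq_unregister_governor(&cpufreq_gov_example);
    }

    module_init(gov_example_init);
    module_exit(gov_example_exit);
    MODULE_LICENSE("GPL");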

Following the direction of the information flow, we can thus locate the source files to be analyzed. This step need not be exhaustive: we do not have to find every source file up front and can defer part of the work to the code-analysis stage itself. What matters is that the files we do locate give us a solid base from which to analyze the source.

Step Three: Simple Comments

In the source files we have located, analyze the approximate meaning and function of each variable, macro, function, struct, and other code element. Calling these "simple comments" does not mean the annotation work is simple, only that the comments need not be overly refined; it is enough to describe the rough meaning of the code elements concerned. On the contrary, this is the hardest step of the whole analysis, because it is our first venture deep inside the kernel code. Especially on a first analysis of the kernel source, the unfamiliar GNU C syntax and the overwhelming macro definitions can be dispiriting. At this point, as long as we settle down and work through each key difficulty, we can be sure that the next similar difficulty will not trap us again. Better still, our kernel-related knowledge keeps growing, like a tree.

For example, near the beginning of the cpufreq.c file the macro DEFINE_PER_CPU is used; we can consult references to understand its meaning and function. The means used here are basically the same as in data collection: we can use Source Insight's go-to-definition feature to view its definition, look it up via LKML (the Linux Kernel Mailing List), or ask on stackoverflow.com (wondering what LKML and Stack Overflow are? Collect the information!). In short, by every possible means, we can always arrive at the macro's meaning: it defines a separate instance of a variable for each CPU.
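
As a minimal sketch of what that macro does: the macro and the accessors below are the real per-CPU API from <linux/percpu.h>, while the variable and function names are invented for illustration.

    #include <linux/percpu.h>

    /* DEFINE_PER_CPU creates one instance of `local_counter` per CPU. */
    static DEFINE_PER_CPU(unsigned long, local_counter);

    static void bump_local_counter(void)
    {
            /* get_cpu_var() disables preemption and yields this CPU's
             * own copy; put_cpu_var() re-enables preemption. */
            get_cpu_var(local_counter)++;
            put_cpu_var(local_counter);

            /* per_cpu(local_counter, cpu) reads the copy belonging to
             * an arbitrary CPU `cpu`. */
    }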

Nor do we have to force every comment to be precise at first (we do not even need to understand each function's specific implementation; grasping its general purpose is enough). The comments and identifier names already in the source are very useful here. We combine the collected data with the later analysis of the code to keep improving each comment's meaning: comment constantly, consult the material constantly, and revise the comments constantly.

Once we have simple comments on all the source files involved, we will have achieved the following:

1. A basic understanding of why each source code element exists and what it means.

2. Identification of all the key source files the module involves.

Combining this with the overall or architectural descriptions of the code collected earlier, we can compare our analysis against the reference material to check and correct our understanding of the code. In this way, the simple comments let us grasp the main structure of the source module as a whole, which fulfills their basic purpose.

Step Four: Detailed Comments

After completing the simple comments, the analysis of the module can be considered half done; what remains is an in-depth analysis and thorough understanding of the code. Simple comments cannot always describe the exact meaning of a code element, so detailed comments are necessary. In this step we need to work out the following:

1. When and where each variable is used.

2. When the code defined by each macro is used.

3. The meaning of each function's parameters and return value.

4. The execution flow and call relationships of each function.

5. The precise meaning of each struct field and the conditions under which it is used.

We might even call this step the function-detail comment, since the meaning of code elements outside functions was basically settled by the simple comments. The execution flow and algorithm of each function are the main subjects of this round of annotation and analysis.
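
For function-level detail, the kernel's own kernel-doc comment format is a natural template. The sketch below uses an entirely hypothetical function and shows the kind of information (parameters, calling context, return value) a detailed comment should pin down:

    /**
     * update_load_estimate - recompute the load figure for one CPU
     * @cpu: index of the CPU whose counters are sampled
     * @now: timestamp of the current sample, in jiffies
     *
     * Called periodically from a governor's worker; must not sleep.
     *
     * Return: estimated load as a percentage in the range 0-100.
     */
    static unsigned int update_load_estimate(int cpu, unsigned long now);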

For example, how is the cpufreq_ondemand governor's decision algorithm (the function dbs_check_cpu()) implemented? We have to work step by step through the variables the function uses and the functions it calls to clarify the ins and outs of the algorithm. Best of all, we should draw execution flowcharts and call graphs for such complex functions; they are the most intuitive way to express what is going on.
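
As an illustration of what that analysis ultimately distills out, here is a simplified restatement of ondemand's core decision rule. Names and bookkeeping are heavily pruned from the real dbs_check_cpu(), which derives `load` from per-CPU idle-time samples; up_threshold is one of the governor's tunables.

    /* Simplified sketch of the ondemand decision rule: jump straight to
     * the maximum frequency when busy, otherwise pick a frequency
     * proportional to the load. */
    static unsigned int up_threshold = 80;  /* governor tunable, percent */

    static void od_decide(struct cpufreq_policy *policy, unsigned int load)
    {
            if (load > up_threshold) {
                    __cpufreq_driver_target(policy, policy->max,
                                            CPUFREQ_RELATION_H);
            } else {
                    unsigned int freq_next =
                            load * policy->cpuinfo.max_freq / 100;

                    __cpufreq_driver_target(policy, freq_next,
                                            CPUFREQ_RELATION_L);
            }
    }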

Through the comments of this step, we can basically master the implementation mechanism of the code under analysis, and the analysis work as a whole can be considered 80% complete. This step is particularly critical: we must make the information in the comments accurate enough to understand the partitioning of the code's internal modules properly. Although the Linux kernel uses the macros module_init and module_exit to declare module files, the division of sub-functions within a module rests on a full understanding of the module's functionality. Only by partitioning the modules properly can we find out which external functions and variables each module provides (the symbols exported with EXPORT_SYMBOL_GPL or EXPORT_SYMBOL), ready for the next step, the analysis of identifier dependencies within the module.
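
A minimal, hypothetical module skeleton showing the three landmarks just mentioned (the entry and exit declarations and an exported symbol); all names are invented for illustration:

    #include <linux/init.h>
    #include <linux/module.h>

    /* A helper this module makes visible to other modules; the export
     * is what step five looks for when mapping a module's interface. */
    int example_do_work(void)
    {
            return 0;
    }
    EXPORT_SYMBOL_GPL(example_do_work);

    static int __init example_init(void)
    {
            return 0;       /* 0 = successful load */
    }

    static void __exit example_exit(void)
    {
    }

    module_init(example_init);   /* usually found at the end of the file */
    module_exit(example_exit);
    MODULE_LICENSE("GPL");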

Step Five: Intra-module Identifier Dependencies

With the module partitioning from step four, we can analyze the modules one by one quite "easily". Generally we can start from the module's entry function at the bottom of the file (the functions declared by module_init and module_exit usually sit at the end of the file), and, following the functions it calls (defined in this file or in other modules) and the key variables it uses (globals in this file or externals from other modules), draw "function - variable - function" dependency graphs, which we call identifier dependency graphs.

Of course, intra-module identifier dependencies are not purely tree-shaped; many form complex network relationships. This is where our detailed comments pay off: we divide the module into sub-functions according to the meaning of the functions themselves and extract the identifier dependency tree of each sub-function.

With the identifier dependency analysis done, it is clear which functions a module's own functions call, which variables they use, and how the module's sub-functions depend on one another, that is, which functions and variables they share.
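
Such a graph can be as simple as an indented tree. The following sketch is entirely hypothetical (the foo_* names are invented) and only illustrates the "function - variable - function" form:

    foo_init()                       (module entry, declared via module_init)
    +-- foo_driver                   (file-scope struct variable)
    +-- foo_setup()
    |   +-- foo_table[]              (file-scope array)
    |   +-- helper_alloc()           (function defined in another module)
    +-- foo_sysfs_init()
        +-- foo_kobject              (variable also used by foo_exit())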

Step Six: Inter-module Dependencies

Once all the intra-module identifier dependency graphs have been collated, the dependencies between modules can be read off easily from the variables and functions of other modules that each module uses.

The module dependencies of the cpufreq code can be expressed as the following relationships.
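
Sketched in text form, reconstructed from the description in the next step (each governor module on the left depends on all three core modules on the right):

    cpufreq_performance  --+
    cpufreq_powersave    --+        cpufreq (core)
    cpufreq_userspace    --+---->   freq_table
    cpufreq_ondemand     --+        cpufreq_stats
    cpufreq_conservative --+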

Step Seven: Module Architecture Diagram

The dependency graph between modules makes clear each module's position and function within the code being analyzed. On that basis, we can classify the modules and sort out the code's architectural relationships.

From the cpufreq module dependency graph, we can see clearly that all the governor modules depend on the core modules cpufreq, cpufreq_stats, and freq_table. If we abstract those three depended-upon modules as the core framework of the code, the governor modules are built on top of that framework and are responsible for interacting with the user layer, while the core module cpufreq provides the driver and other related interfaces responsible for interacting with the underlying system. From this we obtain the module architecture diagram.
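
In text form, reconstructed from the description above, the layered architecture looks roughly like this:

    user space (sysfs)
          |
    governors: performance, powersave, userspace,
               ondemand, conservative
          |
    core framework: cpufreq, freq_table, cpufreq_stats
          |
    CPU frequency drivers / underlying hardware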

Of course, the architecture diagram is not an inorganic splicing of modules; we still need to combine it with the reference material to enrich its meaning. The details of the diagram will therefore differ with each person's understanding, but its main sense is basically the same. At this point, all the analysis work on the kernel code in question is complete.

IV. Summary

As stated at the beginning of this article, analyzing all of the kernel code is impossible. Collecting information about the code first and then analyzing the source by the process above is therefore an effective way to understand the essence of the kernel. This analyze-as-needed approach makes fast entry into the world of the Linux kernel possible. Analyzing the kernel's other modules in the same way, and finally synthesizing one's own understanding of the Linux kernel, achieves the purpose of learning it.

Finally, two reference books for learning the kernel. One is Linux Kernel Development, which gives readers a quick, concise introduction to the Linux kernel's main functions and implementation without dragging them into the abyss of kernel code; it is an excellent reference for understanding the kernel architecture and getting started with the kernel source, and it will whet the reader's interest in the code. The other is Understanding the Linux Kernel, whose classic status needs no elaboration from me. I would only suggest that if you want to study it well, it is best read together with the kernel code: its descriptions of the code are very detailed, so reading it alongside the code helps us understand the kernel better, and while analyzing the code we will in turn keep finding material of reference value in the book. Finally, welcome to the world of the kernel, and may you experience the surprises Linux brings us!

Original Address: http://www.cnblogs.com/fanzhidongyzby/archive/2013/03/20/2970624.html
