Linux Kernel Source Code Analysis Method


I. Views on the kernel source code

The sheer size of the Linux kernel code is why many people find it daunting, and why their understanding of Linux stays at a general level. If you want to see through Linux and understand the nature of the operating system in depth, reading the kernel source code is the most effective way. We all know that becoming an excellent programmer requires a lot of practice and code writing. Writing code is important, but programmers who only write code easily confine themselves to their own field of knowledge. To broaden our knowledge, we need to read more code written by others, especially by people more skilled than we are. That lets us step out of our own circle of knowledge and into someone else's, where we can learn things we would otherwise not pick up in the short term. The Linux kernel is carefully maintained by countless experts in the open-source community, people who can fairly be called first-rate programmers. By reading the Linux kernel code we learn not only kernel-related knowledge, but also their programming skills and their understanding of computers.

I also came to Linux kernel source analysis through a project, and I have benefited a lot from it. Besides giving me kernel knowledge, it changed my previous perception of the kernel code:

1. Kernel source code analysis is not out of reach. The difficulty lies not in the source code itself, but in finding suitable methods and means for analyzing it. Because the kernel is huge, we cannot analyze it the way we would a typical demo program, starting from the main function and proceeding step by step; we need a way to cut into the kernel source code from the middle. This on-demand approach lets us grasp the main line of the source code rather than getting tangled in specific details.

2. The kernel design is elegant. The special role of the kernel means its execution efficiency must be high enough to meet the real-time requirements of today's computer applications, so the Linux kernel is written in a mix of C and assembly. However, we all know that execution efficiency and maintainability often pull in opposite directions. How the kernel improves maintainability while preserving efficiency depends on the elegant design within it.

3. Remarkable programming skills. In general application software design, coding may not be valued very highly, because developers pay more attention to good software design and treat coding as just the means of implementation, like using an axe to chop firewood: it does not take much thought. This is not true in the kernel. Good coding design improves not only maintainability but also code performance.

Everyone understands the kernel differently. As our understanding deepens, we will have more thoughts and insights about its design and implementation. This article therefore hopes to guide more people who are still outside the gate of the Linux kernel into the Linux world, to experience the wonder and greatness of the kernel. I am not an expert in kernel source code. I only want to share my experience of source code analysis and offer reference and help to those who need it; to put it grandly, it is a small contribution to the computer industry, and to operating system kernels in particular. Enough chatter (this is getting long~); below I share my method for analyzing the Linux kernel source code.

II. How difficult is the kernel source code?

Essentially, analyzing Linux kernel code is no different from reading anyone else's code, because in general the code is not your own. Take a simple example. A stranger hands you a program and asks you to explain its functional design after reading the source code. Many people who consider themselves good programmers would think nothing of it: as long as I patiently read the code from start to end, I will find the answer, and that is indeed the case. Now change the assumption. If that person is Linus, and what he hands you is the code of a module in the Linux kernel, do you still think it is that easy? Many would hesitate. Both are strangers (Linus would hardly count as one if he knew you~), so why does the code feel so different? I think there are the following reasons:

1. Linux kernel code seems somewhat mysterious to outsiders, and it is so large that it is hard to know where to start. The cause may be something very small, such as not being able to find the main function. For a simple demo program we can analyze the code from beginning to end, but that approach fails completely for the kernel, because nobody can read the Linux code from beginning to end (and there is no need to).

2. Many people have worked with the code of large software, but mostly in application projects, where the form and meaning of the code relate to business logic they deal with regularly. Kernel code is different: most of what it handles is closely tied to the lower levels of the computer. Lacking background knowledge in areas such as operating systems, compilers, assembly, and computer architecture creates many obstacles to reading kernel code.

3. The method used to analyze the kernel code is unsuitable. Faced with a large amount of complex kernel code, it is easy to get lost in the details if you do not start from a global perspective. The kernel code is huge, but it also has design principles and an architecture; otherwise maintaining it would be a nightmare for anyone! If we first clarify the overall design of a code module and then analyze its implementation, analyzing the source code may become much easier.

This is how I personally see these problems. If you have not worked on large software projects, analyzing Linux kernel code is a good opportunity to gain experience with them (indeed, Linux is the largest project I have encountered so far!). If you do not know the lower levels of the computer thoroughly, you can accumulate that knowledge as you analyze. The analysis may be slow at first, but as knowledge accumulates, the "business logic" of the Linux kernel gradually becomes clear. As for the last point, how to grasp the source code from a global perspective, that is the experience I want to share.

III. Kernel source code analysis method

Step 1: Collect data

From the standpoint of how people learn new things, before exploring the essence of something we must first get acquainted with it and form a preliminary concept of it. For example, to learn the piano we first need to know what playing the piano requires: basic music theory, reading notation and the staff, and similar fundamentals; then the techniques and fingering; only then can we actually start practicing.

The same is true for kernel code analysis. First we need to pin down what the code to be analyzed is about: is it the code for process synchronization and scheduling, for memory management, for device management, for system startup, or something else? The size of the kernel means we cannot analyze all of it at once, so we need to divide the work sensibly. As algorithm design teaches us, to solve a big problem, first solve its subproblems.

After locating the scope of the code to be analyzed, we can use every resource at hand to understand its overall structure and general function as fully as possible.

"Every resource" here means large online search engines such as Baidu and Google, textbooks on operating system principles, professional books, experience and material shared by others, and even the documentation, comments, and identifier names in the Linux source code (do not underestimate identifier names; sometimes they carry key information). In short, it means every available resource you can think of. Of course, we are unlikely to gather everything we want this way; we only aim to be as comprehensive as possible. The more comprehensive the information, the more of it we can draw on during analysis, and the easier the analysis becomes.

Here is a simple example. Suppose we want to analyze the code that implements frequency scaling in Linux. So far we know only the term; from its literal meaning we can guess that it has to do with adjusting the CPU frequency. Through information gathering we should be able to obtain the following:

1. The cpufreq mechanism.

2. The performance, powersave, userspace, ondemand, and conservative frequency-scaling policies.

3. /drivers/cpufreq/.

4. /Documentation/cpufreq.

5. P-states and C-states.

......

When analyzing Linux kernel code, being able to collect this much information counts as lucky. Information on the Linux kernel is certainly not as abundant as for .NET or jQuery, but compared with ten-odd years ago, when there were no powerful search engines and no relevant research materials, this is a bumper harvest! Through a simple search (which may take a day or two) we have even found the source directory where the code lives, and that information is truly worth its weight in gold!

Step 2: Locate source code

From the information gathered, we were lucky enough to find the directory containing the relevant source code. That does not mean, however, that everything in that directory is the source code we are going to analyze. Sometimes the directories we find are scattered, and sometimes they contain a lot of machine-specific code. What we care about is the main mechanism of the code under analysis, not machine-specific code (this helps us understand the nature of the kernel). So we need to carefully select, from the collected material, the code files that are actually involved. This step is unlikely to be completed in one pass, and nobody can guarantee selecting every relevant source file at once without missing any. There is no need to worry: as long as we capture the core source files of most modules, the later code analysis will naturally turn up the rest.

Back to the example above: read the documentation under /Documentation/cpufreq carefully. The current Linux source tree keeps module-related documentation in the Documentation folder of the source directory. If the module to be analyzed has no documentation, locating the key source files becomes harder, but it will not stop us from finding the source code. From this documentation we can at least identify the source file /drivers/cpufreq/cpufreq.c. With the documentation's description of that source file and the frequency-scaling policies found earlier, it is easy to pick out five more source files: cpufreq_performance.c, cpufreq_powersave.c, cpufreq_userspace.c, cpufreq_ondemand.c, and cpufreq_conservative.c. Have all the relevant files been found? Don't worry; starting the analysis from these will lead to the others sooner or later. If you read the kernel source with Source Insight on Windows, function call lookup and symbol reference search, combined with code analysis, readily turn up three more files: freq_table.c, cpufreq_stats.c, and /include/linux/cpufreq.h.

Following the direction in which the searched information flows, we can fully locate the source files to be analyzed. This step is not the most critical one, because we do not need to find every source file and can defer part of the work to the analysis itself. Yet it is also critical, because finding some of the source files is the basis for analyzing the code.
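The first pass over a source tree can be partly automated. The following is a minimal sketch, not a replacement for a proper source browser: it walks a directory and reports files whose name or contents mention a keyword. The function name find_candidate_files, the extension filter, and the "linux" directory in the usage line are all assumptions made for illustration.

```python
import os


def find_candidate_files(root, keyword, extensions=(".c", ".h", ".txt")):
    """Return sorted paths under root whose file name or text mentions keyword."""
    keyword = keyword.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only look at source, header, and plain-text documentation files.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            if keyword in name.lower():
                matches.append(path)
                continue
            try:
                with open(path, "r", errors="ignore") as handle:
                    if keyword in handle.read().lower():
                        matches.append(path)
            except OSError:
                # Unreadable files are skipped rather than aborting the search.
                continue
    return sorted(matches)


if __name__ == "__main__":
    # "linux" is a hypothetical path to an unpacked kernel source tree.
    for path in find_candidate_files("linux", "cpufreq"):
        print(path)
```

A keyword search like this tends to return too much, including machine-specific code, so the result is only a starting list to be pruned by hand.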

Step 3: Simple comments

In the source files located, work out the general meaning and function of each variable, macro, function, struct, and other code element. This is called simple annotation not because the work is easy, but because the comments need not be detailed at this stage; a rough description of each code element is enough. On the contrary, this is actually the hardest step in the whole analysis. It is the first time we dig into the kernel code, and for first-time readers in particular, the unfamiliar GNU C syntax and the overwhelming number of macro definitions can be discouraging. As long as you stay calm and work through each key difficulty, the same kind of difficulty will not trap you again later. Meanwhile, our other kernel-related knowledge keeps branching out like a tree.

For example, cpufreq.c uses the DEFINE_PER_CPU macro; by consulting reference material we can largely work out its meaning and function. The methods are much the same as for collecting information earlier. We can also use Source Insight's go-to-definition feature to view its definition, or look it up on LKML (Linux Kernel Mailing List). If that fails, we can ask on www.stackoverflow.com (what are LKML and Stack Overflow? Go collect information!). In short, by every available means we can always find out what the macro means: it defines an independent variable for each CPU.

We do not need to get the comments right the first time (we do not even need to work out how each function is implemented; knowing roughly what it does is enough). We keep refining the comments by combining the collected material with the analysis of later code (the original comments and identifier names in the source are very useful here). Through repeated annotating and repeated consulting of material, the comments are revised again and again.

After simple annotation of all the source files involved, we can achieve the following:

1. The meaning of the code elements in the source is basically clear.

2. Essentially all the key source files involved in this module have been found.

Using the overall or architectural descriptions in the material collected earlier, we can compare our analysis results against that material to confirm and correct our understanding of the code. In this way, simple annotation gives us an overall grasp of the main structure of the source module, which is the basic purpose of this step.
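To make sure no code element is skipped during simple annotation, it can help to generate a checklist first. The sketch below pulls macro, struct, and function names out of C source text with regular expressions. It is only a rough heuristic under stated assumptions: function definitions start in column 0, and anything hidden behind macros will be missed. The names list_code_elements, MACRO_RE, STRUCT_RE, and FUNC_RE, as well as the sample source, are invented for illustration.

```python
import re

# "#define NAME" at the start of a line (optionally indented).
MACRO_RE = re.compile(r"^\s*#\s*define\s+(\w+)", re.MULTILINE)
# "struct NAME {" at the start of a line: a struct definition, not a use.
STRUCT_RE = re.compile(r"^\s*struct\s+(\w+)\s*\{", re.MULTILINE)
# Heuristic: text starting in column 0, then "name(args)" followed by "{".
FUNC_RE = re.compile(r"^[A-Za-z_][\w\s\*]*?\b(\w+)\s*\([^;{}]*\)\s*\{", re.MULTILINE)


def list_code_elements(source):
    """Return the names of macros, structs, and functions found in source."""
    return {
        "macros": MACRO_RE.findall(source),
        "structs": STRUCT_RE.findall(source),
        "functions": FUNC_RE.findall(source),
    }


if __name__ == "__main__":
    # A made-up fragment, not taken from the kernel.
    sample = """
#define DEMO_MAX_FREQ 100
struct demo_policy {
    int min;
};
static int demo_set_freq(struct demo_policy *p, int f)
{
    return 0;
}
"""
    for kind, names in list_code_elements(sample).items():
        print(kind, names)
```

Each name in the output then gets a one-line comment, which is refined later as the analysis proceeds.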

Step 4: Detailed comments

Once simple annotation is done, the analysis of the module can be considered half complete; what remains is in-depth analysis and thorough understanding of the code. Simple comments cannot always describe the precise meaning of code elements, so detailed annotation is necessary. In this step we need to clarify the following:

1. When each variable definition is used.

2. When the code defined by each macro is used.

3. The meaning of function parameters and return values.

4. The execution flow and call relationships of each function.

5. The specific meaning of struct fields and the conditions under which they are used.

We could even call this step detailed function annotation, because the meaning of code elements outside functions is basically clear after simple annotation. The execution flow and algorithm of the functions themselves are the main subject of this part.

For example, consider how the algorithm of the cpufreq_ondemand policy is implemented (in the function dbs_check_cpu). We need to analyze, step by step, the variables the function uses and the functions it calls, and work out the whole course of the algorithm. The best result is an execution flowchart for each such complex function plus a function call graph, which is the most intuitive form of expression.
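Before drawing a flowchart, it sometimes helps to restate the suspected algorithm as a few lines of ordinary code and check it against the source. The sketch below is a simplified model of the behaviour commonly described for an on-demand policy: jump to the highest frequency when load is high, otherwise drop to the lowest frequency that still covers the load. It is not the kernel's dbs_check_cpu; the function name next_frequency, the threshold values, and the down_margin parameter are assumptions, and the real code differs between kernel versions.

```python
def next_frequency(load_percent, current_khz, available_khz,
                   up_threshold=80, down_margin=10):
    """Pick the next CPU frequency from a load percentage (simplified model)."""
    freqs = sorted(available_khz)
    # High load: go straight to the highest available frequency.
    if load_percent > up_threshold:
        return freqs[-1]
    # Load close to the threshold: keep the current frequency to avoid flapping.
    lowered = up_threshold - down_margin
    if load_percent >= lowered:
        return current_khz
    # Low load: estimate the frequency at which the same work would sit
    # just under the lowered threshold, then round up to an available step.
    needed = current_khz * load_percent / lowered
    for freq in freqs:
        if freq >= needed:
            return freq
    return freqs[-1]


if __name__ == "__main__":
    steps = [800000, 1600000, 2400000]  # made-up frequency table in kHz
    for load in (90, 75, 20):
        print(load, next_frequency(load, 2400000, steps))
```

Writing the model down makes the questions for detailed annotation concrete: where the load figure comes from, which tunables set the thresholds, and which function actually applies the chosen frequency.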

 

 

With the annotation in this step we can essentially grasp the overall implementation mechanism of the code under analysis, and the analysis can be considered 80% complete. This step is especially critical: the comments must be accurate enough for us to understand how the code divides into internal modules. Although Linux uses the macros module_init and module_exit to declare a module file, dividing a module into internal sub-functions depends on fully understanding what the module does. Only with a correct division can we determine which external functions and variables the module provides (symbols exported with EXPORT_SYMBOL_GPL or EXPORT_SYMBOL), and only then can we move on to analyzing identifier dependencies within the module.
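The exported symbols and entry points of a module file follow fixed textual patterns, so they can be listed mechanically. The following is a small sketch assuming the macros appear in the plain forms EXPORT_SYMBOL(name), EXPORT_SYMBOL_GPL(name), module_init(name), and module_exit(name); the helper name module_interface and the demo_ identifiers in the sample are hypothetical.

```python
import re

# Matches EXPORT_SYMBOL(name) and EXPORT_SYMBOL_GPL(name).
EXPORT_RE = re.compile(r"\bEXPORT_SYMBOL(?:_GPL)?\s*\(\s*(\w+)\s*\)")
# Matches module_init(name) and module_exit(name).
ENTRY_RE = re.compile(r"\bmodule_(init|exit)\s*\(\s*(\w+)\s*\)")


def module_interface(source):
    """Return the exported symbols and the init/exit functions of a module."""
    exports = sorted(set(EXPORT_RE.findall(source)))
    entries = {kind: name for kind, name in ENTRY_RE.findall(source)}
    return {
        "exports": exports,
        "init": entries.get("init"),
        "exit": entries.get("exit"),
    }


if __name__ == "__main__":
    # A made-up module tail, not copied from any kernel file.
    sample = """
EXPORT_SYMBOL_GPL(demo_register_policy);
EXPORT_SYMBOL(demo_get_freq);
module_init(demo_init);
module_exit(demo_exit);
"""
    print(module_interface(sample))
```

The list of exports is the module's outward-facing interface, and the init and exit functions are natural starting points for the dependency analysis in the next step.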

Step 5: Identifier dependencies within a module

With the division of code modules from Step 4, we can analyze the modules one by one. Generally we start from the module entry and exit functions at the bottom of the file (the functions declared with module_init and module_exit are usually at the end of the file) and, following the functions they call (defined in this module or in others) and the key variables they use (globals in this file or external variables from other modules), draw a function-variable-function dependency graph, which we call the identifier dependency graph.

Of course, the dependencies between identifiers in a module are not a simple tree; often they form a complex network. This is where our detailed comments pay off. Based on what each function does, we divide the module into sub-functions and extract an identifier dependency tree for each.

Identifier dependency analysis shows clearly which functions each function defined in the module calls, which variables it uses, and how the module's sub-functions depend on one another: which functions and variables they share, and so on.
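Once the calls and variable uses have been written down, extracting the dependency tree of one sub-function and finding what two sub-functions share is a plain graph traversal. The sketch below assumes the graph has already been recorded by hand as a dictionary from each identifier to the identifiers it uses; the function names reachable and shared_identifiers and every identifier in the sample graph are hypothetical.

```python
def reachable(graph, start):
    """Return every identifier reachable from start, in depth-first order."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            # Already visited: this also keeps cycles from looping forever.
            continue
        seen.append(node)
        # Reverse so that neighbours are visited in the order they were listed.
        stack.extend(reversed(graph.get(node, [])))
    return seen


def shared_identifiers(graph, first, second):
    """Return identifiers used, directly or indirectly, by both entry points."""
    return sorted(set(reachable(graph, first)) & set(reachable(graph, second)))


if __name__ == "__main__":
    # Hand-recorded "function -> functions and variables it uses" (made up).
    graph = {
        "demo_init": ["register_policy", "policy_table"],
        "register_policy": ["policy_lock", "policy_table"],
        "demo_exit": ["unregister_policy"],
        "unregister_policy": ["policy_lock", "policy_table"],
    }
    print(reachable(graph, "demo_init"))
    print(shared_identifiers(graph, "demo_init", "demo_exit"))
```

Because visited nodes are skipped, the traversal also copes with the network-like cases where functions call each other in a cycle.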

Step 6: Inter-module dependency

Once the identifier dependency graphs inside each module are sorted out, the dependencies between modules follow easily from the variables and functions of other modules that each module uses.
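This derivation can be written out in a few lines. The sketch below assumes two hand-built tables: the symbols each module exports and the external symbols each module uses. The module names come from the cpufreq example, but the symbol names are placeholders that have not been checked against any kernel version, and the function name module_dependencies is invented here.

```python
def module_dependencies(exports, uses):
    """Map each module to the sorted list of other modules whose symbols it uses."""
    # Reverse index: which module owns each exported symbol.
    owner = {}
    for module, symbols in exports.items():
        for symbol in symbols:
            owner[symbol] = module
    deps = {}
    for module, symbols in uses.items():
        # Symbols with no known owner (for example core kernel helpers) are ignored.
        deps[module] = sorted(
            {owner[s] for s in symbols if s in owner and owner[s] != module}
        )
    return deps


if __name__ == "__main__":
    # Placeholder symbol names, for illustration only.
    exports = {
        "cpufreq": ["core_register"],
        "freq_table": ["table_lookup"],
        "cpufreq_stats": ["stats_update"],
    }
    uses = {
        "cpufreq_ondemand": ["core_register", "table_lookup", "stats_update"],
        "cpufreq_performance": ["core_register"],
    }
    for module, needed in module_dependencies(exports, uses).items():
        print(module, "->", needed)
```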

 

 

The module dependencies of the cpufreq code can be expressed as the following relationship.

[Figure: module dependency diagram of the cpufreq code]

 

 

Step 7: Module architecture

The dependencies between modules make clear the position and role of each module in the code under analysis. On that basis we can classify the modules and sort out the architecture of the code.

As the cpufreq module dependency diagram shows, all the frequency-scaling policy modules depend on the core modules cpufreq, cpufreq_stats, and freq_table. If we abstract these three depended-upon modules as the core framework of the code, the policy modules are built on top of this framework and are responsible for interacting with the user layer, while the core module cpufreq provides driver and other interfaces for interacting with the lower layers of the system. From this we can obtain a module architecture diagram.

[Figure: module architecture diagram of the cpufreq code]
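Sorting modules into layers from their dependencies can also be sketched in code. The example below places modules with no dependencies in layer 0 and every other module one layer above the highest module it depends on. The dependency table is a simplified rendering of the description above (the three core modules are treated as independent of each other, which is an assumption), and the function names layer_modules and group_by_layer are invented for this sketch.

```python
def layer_modules(deps):
    """Assign each module a layer: 0 for no dependencies, else 1 + deepest dependency."""
    layers = {}

    def depth(module, visiting):
        if module in layers:
            return layers[module]
        if module in visiting:
            raise ValueError("cyclic dependency involving " + module)
        level = 0
        for child in deps.get(module, []):
            level = max(level, 1 + depth(child, visiting | {module}))
        layers[module] = level
        return level

    for module in deps:
        depth(module, frozenset())
    return layers


def group_by_layer(layers):
    """Turn {module: layer} into {layer: sorted module names}."""
    groups = {}
    for module, level in layers.items():
        groups.setdefault(level, []).append(module)
    return {level: sorted(names) for level, names in sorted(groups.items())}


if __name__ == "__main__":
    # Simplified dependency table based on the cpufreq example.
    deps = {
        "cpufreq": [],
        "cpufreq_stats": [],
        "freq_table": [],
        "cpufreq_ondemand": ["cpufreq", "cpufreq_stats", "freq_table"],
        "cpufreq_performance": ["cpufreq", "cpufreq_stats", "freq_table"],
    }
    for level, names in group_by_layer(layer_modules(deps)).items():
        print("layer", level, names)
```

With this table the core framework lands in layer 0 and the policy modules in layer 1. A real dependency table may contain cycles between modules, which this sketch rejects rather than resolves.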

 

 

Of course, an architecture diagram is not a mechanical patchwork of modules; we also need to enrich its meaning with the material we have consulted. The details of the diagram will therefore vary with each person's understanding, but its main body means much the same. At this point we have completed the analysis of the kernel code in question.

IV. Summary

As stated at the beginning of the article, we cannot analyze all of the kernel code. Collecting information about the code to be analyzed and then following the process above to work through it from beginning to end is therefore an effective way to understand the essence of the kernel. Analyzing kernel code according to concrete needs makes it possible to enter the Linux kernel world quickly. By analyzing other kernel modules in the same way, we eventually build our own understanding of the Linux kernel, and thereby achieve our goal in studying it.

Finally, here are two reference books for kernel study. The first is 《Linux Kernel Design and Implementation》, which introduces the main functions and implementation of the Linux kernel. It does not drag readers into the abyss of kernel code; it is a good reference for understanding the kernel architecture and getting started with the kernel code, and it will raise your interest in the kernel code. The other is 《Deep Understanding of the Linux Kernel》. I need not say much about how classic this book is; I only suggest reading it alongside the kernel code. Because it describes the kernel code in great detail, reading it with the code at hand helps us understand the code better, and it also serves as reference material while analyzing the kernel. Finally, I hope you can enter the kernel world soon and experience the surprises Linux brings!

Source: http://www.cnblogs.com/fanzhidongyzby/archive/2013/03/20/2970624.html
