Code Analysis Method

Last Update:2018-12-05 Source: Internet

Author: User

Please credit the source when reprinting!

Over the past six months, we have been doing pre-research for new products together with code analysis. The whole process has been challenging and exploratory. Looking back at this work and study, I have gathered some conclusions and experience about code analysis methods, and I am recording them here so that I do not forget them. This is not a final document; it will be updated at any time as my work and study continue.

The advent of the open-source era has brought a great feast to programmers: there are many excellent products and a great deal of code on the Internet for us to sample. But when you really want to study a specific piece of open-source software, you face many problems, such as a large code base, little documentation, and no technical support. How can we learn the features and implementation details of a product more effectively and quickly? I think this is a question that every programmer cares about and thinks about. Here I only record my own experience in studying and learning code. If you want to learn more about code reading methods, please read Code Reading by Diomidis Spinellis.

1. Collect information

The first step in code reading is to collect all the information that can be collected, including the following:

Project user documentation
Project design document
Project FAQ
Project test cases

This information can be obtained from the project's home page, wiki, Google, mailing lists, forums, and related papers and books. The collected information should be managed in a unified manner. Here I recommend three good knowledge management tools: mybase, Moin, and Google Notebook:

Mybase: This tool can collect any kind of digital information, including web pages, files, and multimedia. The collected items can easily be sorted into categories in a tree structure.

Moin: This is a wiki tool. Its home page provides a desktop version that can conveniently be copied to a USB flash drive. Its biggest feature is that each article is stored separately, so the articles can conveniently be managed with version management tools.

Google Notebook: You can easily tag each knowledge point for later retrieval. It seems that development of Notebook has stopped and that the team's focus has moved to Google Docs.

2. Develop Analysis Strategies

The amount of code in open-source software is often staggering, relevant design documents and materials are often lacking, and personal energy is limited. Therefore, it is important to develop an analysis strategy before analyzing the code. First of all, we must clarify the ultimate goal of the analysis.

If the main purpose of code analysis is to understand the features of this product, we should focus on the following aspects:

What fields is the product applicable to?
What related products and existing products are there?
What are the advantages and disadvantages of the product?

Through horizontal and vertical comparison of products, we can roughly position the product and determine whether it can be applied to our existing product framework.

If the main purpose of code analysis is to learn a new framework, we should focus on the following aspects:

What core technologies does the product use?
What is the logical framework of the product?
What modules is the product framework divided into? What are the relationships between the modules?

If the main purpose of code analysis is to study the specific implementation details of a module or function, we should focus on the following aspects:

What algorithm is used for implementation? What are the ideas and principles of algorithms?
What data structure is used to describe the implementation?
What implementation techniques are used?

Whichever of these goals the analysis serves, we should stay focused on the stated goal throughout the analysis and keep asking ourselves new questions. When I first analyzed code in detail, I got lost in a huge code base because my goal was unclear. As a result, I spent a lot of time and still did not get the results I expected.

3. Source Code Analysis

Only after collecting and reading the relevant documents and determining the analysis strategy can you start to study the code. Code analysis is divided into several parts: building a runtime environment, building a test environment, static analysis of the source code, and dynamic analysis of the source code. Static analysis focuses on the logical structure and relationships of the code as a whole, while dynamic analysis focuses on data flow and implementation methods.

3.1 Build a runtime environment for source code

After downloading the source code, first make sure that the code at hand really can be built and run. Therefore, you must first build a complete, runnable system. That way, any later compilation or runtime error can be attributed to your own modifications rather than to problems in the code itself. Another advantage of building a runnable system is that by reading the project's build code (makefile, scons, or shell), you can understand the general structure of the product, as well as the features, libraries, and files that the system currently supports. Generally, open-source software provides README and INSTALL files, or lists the requirements and build methods on its home page. Following this documentation, build at least two versions: debug and release. However, the documentation often mentions only some basic build commands, while the compilation rules may imply many product features. These features can be discovered by reading the makefile or by running config --help. By combining different compilation options, we can build test versions with different features.

Documenting the build process is a good habit: it not only prevents forgetting but also provides information for subsequent analysis. I usually use the spreadsheet tool in OpenOffice to record, in a table, the relationship between each compilation option and the files it builds.

After the build succeeds, put the code under a version management tool you are familiar with (SVN, git, CVS), because each modification may cause problems in compiling or running the code. With a version management tool, you can roll back quickly.

3.2 Build a test environment

Generally, open-source software provides test methods and test code, which can be obtained from the project home page or the code tree. After obtaining them, run the tests once against the product you built, so that you understand the current implementation status of the product and discover some of its defects. Of course, not all products provide effective test code. In that case, you can only write some simple test cases by hand for the product to run. Finally, record the test process and turn it into a test script for later use in dynamic analysis and code modification.
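As one possible way to turn a recorded test process into a script, here is a minimal Python sketch. The function name run_cases and the (name, command) case format are my own assumptions for illustration; they do not come from any particular project. Each case is simply a command line to run against the built product.

```python
import subprocess

def run_cases(cases, timeout=60):
    """Run each (name, argv) test case and return {name: (exit_code, stdout)}.

    `cases` is a hypothetical format: a list of (name, argv) pairs, where
    argv is the command line that exercises the built product.
    """
    results = {}
    for name, argv in cases:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        # Keep the exit code and output so that later runs can be compared.
        results[name] = (done.returncode, done.stdout)
    return results
```

Saving the returned dictionary after each run gives a baseline to compare against once you start modifying the code.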

3.3 Static code analysis

At this point, we should have a general understanding of the product's logical structure and some experience as a user. We know which features the product has completed, which features are not yet supported, and what characteristics it has. Next, we should gain a deeper understanding of the product from the static structure of the code.

3.3.1 Code statistics

Here we gather some statistics to estimate the workload of code reading, mainly to answer the following questions:

How many code files are there?
Is the code written in one language or several? What is the language distribution?
How many lines of code are there?
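These three questions can be answered with a short script. The following is a minimal sketch, not the script I actually used; the LANGUAGES table is a hypothetical extension-to-language mapping that you would extend for the project at hand, and every line is counted, including blank lines and comments.

```python
import os
from collections import Counter

# Hypothetical extension-to-language map; extend it for the project at hand.
LANGUAGES = {
    ".c": "C", ".h": "C/C++ header", ".cpp": "C++",
    ".py": "Python", ".sh": "Shell", ".java": "Java",
}

def code_statistics(root):
    """Return (files per language, lines per language) for the tree under root."""
    files, lines = Counter(), Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata directories.
        dirnames[:] = [d for d in dirnames if d not in {".git", ".svn", "CVS"}]
        for name in filenames:
            language = LANGUAGES.get(os.path.splitext(name)[1].lower())
            if language is None:
                continue  # not a recognized source file
            files[language] += 1
            # Read as bytes so that unusual encodings do not raise errors.
            with open(os.path.join(dirpath, name), "rb") as handle:
                lines[language] += sum(1 for _ in handle)
    return files, lines
```

Calling code_statistics(".") in the top-level source directory returns two counters that can be printed or pasted into the statistics table.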

3.3.2 Build a module relationship

Generally, the functional modules of open-source software are divided by directory hierarchy, and the directory or file name clearly indicates the function of the module or file. Therefore, by understanding how files and directories are organized, you can intuitively understand the structure of the code. However, there are special cases. If the file or directory names do not clearly indicate the function, you need to analyze the relationships between packages and classes in the code to determine the structure.

In my work, I wrote some scripts for module analysis. With these scripts, the relationships between directories and files can easily be displayed as charts.
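My original scripts are not reproduced here. As an illustration of the idea only, the following Python sketch walks a source tree and emits Graphviz DOT text in which each directory is a node labeled with its file count. The function name directory_graph and the max_depth parameter are assumptions made for this example.

```python
import os

def directory_graph(root, max_depth=2):
    """Return Graphviz DOT text describing the directory hierarchy under root."""
    root = os.path.abspath(root)
    lines = ["digraph modules {", "  rankdir=LR;"]
    for dirpath, dirnames, filenames in os.walk(root):
        # Visit subdirectories in a stable order and skip hidden ones (.git, .svn).
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        depth = 0 if rel == "." else rel.count("/") + 1
        if depth >= max_depth:
            dirnames[:] = []  # do not expand directories at or below max_depth
        label = os.path.basename(root) if rel == "." else os.path.basename(dirpath)
        # One node per directory, labeled with its name and file count.
        lines.append(f'  "{rel}" [label="{label}\\n{len(filenames)} files"];')
        for child in dirnames:
            child_id = child if rel == "." else f"{rel}/{child}"
            lines.append(f'  "{rel}" -> "{child_id}";')
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved to a file and rendered with the Graphviz dot tool to obtain the chart.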

3.3.3 Build a UML diagram

If the code is too large and complex, consider using reverse engineering tools (Rose, Jude) to generate UML diagrams automatically from the source code, so that the composition of the code is shown more intuitively. This step should focus on figuring out the relationships between packages and between objects.

3.3.4 Build a source code reading environment

Before reading the code, use code reading tools to organize the code and create an index, which makes it easier to locate code. Here I recommend Global, Doxygen, and Source Navigator. These three are good source code indexing tools for Linux.

Global is open-source software from the GNU project that can generate HTML-based source code index pages. Combined with Apache, it makes it easy to find and locate functions and classes.

Source Navigator is currently the best graphical source code browser on Linux. It was formerly developed and maintained by Red Hat, but development stalled for five years. Recently, a German organization took over its development. It is arguably the only product on Linux that can compare with Source Insight.

Doxygen can not only create code indexes but also extract information from code comments to generate help documentation. It can also draw function relationship diagrams and class relationship diagrams.

3.3.5 Understand key data structures and algorithms

This activity consists mainly of reading code. By reading the header files, you can gain a preliminary understanding of the key data structures and algorithms involved. Then work out the functions of these data structures and the relationships between them. Finally, based on this understanding, you can draw an overall diagram of the data structures.

3.3.6 Deliverables

The analysis activities above produce the following outputs:

1) An understanding of the code reading workload

2) An understanding of the structure of the code (modules, packages, and classes)

3) An understanding of the core data structures

4) A code statistics table, a module and package relationship diagram, a class relationship diagram, and a logical relationship diagram of the data structures

4. Dynamic Analysis of source code

Dynamic analysis of the source code mainly aims to understand the operations on key data structures, the call relationships of key functions, and the organization and flow of data while the system is running. It is divided into several parts: runtime environment analysis, function call analysis, and runtime data analysis.

4.1 Runtime Environment Analysis

In this step, you need to answer the following questions:

What environment variables are required at runtime?
What parameters are required?
Which libraries does the program depend on at runtime?
What resources are accessed at runtime (such as files, networks, and memory allocation)?

This information can be obtained by reading the relevant files under /proc/XXX and by using tools such as ldd and objdump.
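As an illustration of reading the /proc files mentioned above, here is a minimal Python sketch for Linux. It relies on the standard layout of /proc/<pid>/cmdline and /proc/<pid>/environ (NUL-separated strings) and of /proc/<pid>/maps (one mapping per line, with the path in the sixth field). The function names are my own, and the ".so" substring test is a rough heuristic rather than a definitive way to identify shared libraries.

```python
import os

def parse_nul_separated(raw):
    """Split the NUL-separated byte strings used by /proc/<pid>/cmdline and environ."""
    return [item.decode(errors="replace") for item in raw.split(b"\0") if item]

def parse_environ(raw):
    """Turn the contents of /proc/<pid>/environ into a dictionary."""
    env = {}
    for entry in parse_nul_separated(raw):
        key, _, value = entry.partition("=")
        env[key] = value
    return env

def mapped_libraries(maps_text):
    """Return the sorted shared-object paths found in /proc/<pid>/maps text."""
    libraries = set()
    for line in maps_text.splitlines():
        # Fields: address, permissions, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) == 6 and ".so" in os.path.basename(fields[5]):
            libraries.add(fields[5])
    return sorted(libraries)

def inspect_process(pid="self"):
    """Collect command line, environment, and mapped libraries of a Linux process."""
    base = f"/proc/{pid}"
    with open(f"{base}/cmdline", "rb") as handle:
        argv = parse_nul_separated(handle.read())
    with open(f"{base}/environ", "rb") as handle:
        env = parse_environ(handle.read())
    with open(f"{base}/maps") as handle:
        libraries = mapped_libraries(handle.read())
    return {"argv": argv, "env": env, "libraries": libraries}
```

Call inspect_process with the process ID of the running product. Reading another user's process usually requires matching permissions.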

4.2 Runtime Function Call Analysis

The test environment built earlier is used here. By running different test cases under tools such as GDB, callgrind, gprof, and CodeViz, you can conveniently obtain the call sequence of functions. The captured data is then processed by a script to remove unnecessary noise. This allows you to draw a very intuitive sequence diagram of the function calls made at runtime. This process is quite helpful for understanding how specific product features are implemented.
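The post-processing script depends on the tool that captured the data. As one hedged example, the following Python sketch assumes the captured data is the text of one or more GDB backtraces (the output of the bt command, innermost frame first) and reduces them to caller-to-callee edges with counts. The call_edges name and the ignore parameter are assumptions for illustration; frames that GDB could not resolve (shown as ??) are treated as noise by default.

```python
import re
from collections import Counter

# Matches GDB backtrace frames such as
#   "#0  parse_header (buf=0x1) at parser.c:42"
#   "#1  0x0000000000401234 in read_request (conn=0x2) at server.c:88"
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+) \(")

def _add_edges(counts, frames):
    """Count caller -> callee pairs; frames are listed innermost first."""
    ordered = frames[::-1]  # outermost caller first
    for caller, callee in zip(ordered, ordered[1:]):
        counts[(caller, callee)] += 1

def call_edges(backtrace_text, ignore=("??",)):
    """Return a Counter of (caller, callee) edges from concatenated backtraces."""
    counts = Counter()
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME.match(line.strip())
        if not match:
            continue  # skip anything that is not a frame line
        if match.group(1) == "0" and frames:
            # Frame #0 starts a new backtrace; finish the previous one.
            _add_edges(counts, frames)
            frames = []
        if match.group(2) not in ignore:
            frames.append(match.group(2))
    if frames:
        _add_edges(counts, frames)
    return counts
```

The resulting edges can be sorted by count or written out as a graph, which is a convenient starting point for drawing the sequence diagram.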

4.3 Runtime Data Analysis

Runtime data analysis mainly concerns the creation, manipulation, and destruction of key data structures. The data content at different points of execution is output in readable form using GDB or logging tools. By combining test cases, you can finally draw a runtime relationship diagram of the key data structures. There is no uniform tool or method for data collection. I mostly use GDB: with GDB and custom macros, you can dynamically capture and output specified data structures. Of course, this method does not work well for time-sensitive call paths, so in those cases a logger is usually used to output the data. In many open-source projects, developers provide macros or data traversal functions for testing the code. These interfaces can conveniently be used to output the required information.

4.4 Deliverables

This activity produces several outputs:

A) Function call information for different test cases

B) Data Structure Diagram

C) Data Flow Diagram

D) Principles of Algorithm Implementation

5. Add and modify features

The ultimate goal of understanding a system is to use it. Therefore, to verify that our understanding of the software is correct, we should modify the existing code. The entire process must be based on a good test environment. The main steps of this activity are as follows:

A) Understand the implementation conventions of the existing code (add and modify code according to the conventions the code already follows)

B) Determine the goal of modifying or adding a function (what function is implemented or modified, and what effect is to be achieved?)

C) Implement the code

D) Compile and test the code to verify that the implementation is correct

6. Summary

Through the above four engineering processes, we can gain a more objective understanding of a product. This understanding should finally be expressed in the form of documents. If you add new features to the software, you also need to add interface documentation and design documentation for them.

[Figure: mind map summarizing the above]

~~~~~~ End ~~~~~~
