How to study source code

Source: Internet
Author: User

When we write programs, we spend a lot of time reading other people's code.
For example, we read our own team's code, or the code our team has to integrate. If we start without a plan for how to read it,
we end up suffering (as a Taiwanese expression puts it, "reading until it hurts").

Whether you are consulting code for reference or studying an open-source project, trying to understand a huge body of source code in limited time inevitably creates pressure.
The following series, found online, analyzes how to read code. As a programmer, you may find it worth consulting.
Looking at code from a different angle can make you more efficient at interpreting the parts you care about.

Six chapters:
(1) Understand the code, and make its techniques your own.
(2) Understand the architecture, and the whole picture is easy to grasp.
(3) With good tools in hand, reading programs is not hard.
(4) Infer meaning from names, then work out the role of each component.
(5) Find the program's entry point, then unravel it from the top down.
(6) The joy of reading: getting to know the author through the code.


Reading other people's code (1): Understand the code, and make its techniques your own

Code is written by someone else, and only the original author truly understands its purpose and meaning. Many programmers feel an unconscious fear when they are forced to touch code written by others. However, rather than resisting, it is better to build the ability to take over other people's code, with a thorough understanding of the relevant language and conventions as the cornerstone.

Writing code may be a pleasure for most programmers, but I believe far more of them regard reading code written by others as a perilous undertaking. Many people would rather rewrite the code themselves than take over someone else's code to fix bugs, maintain it, or even add features.

What is the reason for this? Once stated, it is actually very simple: the code was written by someone else, and only the original author truly understands its purpose and meaning. Many programmers feel an unconscious fear when they are forced to touch code written by others. It is the primitive fear of the unfamiliar that comes from deep in the human heart.

Reading code written by others pays off richly
However, for many practical reasons, programmers are often forced to take over code from others. For example, a colleague leaves and you must take over the work he left behind. Or you are the rookie who has just joined the department, while a colleague with enough experience gets promoted; the wheel turns, and one generation of rookies replaces another. Or a project your company takes on requires you to take over or integrate the system a customer already has, and all you have is that system's source code (plus, if you are lucky, a few documents).

Stories like these play out constantly around programmers, or happen to programmers themselves. Many programmers regard taking over someone else's code as a tragedy. Nobody wants to take over code that someone else wrote, because nobody wants to spend time exploring it; people would rather spend their productivity on producing new code than on trying to understand existing code.

Unfortunately, these situations are very hard for programmers to avoid. Sooner or later we have to touch code written by someone else, and even understand and modify it. Beyond this necessity, in today's open-source era, as mentioned in the earlier "Programming 2.0" article, you can learn new technologies and architectural design from open-source code, which greatly improves the efficiency and effectiveness of learning. You can even extract the code you need directly from an open-source project, stand on the shoulders of giants, and gain productivity directly from others. Seen this way, reading code written by others is no longer just a negative matter of being "forced to take over", but has the very positive value of "absorbing nutrients".

Understand the system's architecture and behavior patterns first, then read the details
If writing code is one of a programmer's most important skills, then reading someone else's code and modifying it is another.

If you are not good at this, you will not only be unable to cope when you are forced to take over someone else's code. More importantly, when ready-made code is right in front of you, you will not know how to extract what you need from it. You end up like someone who enters a mountain of treasure and comes back empty-handed, left only with a sigh of regret.

Working with other people's code can be broadly divided into three levels: first, understanding; second, modifying and extending; third, extracting and refining. Understanding is the most basic. If you do not understand the code you have to deal with, there is no point talking about modifying or extending it, let alone separating the wheat from the chaff, extracting what you need, and reusing code written by others. Although we speak of "reading" code, code is not like an article or a novel, which you can understand to some degree simply by reading it through. You read an article or a novel almost sequentially: you just turn to the first page and read on from there. Many programmers who try to read other people's code, however, run into the difficulty of not knowing where to start.

Finding the "first page" of a system (that is, the point where execution starts) is not hard. But a highly complex system is often both very large and very intricate.

Reading all of the code sequentially from the starting point is time-consuming, and it makes it hard to build a picture of the system in your head and understand how the system really behaves. The point of reading code is therefore not to read every line, but to explore and read efficiently so that you understand the system's architecture and behavior patterns. Then, when you need to know the details of some fragment, you can quickly locate the corresponding code in your mental map. Only at that moment is it time to read carefully.

Get familiar with the language and idioms the code uses
Whatever the case, some basic preparation is necessary before reading someone else's code.

First, you had better understand the programming language the code is written in. You cannot read a novel written in French if you do not understand French. There are special cases: even if we do not know the language the code is written in, modern high-level languages are expressive and most popular languages share a similar lineage, so we can sometimes manage without being very familiar with it.

Besides knowing the language, you need to identify the naming convention the code uses. Understanding the naming convention is important, and conventions can differ significantly between programmers or development teams.
A naming convention typically covers the names of variables, functions, classes (if the code is object-oriented), and source files, and even the names of the directories used to build the project. If the code uses techniques such as design patterns, those names follow specific forms as well.

Naming conventions are a bit like a jargon that programmers build on top of the programming language. By agreeing on a convention and following it, programmers express higher-level concepts. For example, the well-known Hungarian notation combines a variable's name with its attributes, type, and description. For programmers, this is a way to convey richer information about the role and nature of a variable.

Familiarity with such conventions is important for reading code, because once you understand the conventions used across the system, you can interpret the code through the vocabulary its authors used to communicate with one another. If you do not understand the conventions, this extra information is not available to you. Likewise, code written with design patterns is filled with pattern names such as Factory, Facade, and Proxy. Classes that carry these names state their function directly through the name. Readers who understand the convention do not need to dig in; they can quickly grasp what these classes mean.

When you are given code that you must read, it is a good idea to obtain a description of its naming convention first. However, not every code base comes with such a document. The alternative is to browse through the code yourself; an experienced programmer can easily spot the naming convention a system uses.

Common naming schemes fall into only a handful of kinds, and this is where experience matters: the more conventions you know, the more easily you recognize the one someone else is using. If you are unlucky and the code uses a convention you have never seen before, you have to spend some time working out its naming rules on your own.
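As a rough illustration of browsing code to discover its naming convention, the following Python sketch counts which identifier styles appear in a piece of source text. The style names, the regular expressions, and the functions `classify` and `survey` are assumptions made for this example; a real code base may mix styles that these patterns do not cover.

```python
import re
from collections import Counter

# Hypothetical patterns for a few common identifier styles.
STYLES = [
    ("UPPER_SNAKE", re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)+$")),
    ("snake_case", re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")),
    ("PascalCase", re.compile(r"^(?:[A-Z][a-z0-9]+){2,}$")),
    ("camelCase", re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")),
]


def classify(identifier):
    """Return the name of the first style the identifier matches."""
    for name, pattern in STYLES:
        if pattern.match(identifier):
            return name
    return "other"


def survey(source_text):
    """Count how often each style occurs among the identifiers in the text."""
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source_text)
    return Counter(classify(identifier) for identifier in identifiers)
```

Running `survey` over a few representative files gives a quick, if crude, impression of which style dominates before you start reading in earnest.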

Understand the mindset and habits of the code's author
Most code basically follows one consistent naming convention. But when you are less lucky, a system may be riddled with several conventions at once. This can happen when the development team consists of several groups, each with its own culture, and the project's development was not properly managed. In the worst case, the code follows no obvious convention at all, and it is much harder to read.

To read code, first try to understand the "mind" of its author. To do that, you need to know the language the author uses and the vocabulary the author habitually relies on. In the next part, we continue to explore topics related to reading code.



Reading other people's code (2): Understand the architecture, and the whole picture is easy to grasp

The focus of this part: to understand a system, it is best to work top-down. Try to capture the system's architecture first, and do not get into the details too early, because that usually does not help you see the whole picture. Reading code does not have to start from the first line, and the goal is not to read through every piece of code.

For many reasons, programmers need to read code written by other people. For programmers in the "Programming 2.0" era, the most positive value is that those who can read other people's programs are able to extract the code they need and so raise their productivity.

The purpose of reading code is to understand the whole picture, not the details
The foundation for reading someone's code is understanding the programming language and the naming conventions that person used. With this foundation you have basic reading ability. As mentioned before, you cannot read a novel written in French if you do not understand French. Reading code, like reading literature, requires understanding the language it is written in and the author's idioms.

But we usually read literature sequentially: we start from the first page and read line by line, following the pace the author has set, and gradually enter the world the author has prepared. Reading code is vastly different. We seldom start reading from the first line, unless the program is a simple, single-threaded one, because doing so makes it hard to get a complete picture of the whole system. This is a point we have already made: reading code is about understanding the whole picture of the system, not about carpet-style reading of every piece of code.

Take a system written in an object-oriented language. The whole system is decomposed into separate classes. Reading the code of individual classes may let you understand the behavior of each class of object, but when it comes to how the objects interact and cooperate, it is easy to end up like the blind men with the elephant. The code of each class describes only the behavior of that one kind of object, and reading fragments produces only a one-sided understanding.

Once the architecture is clear from the top down, the relationships between components are easy to understand
If you want to escape this predicament, and do not want to spend a lot of time reading code only to end up with fragmentary knowledge of the system, you have to look at the system from another perspective. Starting from the behavior of individual classes is a bottom-up approach, whereas when you begin reading code you should work top-down. For code reading, top-down means understanding the architecture of the whole system first.

The architecture is the backbone and supporting pillar of the whole system, and it reflects the system's most prominent features. Knowing which type of architecture a system uses usually helps greatly in understanding the static and dynamic relationships among its components. For some systems, the top-level architecture is determined by the technology or framework used. For example, for an application built with Java Servlet/JSP technology, the outermost architecture is rooted in Java EE (or at least in the Java EE web container).

When Java Servlet/JSP technology is used, the relationships between some components are already determined. For example, the web container loads all the servlets, listeners, and filters according to the contents of an XML configuration file. Whenever an event occurs in the context (such as initialization), the container notifies the listener classes. Each time it receives a request from a client, it passes the request along the configured filter chain, giving each filter the opportunity to inspect and process the request, and then hands the request to the servlet responsible for handling it.
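The request flow described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the Servlet API: `Request`, `make_filter`, and `Container` are hypothetical names, and a real web container does far more (lifecycle, listeners, URL pattern matching).

```python
class Request:
    """A minimal stand-in for a client request."""

    def __init__(self, path):
        self.path = path
        self.log = []  # records which filters saw the request, in order


def make_filter(name):
    """Build a filter that notes its name, then passes the request on."""
    def filter_(request, next_step):
        request.log.append(name)
        return next_step(request)
    return filter_


class Container:
    """Toy container: runs every filter in order, then the matching servlet."""

    def __init__(self, filters, servlets):
        self.filters = filters    # ordered filter chain
        self.servlets = servlets  # maps a request path to a handler function

    def handle(self, request):
        servlet = self.servlets[request.path]

        def run(index, req):
            # After the last filter, hand the request to the servlet.
            if index == len(self.filters):
                return servlet(req)
            return self.filters[index](req, lambda r: run(index + 1, r))

        return run(0, request)
```

Even in this reduced form, the roles are visible: the container owns the wiring, each filter sees the request once, and exactly one servlet produces the response.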

Once we understand that a system uses this architecture, the relationships between its components are easy to see. Even if we do not know how many servlets there are, we know that whenever a request is received, some servlet will handle it. When we want to find out how a particular request is handled, we should look for the servlet that corresponds to that request.

Understanding an architecture requires a sense of hierarchy
Similarly, a web application written in Java might use an MVC framework such as Struts, as well as a data access framework such as Hibernate. Both can be seen as sub-architectures beneath the top-level architecture. Each application may in turn build its own, even lower-level architecture beneath Struts and Hibernate.

In other words, when we talk about "architecture", we must keep a sense of hierarchy. Whatever its level, an architecture defines roles and the relationships between them. For the reader, understanding which roles exist in a particular architecture and how they interact helps far more in understanding how the system works than diving straight into the detailed behavior of a single role.

This is an important key: before you try to get into the finest details, you should first figure out the roles involved and the relationships between them. For example, an event-driven architecture has three important roles: the event dispatcher, the event generator, and the event handler.

The event generator produces events and sends them to the event dispatcher. The event dispatcher is responsible for locating the event handler that corresponds to each event, forwarding the event to it, and telling it to process the event. Graphical user interface applications, such as those on Windows, use an event-driven architecture.
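The three roles can be made concrete with a minimal Python sketch. The class names `Dispatcher` and `Button` and the event type `"click"` are illustrative assumptions, not taken from any particular GUI toolkit.

```python
from collections import defaultdict


class Dispatcher:
    """Event dispatcher: finds the handlers registered for an event type."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def register(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def dispatch(self, event_type, payload):
        # Forward the event to every handler registered for its type.
        return [handler(payload) for handler in self.handlers.get(event_type, [])]


class Button:
    """Event generator: produces events and sends them to the dispatcher."""

    def __init__(self, dispatcher):
        self.dispatcher = dispatcher

    def click(self, x, y):
        return self.dispatcher.dispatch("click", (x, y))
```

An event handler here is simply any callable passed to `register`. Knowing only these three roles, a reader can already predict where to look for the code that reacts to a given event.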

Once you know that an application of this kind is event-driven, you also know that its architecture has three main roles. You may not yet know how many types of events the system has to handle, but you have already formed an overview of the system as a whole.

You do not know all the details yet, but information such as the exact types of events is not important at this point. Do not forget that we are working top-down: we must first understand the main structure of the building. What color the wallpaper is can be dealt with at the very end.

The first step in exploring an architecture: find out how the system is initialized
Experienced programmers are familiar with the architectures that are commonly used. A few glances are often enough for them to recognize the architecture a system uses, and from there they naturally know which roles exist and how those roles relate to one another. However, not every system uses an architecture that is widely known or that can be seen through at a glance. In that case you need to explore. The goal is the same: to identify the roles and the static and dynamic relationships between them.

Whether or not a system's architecture is a widely known one, when we explore what a system looks like we should find out how the following are accomplished under its architecture: first, how the system is initialized; second, which other systems (or users) connect to this system, and through what interfaces; third, how the system reacts to various events; fourth, how the system handles exceptions and errors.

How the system initializes is important, because initialization prepares everything that follows. From the way initialization is done and what it covers, you can tell what the system is preparing for, and get a glimpse of how the system will behave. The reason for identifying the other systems (or users) connected to the system is to define its boundaries. Other systems may provide input to the system we are exploring, or may receive its output. Only when you understand where the boundary lies can you determine the outline of the system.

The types of events the system responds to, and how it responds, essentially represent the system's main behavior patterns. Finally, we must understand how the system handles exceptions and errors, which is also important behavior but is easily overlooked. Earlier we said that you need a foundation in the language the system uses before you can read further. In this part the focus has been: to understand a system, it is best to work top-down. Try to capture the system's architecture first, and do not get into the details too early, because that usually does not help you see the whole picture.


Reading other people's code (3): With good tools in hand, reading programs is not hard

The complexity of a system often exceeds what the human brain can hold. When you read code, you need tools to assist you. A good integrated development environment (IDE) or text editor provides the most basic help.

Reading code can be done in a very primitive way: open the source files one after another in the simplest text editor, jump between pieces of code relying on your own sense of organization, and piece the picture together in your head.
However, the complexity of a system often exceeds what the human brain can hold. When you read code, you need tools to assist you. A good integrated development environment (IDE) or text editor provides the most basic help.

Use a text editor or IDE to speed up your reading
Many text editors provide syntax and keyword highlighting for common programming languages, which is a real help when reading. Some text editors (such as the editor I use most often, and Notepad++, which I use occasionally) can even list all the functions defined in a source file automatically, so that you can pick a function from the list and jump straight to its definition. This is a great convenience for anyone reading code.

The reason is that the most common action when reading code is to follow the program's control flow, moving your attention from one function to another function that it calls. So one of the things a programmer does most often when reading code is to find which source file a function is defined in, and then where in that file the definition is.
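To show what such a "list of functions in a file" feature amounts to, here is a small sketch that uses Python's standard `ast` module to list the functions defined in a piece of Python source together with their line numbers. The function name `list_functions` is an assumption for this example, and editors for other languages rely on their own parsers.

```python
import ast


def list_functions(source_text):
    """Return (name, line number) for every function defined in the source."""
    tree = ast.parse(source_text)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append((node.name, node.lineno))
    # Sort by line number so the list reads like the file.
    return sorted(found, key=lambda item: item[1])
```

Given such a list, jumping from a call site to the definition is a matter of looking up the name and opening the file at that line.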

A good IDE can provide even more assistance. Some present additional information automatically, the most useful being the function's prototype declaration. For example, some IDEs show a function's prototype in a tooltip when the cursor rests on the function name for a moment.

For someone reading code, this means that when you see a call to a function, you can get the function's prototype immediately and know what each argument passed in the call means, without first having to track down the function's definition.

grep is a basic and extremely useful tool
Besides choosing a good text editor or IDE, there is another basic but extremely useful tool: grep. Programmers familiar with UNIX operating systems will probably know this utility. The greatest use of grep is that it lets us search the files in a directory (optionally descending recursively into all subdirectories) for content that matches a given criterion (a fixed string or a regular expression).

If there is a match, grep tells you where it is. This is very useful when reading code. As we read along and come across an unknown class, function, data structure definition, or variable that we think is important, we have to find out exactly where it lives in the vast sea of code in order to turn that piece of the puzzle from unknown to known.
grep is useful because when we come across something unknown, it makes it easy to find where that thing is. And although grep is one of the standard UNIX utilities, grep programs of various kinds are also available on other platforms such as Windows. Programmers who work on Windows can choose whichever tool feels handy to them.
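What grep does can be approximated in a few lines of Python, which may help readers who work where grep is not at hand. This is a simplified sketch, not a replacement for grep; the function name `grep` and the returned tuple layout are choices made for this example.

```python
import os
import re


def grep(pattern, root):
    """Search every file under root; return (path, line number, line) hits."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # skip files that cannot be opened
    return hits
```

Note that every call re-reads every file under `root`, which is exactly the cost that becomes noticeable on large projects.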

gtags builds an index to make searching more efficient
grep is easy to use, but it has some shortcomings. The first is that it does not index the source files it searches. On every search it visits all the files, reads every one of them in full, and filters out the content that matches the given criterion. When a project has a very large number of source files, searching becomes inefficient.

The second shortcoming is that grep is purely a text search tool and does not parse the syntax of the language the source code is written in. When we only want to search for a function name, it may also report source lines where the name merely appears in a comment.
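The difference between a text match and a syntax-aware match can be demonstrated with Python's standard `tokenize` module, which separates identifiers from comments. The sketch below works on Python source only, and `find_name` is a hypothetical helper written for this illustration.

```python
import io
import tokenize


def find_name(source_text, name):
    """Return (lines where name is an identifier, lines where a comment mentions it)."""
    in_code, in_comments = [], []
    reader = io.StringIO(source_text).readline
    for token in tokenize.generate_tokens(reader):
        if token.type == tokenize.NAME and token.string == name:
            in_code.append(token.start[0])
        elif token.type == tokenize.COMMENT and name in token.string:
            in_comments.append(token.start[0])
    return in_code, in_comments
```

A plain text search would report all of these lines alike; a tool that understands the language can report only the first list.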

To get around these shortcomings of grep, programmers who want to read other people's code can consider a tool such as gtags. gtags belongs to GNU GLOBAL, a source code tagging system. It does more than search at the text level: because it has parsers for a variety of languages, a search can be restricted to language elements such as class names and function names.

Moreover, it indexes the contents of the source code. Once the index is built, a search no longer needs to re-read all the source files and scan them one by one. It only needs to consult the existing index to find the relevant passages efficiently.
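The idea of indexing once and looking up many times can be sketched as follows. This is a deliberately naive illustration: it does not reproduce how gtags stores or builds its index, and `build_index` and `lookup` are hypothetical names. The input is assumed to be a dictionary mapping file names to their text.

```python
import re
from collections import defaultdict

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(files):
    """Map each identifier to the (file name, line number) pairs where it occurs."""
    index = defaultdict(list)
    for filename, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            # Use a set so a name repeated on one line is recorded once.
            for word in set(IDENTIFIER.findall(line)):
                index[word].append((filename, number))
    return index


def lookup(index, name):
    """Answer a search from the index without re-reading any file."""
    return index.get(name, [])
```

Building the index costs one full pass over the sources; every later `lookup` is a single dictionary access.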

gtags provides a command-line program with which you index the directory that contains the source code. It also provides a program that searches the index, used much as you would use grep. It offers a number of useful kinds of search, such as finding the file and line where a data structure is defined, or finding all the files in the project that reference a data structure, together with the line numbers of the references.

In this way, you can easily find whatever you need while reading code. The support a tool like gtags provides is simply far more powerful than what grep can offer.

With HTML files generated by htags, it becomes even more powerful
One more tool absolutely deserves a mention. This tool, called htags, turns a finished index into a set of cross-referenced HTML files. Reading code through such HTML files is more structured than reading the source code directly, because the HTML files already contain links that let you jump between fragments of the various source files. For example, Figure 1 shows part of the home page of the HTML files produced by htags for the well-known open-source project FFmpeg.


htags first finds all the files that define a main() function and lists them. Finding the main() function is often the first step in reading code, because main() is the main entry point of the program. Every action is initiated from it; it is the source of everything.
With the HTML files produced by htags, you can simply click a hyperlink and go directly to the code fragment where the main() function is located, as shown in Figure 2.



Looking at the source code above, we find that av_register_all() is unfamiliar and we cannot tell what it does. To find out, we can simply click on the function, as shown in Figure 3. It is that convenient. At this point you will realize that gtags seems to be a weapon forged specifically for reading code.




Reading other people's code (4): Infer meaning from names, then work out the role of each component

First understand the architecture of the system; then, through names and naming conventions, you can infer the role of each component. For example, when AOL's Winamp initializes a plug-in, it calls the initialization function in the plug-in's structure, so that each plug-in has a chance to initialize itself. When Winamp is about to exit, or is about to stop a plug-in, it calls the plug-in's exit function.
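The plug-in lifecycle described here can be modeled with a short Python sketch. It is not the Winamp plug-in interface: `Plugin`, `Host`, the `events` list, and the convention that `init` returns 0 on success are all assumptions made for illustration.

```python
class Plugin:
    """A plug-in exposes an init and a quit function to its host."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # shared list used here to record call order

    def init(self):
        self.events.append(f"{self.name}: init")
        return 0  # assumed convention for this sketch: 0 means success

    def quit(self):
        self.events.append(f"{self.name}: quit")


class Host:
    """The host calls init when loading a plug-in and quit when shutting down."""

    def __init__(self):
        self.loaded = []

    def load(self, plugin):
        if plugin.init() == 0:
            self.loaded.append(plugin)

    def shutdown(self):
        # Stop plug-ins in reverse order of loading.
        while self.loaded:
            self.loaded.pop().quit()
```

With names like `init` and `quit` in the structure, a reader can guess each function's role from the name alone, before opening its body.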

Before reading the details of the code, we should first try to capture the system's operating scenarios. In the top-down approach, the system architecture is the topmost level, and the system's operating scenarios are the level just below it.

Good documentation is hard to find, so the ability to piece the story together matters
Some systems come with good documentation and may even use UML to describe their operating scenarios thoroughly. For the reader, starting from the system's analysis and design documents is then a quick way to understand how the system operates.
However, not every software project comes with good system documentation, and many highly valuable open-source projects have none at all. In that case, readers must work out the operating scenarios themselves and record them properly.

I like to compare a system's operating scenarios to the storylines the system will act out. Knowing which stories the system will play out, before reading the details of the code, is basic homework. You can describe the scenarios you have found with any notation you know or invent yourself. You can even just write them down as a simple list. As long as the record serves its purpose, it will help with reading the code. Alternatively, you can give a more detailed description using UML notations such as class diagrams, collaboration diagrams, and sequence diagrams.
Once you can list the scenarios the system may go through, you have a general understanding of what the system can do and how it reacts in various situations. On that basis, you can drill into any detail you need.

The first step in exploring the architecture: find the program's entry point
In an earlier development project, we had to put MP3 audio files from our system onto the iPod, the most popular playback device of the time.

Although the iPod can also be used as a portable storage device, simply copying MP3 files onto it does not make Apple's player recognize the files, let alone play them.
This is because Apple uses a special file structure (the iTunes database) to record the music available for playback on the device, along with playlists and song information (such as album name, song length, and artist). To understand this structure, and to try to reuse existing code, we found an iPod plug-in for AOL's Winamp.

AOL's Winamp is an extremely popular software player on the PC. The plug-in we found lets Winamp display the song information from an iPod connected to the computer and play those songs directly.

Our approach and steps in tracing and reading this plug-in were as follows. First, we had to understand the plug-in's architecture. After browsing through the source code, we noticed that it clearly follows Winamp's specification for plug-ins: it is a DLL on Windows, and through a function exported from the DLL named winampGetMediaLibraryPlugin it provides a structure named winampMediaLibraryPlugin.
When we do not know what a system's architecture is and set out to explore it, the first step is to find the program's entry point. How do we find it? That depends on the nature of the program.
For a program that executes on its own, we look for the main function that starts it; in C/C++ that is main(), and in Java it is static void main(). Once you have found the entry point, follow the flow from there and explore the architecture of the system.
But sometimes the code we want to read is a class library or function library, which simply provides classes or functions for client programs to use. Such code has no single entry point but many: every function or class that clients are allowed to call is a possible entry point.
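For a stand-alone C program, the search for main() can be sketched in Python as a simple scan. The regular expression below is a rough assumption that only recognizes definitions starting with `int main(` or `void main(` at the beginning of a line; `find_entry_points` is a hypothetical helper that takes a dictionary mapping file names to their text.

```python
import re

# Rough pattern: a return type, then "main", then an opening parenthesis.
MAIN_DEFINITION = re.compile(r"^\s*(?:int|void)\s+main\s*\(")


def find_entry_points(files):
    """Return (file name, line number) for each line that appears to define main()."""
    found = []
    for filename, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if MAIN_DEFINITION.match(line):
                found.append((filename, number))
    return found
```

For a library there is no such single hit; the equivalent step is to list every function or class that the library makes available to its callers.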

Take the Winamp iPod plug-in as an example. It is a dynamic-link library, so to understand its architecture we must first find out which functions it makes available to the outside. In a Windows DLL, functions exported for external use are marked with the dllexport keyword. So, using a tool such as grep or gtags, we can quickly find in the source code that the DLL exports only one function (which is really good news for us), and this function is winampGetMediaLibraryPlugin.
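The search for exported functions can likewise be sketched as a regular expression scan in Python. The pattern assumes the Microsoft-style `__declspec(dllexport)` spelling placed before the function's return type; the sample text in the usage note is made up for illustration and is not the plug-in's actual source.

```python
import re

# Capture the function name that follows a __declspec(dllexport) marker.
EXPORT = re.compile(
    r"__declspec\s*\(\s*dllexport\s*\)[^;{(]*?\b([A-Za-z_]\w*)\s*\("
)


def exported_functions(source_text):
    """Return the names of functions marked for export in the given source text."""
    return EXPORT.findall(source_text)


# Illustrative sample only; not copied from the real plug-in.
SAMPLE = (
    "static int init()\n"
    "{ return 0; }\n"
    'extern "C" __declspec(dllexport) winampMediaLibraryPlugin *winampGetMediaLibraryPlugin()\n'
    "{ return &plugin; }\n"
)
```

Calling `exported_functions(SAMPLE)` reports only the marked function, so the static helper is ignored. Exports declared in other ways, for example through a module definition file, would not be found by this scan.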
