Reposted from: http://www.infoq.com/cn/news/2014/04/learn-open-source
Some Suggestions for Learning Open Source Projects
Author: Ben Linders | Published: April 11, 2014
As open source communities and projects grow, more and more developers are learning, using, and contributing to open source projects. Industry experts who have studied several open source projects, including KVM/QEMU, libvirt, OpenStack, Ceph, and Zabbix, have recently shared suggestions on their blogs for learning open source projects, and these are well worth readers' attention.
Zhangyu believes that learning an open source project can be divided into five levels:
- Understand the basic concepts, basic usage, logical structure, basic principles, background, and application scenarios. This level is essentially "popular science". If you only need a general understanding of a project and do not expect to do hands-on technical work with it soon, you can start at this level.
- Master the basic installation process and usage of the project. This level is "getting started": the goal is an intuitive understanding of the project and hands-on experience installing and using it. If you only need to use the project as-is, you can start by learning at this level.
- Understand how the code is organized, map the main logical and functional modules to the code files, and walk through a few key, representative execution flows by analyzing the code. This level is "going deeper": you begin to understand the project's actual implementation and can connect its functions and working principles to the code, gaining a concrete picture of how the project works. This is where learning an open source project's code really begins. Understanding the code at this level helps if you want to build applications on top of the project, or if you work on other projects closely related to it.
- Understand the role of every code module and source file in the project, and walk through all the main execution flows. This level is "mastery": you gain a comprehensive, systematic understanding of the project's design and implementation and become familiar with every part of its code. If you want to modify the project substantially or contribute to the community, you should aim for this level.
- Study and comprehend the project's design ideas and implementation details in depth. This level is "proficiency": striving for excellence, the realm pursued by the true experts. If you want to become an important or even core contributor to the project community, this level should be your goal.
To learn an open source project, you need some basic knowledge. Zhangyu highlights three points:
- Background knowledge of the technical field the project covers. For example, to analyze the Linux kernel you should understand operating system principles; to learn OpenStack you should know what cloud computing is. Without this background, diving straight into the source code will be inefficient.
- The programming language the project is written in, along with the relevant development and debugging tools.
- English. Unfortunately, most of today's popular open source projects did not originate in China. So, apart from a handful of extremely popular projects with abundant documentation, you still need to gather and read English-language material yourself. Learning English well is important.
With the learning goals set and the basic knowledge in place, the next step is the approach and process of learning itself. Zhangyu summarizes a method that works from the surface toward the core, going deeper step by step.
When we first approach a project, what we see is effectively a black box. Following the documentation, we will find a number of external interfaces on the box. These interfaces typically fall into three categories:
- Configuration interface: used to set the box's operating mode, basic parameters, extensions, and other important properties. These settings are usually applied all at once before the box starts, and they do not change while the box is running, or change only in a few cases.
- Control interface: used to manipulate important behaviors of the box while it is running. This is the channel through which the box's administrator injects control commands and reads status information.
- Data interface: used by the box to read external data while running and to output data after internal processing is complete. This is the data path that the box's users really care about.
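To make the three categories concrete, here is a toy sketch in Python. The `BlackBox` class and its methods are hypothetical and do not come from any real project; they only illustrate how the configuration, control, and data interfaces of a system might look from the outside.

```python
from dataclasses import dataclass, field


@dataclass
class BlackBox:
    """Toy model of a project seen from outside; all names are hypothetical."""

    config: dict = field(default_factory=dict)
    running: bool = False
    processed: int = 0

    # Configuration interface: applied before start, not changed while running.
    def configure(self, **options):
        if self.running:
            raise RuntimeError("configuration is fixed while the box is running")
        self.config.update(options)

    # Control interface: the administrator injects commands and reads state.
    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def status(self):
        return {"running": self.running, "processed": self.processed}

    # Data interface: data comes in, is processed internally, and goes out.
    def process(self, record: str) -> str:
        if not self.running:
            raise RuntimeError("box is not running")
        self.processed += 1
        return record.upper() if self.config.get("uppercase") else record


box = BlackBox()
box.configure(uppercase=True)  # configuration
box.start()                    # control
print(box.process("hello"))    # data: prints HELLO
print(box.status())            # control: reads state
```

When reading a real project, each of these three groups of entry points is a natural place to start tracing inward.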
Therefore, when analyzing the code of an open source project, you can organize the work around the important configuration, control, and data interfaces, paying particular attention to the execution flow hidden behind each key interface. For the data interface, for example, you should trace at least one complete input-to-output flow: find in the code the entire execution path along which data enters the box through the input interface, passes through the various processing and forwarding steps, and finally leaves through the output interface. Once you have traced such a flow, you can link together the main modules and key steps involved in data processing, and map the abstract concepts in the logical module diagrams and the documentation onto the code, which substantially deepens your understanding of the project.
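Tracing a single record from input to output can be practiced on a toy pipeline first. The sketch below assumes a hypothetical pipeline whose stages (`parse`, `normalize`, `forward`) stand in for the modules you would locate in a real project; the helper `follow_record` is likewise hypothetical and logs every hop so the full path is visible.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def follow_record(record: str, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Pass one record through each stage and log its value after every hop."""
    trace = [f"input: {record!r}"]
    for name, fn in stages:
        record = fn(record)
        trace.append(f"{name}: {record!r}")
    trace.append("output")
    return record, trace


# Hypothetical stages standing in for the modules found while reading code.
stages: List[Stage] = [
    ("parse", str.strip),
    ("normalize", str.lower),
    ("forward", lambda s: s.replace(" ", "_")),
]

result, trace = follow_record("  Hello World ", stages)
for hop in trace:
    print(hop)
```

In a real project the "stages" are functions and modules you discover by reading, but writing the path down in this input, hop, output form is the same exercise.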
In practice, I suggest choosing one or two important interfaces each from the control interfaces and data interfaces, analyzing the execution flow behind them in detail, and trying to identify the function calls and data passed along at every step (for some systems, you can skip the underlying functions provided by libraries to save time). Once this work is done, you have roughly reached the third level of learning goals.
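When reading the code alone is not enough, a runtime tracer can help confirm the call chain behind an interface. The following is a minimal sketch built on Python's standard `sys.settrace` hook: it records nested calls to functions defined in the same file as the entry point and ignores frames from other files, such as the standard library, in the spirit of skipping library functions. The functions `create_volume`, `parse_request`, `validate`, and `schedule` are hypothetical stand-ins, not Cinder code.

```python
import json
import sys


def trace_calls(entry, *args):
    """Run entry(*args) and return its call chain as indented function names.

    Only frames whose code lives in the same file as `entry` are recorded;
    calls into other files (e.g. the standard library) are skipped.
    """
    target_file = entry.__code__.co_filename
    calls = []
    depth = 0

    def local_tracer(frame, event, arg):
        nonlocal depth
        if event == "return":  # a recorded function is returning
            depth -= 1
        return local_tracer

    def global_tracer(frame, event, arg):
        nonlocal depth
        if event != "call" or frame.f_code.co_filename != target_file:
            return None  # do not trace library frames line by line
        calls.append("  " * depth + frame.f_code.co_name)
        depth += 1
        return local_tracer

    sys.settrace(global_tracer)
    try:
        entry(*args)
    finally:
        sys.settrace(None)
    return calls


# Hypothetical request-handling chain used only for demonstration.
def parse_request(body):
    return json.loads(body)  # library call: not recorded


def validate(req):
    return req["size"] > 0


def schedule(req):
    return "backend-1"


def create_volume(body):
    req = parse_request(body)
    if validate(req):
        return schedule(req)


for line in trace_calls(create_volume, '{"size": 1}'):
    print(line)
```

Tracing tools like this complement reading rather than replace it: they confirm which path is taken for a given input, while reading explains why.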
The importance of configuration interfaces varies from project to project. For projects with very flexible architectures and a large configuration space (such as OpenStack's Ceilometer), it may be worth spending more time studying them; otherwise, a basic understanding is enough.
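One quick way to gauge how large a project's configuration space is: count the options per section in a sample configuration file. This sketch assumes an INI-style file (OpenStack services use INI-style configuration files); the `SAMPLE` contents and the `option_counts` helper are made up for illustration and are not a real `ceilometer.conf`.

```python
import configparser

# Illustrative INI content; not taken from any real project.
SAMPLE = """
[DEFAULT]
debug = false
transport_url = rabbit://guest:guest@localhost:5672/

[database]
connection = sqlite://

[publisher]
telemetry_secret = change-me
"""


def option_counts(text: str) -> dict:
    """Return {section: number of options} for an INI-style config text."""
    # Treat [DEFAULT] as an ordinary section so its options are not
    # merged into every other section; disable %-interpolation of values.
    parser = configparser.ConfigParser(
        default_section="__no_defaults__", interpolation=None
    )
    parser.read_string(text)
    return {name: len(parser[name]) for name in parser.sections()}


for section, count in sorted(option_counts(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{section}: {count} option(s)")
```

Run against a project's full sample configuration, a count like this gives a rough sense of whether the configuration interface deserves extra study time.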
The author uses OpenStack Cinder as an example of how to learn an open source project:
- First, to analyze Cinder, be sure to understand the relevant fundamentals. What is cloud computing? What is block storage? What is OpenStack? What role does Cinder play in OpenStack? And so on. Without these concepts, further study will be very difficult.
- On this basis, if conditions allow, it is best to deploy and actually operate Cinder (together with the other necessary OpenStack components), in order to gain an intuitive understanding of and hands-on experience with Cinder and to provide a reference for the later analysis. Here we assume that Cinder uses Ceph as its backend and that the virtual machines on OpenStack run on KVM.
- Next, you should gain a conceptual understanding of the logical framework of the system being analyzed. In the overall picture, you should understand the logical module structure of Horizon and Nova and how they work together with Cinder; this is closely related to the analysis of Cinder's control interfaces and execution paths. You should also understand how Cinder relates to KVM/QEMU and Ceph, which is very helpful for truly understanding Cinder. Within Cinder itself, you should understand its internal logical modules, what each of them does, and the control and data relationships between them.
- Once these preparations are done, you can start analyzing Cinder's code. As mentioned earlier, you should pick one or two key, representative interfaces each from the control and data interfaces for analysis. As for the configuration interfaces, assume a working configuration is in place and do not spend much time on them for now. Cinder's core function is volume management in OpenStack, and at least in the Cinder+Ceph scenario, Cinder itself is not on the critical path of data transfer. Therefore, analyzing the control interfaces is the most important part of studying Cinder's source code. To start, two interfaces and their execution flows serve well as entry points for analyzing Cinder: volume create and volume attach. If you can trace both operations end to end (at least down to the level where Cinder interacts with Ceph through librbd), it will help a great deal in truly understanding Cinder's functionality and implementation. When a KVM virtual machine uses a Ceph-provided volume through QEMU, the I/O does not pass through Cinder, so that part of the source code is actually outside the scope of learning Cinder; still, if you want to truly understand Cinder, you should cover this knowledge as well, or at least understand it conceptually.
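Taking good notes while tracing these two operations pays off. Below is one possible note format, sketched in Python: each hop records who calls whom and whether volume I/O actually flows over it. The hops are a simplified, component-level summary of the Cinder+Ceph scenario assumed above (cinder-api, cinder-scheduler, cinder-volume, the RBD driver, librbd); the `Hop` class and helper functions are hypothetical, and the function-level names are deliberately left out, to be filled in from the code you actually read.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Hop:
    src: str
    dst: str
    what: str
    data_path: bool = False  # True if volume I/O itself flows over this hop


# Simplified component-level notes; refine with real function names later.
CREATE = [
    Hop("client", "cinder-api", "REST request: create volume"),
    Hop("cinder-api", "cinder-scheduler", "RPC: pick a backend"),
    Hop("cinder-scheduler", "cinder-volume", "RPC: create on the chosen backend"),
    Hop("cinder-volume", "RBD driver", "call the backend driver"),
    Hop("RBD driver", "Ceph", "create an RBD image via librbd"),
]

ATTACH = [
    Hop("nova-compute", "cinder-api", "request connection info for the volume"),
    Hop("cinder-api", "cinder-volume", "RPC: build connection info with the RBD driver"),
    Hop("cinder-api", "nova-compute", "return rbd connection info"),
    Hop("nova-compute", "libvirt", "attach the volume to the guest as an rbd disk"),
    Hop("QEMU", "Ceph", "guest block I/O through librbd", data_path=True),
]


def render(flow: List[Hop]) -> str:
    """Format a flow as numbered 'src -> dst: what' lines."""
    return "\n".join(f"{i}. {h.src} -> {h.dst}: {h.what}" for i, h in enumerate(flow, 1))


def data_path_components(flow: List[Hop]) -> Set[str]:
    """Components that sit on hops carrying actual volume I/O."""
    return {c for h in flow if h.data_path for c in (h.src, h.dst)}


print(render(ATTACH))
print(sorted(data_path_components(ATTACH)))
```

The data-path check mirrors the point above: in the Cinder+Ceph scenario, guest I/O goes from QEMU to Ceph through librbd, and no Cinder component appears on that hop.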
The author also offers other suggestions, such as taking good notes and not getting overly entangled in details; see Zhangyu's blog for the full content. InfoQ readers who are learning open source projects are welcome to share their views.