The Program Running Mechanism, Starting from Hello World

Source: Internet
Author: User


Learning any programming language starts with Hello World. Even with a language we have never used before, we can write its Hello World in a short time.

However, because Hello World is so simple, I suspect many people are not clear about what actually happens when it runs. How does the message "Hello world" end up on the display? The code the CPU executes must be different from the code we write, so what does it look like? How is the code we write turned into code the CPU can execute? What does the code look like at run time, and how is it organized? Where are the program's variables stored? How do function calls work? This article briefly discusses the program running mechanism.

The Hidden Work of the Development Platform

Every language has its own development platform, and most of our programs are born there. Converting source code into an executable file actually involves many complicated steps, but today's development platforms handle all of them automatically. This is convenient, but it also hides many implementation details, so most programmers only write code while the platform silently does the rest. As I understand it, the path from source code to a running executable can be divided into the following stages:
1. Translate the source code into machine language and organize the generated machine code according to certain rules. Call the result file A.
2. Link file A with file B, the other code that file A needs in order to run (such as library functions), to form file A+.
3. Load file A+ into memory and run it.

(In reality there may be more steps than these; reference books and other material describe them in detail. Here I simplify the process to these three key, indispensable steps.) All of them are hidden from us by the development platform. The following sections clear away the fog and show what the development platform is really doing.

Object Files

There is a classic saying in computer science: "Any problem in computer science can be solved by another layer of indirection." In other words, many problems become easier once an intermediate layer is added. If you need to convert A into B, you can first convert A into an intermediate form A+, and then convert A+ into the B you need. (Pólya's How to Solve It makes a similar point: a problem can often be simplified by working through an intermediate step.) The path from source code to executable follows the same idea: the source program is first converted into an intermediate file, and that intermediate file is then turned into the executable we need.

The file A mentioned above has a more formal name: the object file. It is not an executable program; it must be linked with other object files and then loaded before it can run. To produce it, the development platform first translates the source program into machine language, which is ultimately just a sequence of binary code. This translation is compilation. Compiler theory is important, but it is not the focus of this article; if you are interested, you can look it up yourself.

Object file format: Now let's look at how an object file is organized, that is, its storage structure.

Origin: If you were designing it, how would you organize the binary code? Just as the items on a desk are sorted and put in their proper places, the translated binary code is classified by type to make it easier to manage: code is stored with code, and data with data.
As a result, the binary code is stored in separate blocks, and each such block is called a segment (in ELF terminology, a section).

Standard: As with many things in computer science, standards exist to ease communication and keep programs compatible, and the way binary code is stored is no exception. This is how COFF (Common Object File Format) was born. The object file formats of today's mainstream operating systems share its design: PE/COFF on Windows is derived directly from COFF, and ELF on Linux is organized around the same idea of sections.

a.out: a.out is the default output file name. If you compile a program without naming the output (for example, without gcc's -o option), the compiler produces a file called a.out. Note that gcc hello.c compiles and links in one step, so that a.out is already a linked executable. The name is historical (it originally stood for "assembler output"), and I won't go into the details here. The following figure shows the structure of an object file more intuitively:

The figure shows a typical object file layout. Real files may differ, but they are all derived from this structure.

ELF file header: the first segment in the figure. It contains basic information about the object file, such as the file version, the target machine type, and the program entry address.
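To make the header concrete, here is a minimal sketch that reads the header of a file and checks its magic number. It assumes Linux with glibc's <elf.h> and a 64-bit ELF file. The functions read_elf64_header and print_elf64_header are hypothetical helpers written for this example; the structure Elf64_Ehdr and the constants ELFMAG, SELFMAG, EI_CLASS and ELFCLASS64 come from <elf.h>.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the ELF header of the file at `path` into *hdr.
   Returns 0 on success, or -1 if the file cannot be read or is not a
   64-bit ELF file. Assumes Linux/glibc's <elf.h>. */
int read_elf64_header(const char *path, Elf64_Ehdr *hdr)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(hdr, 1, sizeof *hdr, f);
    fclose(f);
    if (got != sizeof *hdr)
        return -1;
    /* e_ident starts with the magic number: 0x7f 'E' 'L' 'F'. */
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* This sketch only handles the 64-bit header layout. */
    if (hdr->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Hypothetical helper: print the header fields mentioned in the text. */
void print_elf64_header(const Elf64_Ehdr *hdr)
{
    printf("type:    %u\n", (unsigned)hdr->e_type);    /* 1 = ET_REL (.o), 2 = ET_EXEC, 3 = ET_DYN */
    printf("machine: %u\n", (unsigned)hdr->e_machine); /* 62 = EM_X86_64 */
    printf("version: %u\n", (unsigned)hdr->e_version); /* 1 = EV_CURRENT */
    printf("entry:   0x%llx\n", (unsigned long long)hdr->e_entry); /* 0 for an unlinked .o */
}
```

For example, reading hello.o this way should report type 1 (ET_REL, a relocatable object file), while a linked a.out typically reports 2 (ET_EXEC) or, for a position-independent executable, 3 (ET_DYN).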

Text section: mainly holds the program's code.

Data section: holds the program's data, such as variables.

Relocation sections: these hold relocation information for the code (text relocation) and for the data (data relocation). Code usually refers to external functions or variables. Because they are only referenced, those functions and variables do not exist in this object file, and their actual addresses must be filled in later, when the program is linked. The relocation tables record the information needed to find and patch those addresses. With this in mind, text relocation and data relocation are easy to understand.

Symbol table: contains information about every symbol in the source code, including each variable name and function name. For example, if the code contains the symbol "student", the symbol table records information about it, such as the segment where it is located and its attributes (read and write permissions). The symbol table can be traced back to compilation: during lexical analysis, every symbol in the code and its attributes are recorded in a symbol table.

String table: similar to the symbol table, it stores string information, such as the names of symbols and sections.

One more point: object files are stored in binary format. Real object files are more complex than this model, but the idea is the same: store content by type, and add segments that describe the object file and hold the information the linker needs.
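The symbol table can be pictured as a list of records searched by name. The sketch below is a deliberately simplified, hypothetical model: struct sym_entry and find_symbol are made up for illustration. The real ELF entry, Elf64_Sym, stores the name as an offset into the string table and the section as an index, with SHN_UNDEF marking symbols that are only referenced, like printf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol-table entry (not the real Elf64_Sym layout). */
struct sym_entry {
    const char *name;     /* variable or function name */
    const char *section;  /* ".text", ".data", ... or NULL if only referenced here */
    int writable;         /* attribute: 1 if the symbol may be written */
};

/* Linear search by name; returns NULL if the symbol is not in the table. */
const struct sym_entry *find_symbol(const struct sym_entry *table, size_t n,
                                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```

An entry whose section is NULL plays the role of an undefined symbol: the object file uses it, and the linker has to find its definition somewhere else.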

The Sections of a.out

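Talk is cheap, so let's look at the object file generated by compiling Hello World. We use C here. The simple Hello World source code is:

    /* hello.c */
    #include <stdio.h>

    int main(void)
    {
        int a = 5;
        printf("Hellow world\n");
        return 0;
    }

The line int a = 5 was added so that there would be some data to look at. (Strictly speaking, because a is declared inside main, it is an automatic variable that lives on the stack at run time, and its value 5 is encoded directly in the instructions in .text. To place it in the data segment, declare it at file scope, outside main.) In VC you would simply click Run to see the result; here we compile the program with GCC to see how it is handled internally. Run gcc hello.c and check the directory: a new file, a.out, has appeared.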

Now let's see what is inside a.out. Some readers may think of opening it in a text editor such as Vim; I naively thought so too at first. But a.out does not reveal itself so easily, and Vim shows mostly unreadable bytes. "Most of the problems we run into have already been met and solved by those who came before us," and indeed there is a powerful tool for this job: objdump. With it we can examine every detail of the object file. readelf is also very useful and is introduced later. Both tools are usually included with Linux (they are part of GNU binutils); you can look them up yourself. Note: the code here is compiled with GCC on Linux, and the object files are examined with objdump and readelf. I show all the output, so you can follow the rest of the article even if you have never used Linux. I am using Ubuntu, which I find pleasant. The command to view the organization of a.out (the starting address, size, and so on of each segment) is objdump -h a.out:

As in the object file format described above, the content is stored by category. Here the file is divided into six segments.

From left to right: Idx is the section index and Name the section name; Size is its size; VMA is the virtual memory address (the address the section will have when the program runs); LMA is the load memory address; File off is the section's offset from the beginning of the file; and Algn is the section's alignment, written as a power of two (for example, 2**2 means 4-byte alignment). The line under each entry lists the section's attributes, such as CONTENTS, ALLOC, LOAD, READONLY and CODE.
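The Algn column is printed as 2**n, meaning the section must start at an address that is a multiple of 2**n. A minimal sketch of the arithmetic: because 2**n is a power of two, rounding an address up to the next such boundary can be done with a bit mask. align_up is a hypothetical helper for illustration, not something objdump provides.

```c
#include <stdint.h>

/* Hypothetical helper: round `offset` up to the next multiple of 2**n,
   the alignment objdump -h prints in its Algn column. The mask trick
   works only because 2**n is a power of two. */
uint64_t align_up(uint64_t offset, unsigned n)
{
    uint64_t alignment = (uint64_t)1 << n;
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, align_up(0x41, 2) gives 0x44, the next 4-byte boundary, while an already aligned value such as 0x40 is returned unchanged.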

".text" section: the code segment.

".data" section: the data segment mentioned above, which stores initialized data from the source code, such as global and static variables with initial values.

".bss" section: also holds data, but uninitialized data (global and static variables without an initial value, which C sets to zero). Because this data has no initial contents to store, the section takes up no space in the file; only its size is recorded, and the memory is allocated and zero-filled when the program is loaded. That is why it is kept separate.

".rodata" section: the read-only data segment. The data stored here, such as string constants, cannot be modified.

".comment" section: stores compiler version information.

The remaining two sections are not relevant to our discussion, so I will not introduce them further. As far as I know, they contain information related to linking, compilation and loading.
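To see which kinds of data end up in which section, you can compile a small file like the one below with gcc -c and inspect it with objdump -h or objdump -t. The comments describe where GCC typically places each object on an ELF target; this is an assumption about common behavior rather than a guarantee, since placement depends on compiler version and flags (for example, GCC releases before 10 put uninitialized globals like scratch into a COMMON block in the .o file instead of .bss). The names counter, scratch, greeting and print_greeting are made up for this example.

```c
#include <stdio.h>

/* Typical GCC placement on an ELF target (an assumption; check with
   objdump -h / objdump -t on your own build). */
int counter = 5;                                  /* .data: initialized global */
int scratch[1024];                                /* .bss: size recorded, no bytes stored */
static const char greeting[] = "Hellow world\n";  /* .rodata: read-only data */

/* The machine code of this function goes into .text. */
int print_greeting(void)
{
    int a = 5;              /* automatic variable: on the stack, not in a section;
                               its value 5 is encoded in the instructions */
    printf("%s", greeting);
    return a + counter + scratch[0];  /* scratch[0] is 0 because .bss is zero-filled */
}
```

On a platform where int is 4 bytes, the .bss size reported for the .o file should account for the 4096 bytes of scratch, even though the file stores none of them.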


The object file format shown here covers only the main parts; some tables are not listed. On Linux you can use objdump -x to list all the section headers in more detail.

A Closer Look at a.out

The previous section used an example to describe the typical sections of an object file, focusing on section-level information such as size and other attributes.

So what exactly do these sections contain? What is stored in the .text section, for example? Once again we turn to objdump.

objdump -s a.out: the -s option displays the full contents of every section in hexadecimal.

The result is as follows:

As shown, the contents of each section are listed in hexadecimal. Each line has a hexadecimal column on the left and the corresponding ASCII characters on the right. Clearly, "Hellow world" is stored in the .rodata read-only data section. (Oops: I mistyped the string in the program and added an extra "w" to "Hello". It is a bit awkward, so please forgive me.) You can also check the ASCII values of "Hellow world": they are exactly the hexadecimal bytes shown. The .comment section, as mentioned earlier, contains the compiler version: it holds "GCC" followed by the version number.

a.out Disassembly

During compilation, the source file is first converted into assembly and then translated into machine language (another intermediate layer). Having read this much of a.out, it is worth studying its assembly form as well: objdump -d a.out lists the file's disassembly. Only the main part, the main function, is shown here. In fact, more work happens before main starts and after it returns: setting up the execution environment and releasing the resources the program used.

In the figure above, the left side shows the machine code in hexadecimal and the right side shows the corresponding assembly. Readers familiar with assembly should be able to follow it, so I will not describe it here.
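The raw bytes that objdump prints in hexadecimal, both in the -s section dump and in the left column of the disassembly, are easy to reproduce. The sketch below formats up to 16 bytes in a layout similar to, but not identical with, one line of objdump -s: hex bytes in groups of four, then the printable characters, with '.' for anything unprintable. hexdump_line is a hypothetical helper; the real objdump also prints an address at the start of each line.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format up to 16 bytes as hex groups of four,
   two spaces, then ASCII ('.' for non-printable bytes).
   `out` must hold at least 64 chars. Returns the length written. */
size_t hexdump_line(const unsigned char *bytes, size_t n, char *out)
{
    size_t pos = 0;
    if (n > 16)
        n = 16;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && i % 4 == 0)
            out[pos++] = ' ';                 /* separate groups of four bytes */
        pos += (size_t)sprintf(out + pos, "%02x", (unsigned)bytes[i]);
    }
    out[pos++] = ' ';
    out[pos++] = ' ';
    for (size_t i = 0; i < n; i++)
        out[pos++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    out[pos] = '\0';
    return pos;
}
```

Called on the 13 bytes of "Hellow world\n", it produces 48656c6c 6f772077 6f726c64 0a  Hellow world. where 0x48 is 'H', 0x65 is 'e', and the final 0x0a is the newline, shown as '.'.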

a.out File Header

When introducing the object file format, we mentioned the file header, which contains basic information about the object file, such as the file version, the target machine type, and the program entry address. The figure shows a file header, which you can view with readelf -h. (The figure shows hello.o, the file produced by compiling hello.c without linking it, for example with gcc -c hello.c; viewing a.out works the same way.)


The figure has two columns: the left column gives each attribute and the right column its value. The first line is the so-called magic number, a series of bytes that identifies the file type; I won't go into its meaning here, but you can look it up yourself. The lines that follow describe other properties of the object file; since they have little to do with our topic, I won't discuss them. The sections above used a concrete example to describe the internal structure of an object file. An object file, however, is only an intermediate product on the way to an executable. We have not yet discussed how the program runs: how an object file becomes an executable, and how the executable is executed, are the subjects of the following sections.

A Brief Look at Linking

In general, linking combines several object files into one. If a function in file A calls a function defined in file B, then for A's function to run correctly, the code from B must be brought in; merging A and B into one file in this way is linking. The program that does this is called the linker. It takes a number of object files as input and combines them into one output file. These object files often refer to one another's data and functions. Earlier we read the disassembly of Hello World, taken from a file that had not yet been linked. That means that when it refers to an external function, the function's address is not yet known. For example:

In the figure, the call instruction calls printf(). Because printf() is not in this file at this point, its address cannot be determined, so a placeholder (shown as FF bytes in hexadecimal) stands in for the address. After linking, the placeholder is replaced with the function's actual address, because by then the function has been brought into, or made reachable from, the output file.

Link classification: depending on when data and functions from other files are combined, linking can be divided into static linking and dynamic linking.

Static linking: linking is completed before the program runs, so the file can be executed only after linking is done. This has an obvious drawback for library functions. If both file A and file B use a library function, then after linking each of them contains its own copy of it. When A and B run at the same time, memory holds two copies of that function, which wastes storage, and the waste grows with scale. Static linking also makes upgrades difficult. To solve these problems, many programs now use dynamic linking.

Dynamic linking: unlike static linking, linking is deferred until the program runs, that is, until it is loaded for execution. In the example above, if A and B both use the library function fun(), only one copy of fun() needs to be in memory while they run. There is much more to say about linking, and I will cover it in a dedicated article in the future, so I will not go further here.
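The placeholder-patching step described at the start of this section can be modeled in a few lines. This is a hypothetical toy, not how GNU ld works internally: object file A's code contains a 4-byte placeholder, a fixup records which symbol belongs there, and the definitions stand in for object file B's symbols after the linker has assigned them final addresses. Real x86-64 calls usually use PC-relative relocations (the patched value is a distance from the instruction, not an absolute address), so the arithmetic here is simplified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: a symbol defined in object file B, with its final address. */
struct definition {
    const char *name;
    uint32_t address;
};

/* Hypothetical: a placeholder in A's code that must receive a symbol's address. */
struct fixup {
    size_t offset;      /* position of the 4-byte placeholder in A's code */
    const char *name;   /* symbol A refers to but does not define */
};

/* Resolve each fixup by name and write the absolute address in little-endian
   byte order. Returns the number of unresolved symbols (0 = success). */
size_t link_fixups(uint8_t *code, const struct fixup *fixups, size_t nfix,
                   const struct definition *defs, size_t ndefs)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nfix; i++) {
        const struct definition *d = NULL;
        for (size_t j = 0; j < ndefs && d == NULL; j++)
            if (strcmp(defs[j].name, fixups[i].name) == 0)
                d = &defs[j];
        if (d == NULL) {
            unresolved++;   /* a real linker reports "undefined reference" */
            continue;
        }
        for (int b = 0; b < 4; b++)
            code[fixups[i].offset + (size_t)b] = (uint8_t)(d->address >> (8 * b));
    }
    return unresolved;
}
```

A nonzero return value corresponds to the familiar "undefined reference" error a real linker reports when no input file defines a referenced symbol.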

A Brief Look at Loading

We know that to run a program, it must first be loaded into memory. In the past, the entire program was loaded into physical memory. Today, virtual memory is generally used: each process has its own complete address space, so it appears to each process that it can use all of memory. A memory manager (the hardware MMU together with the operating system) then maps virtual addresses to actual physical memory addresses. Program addresses can therefore be divided into virtual addresses and physical addresses: a virtual address is an address in the process's virtual address space, and a physical address is where the data is actually placed in memory.
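The mapping from virtual to physical addresses can be illustrated with a one-level page table. This is a simplified sketch under stated assumptions: 4 KiB pages, and a flat array in which entry i holds the physical frame number for virtual page i (translate and this table layout are hypothetical). Real hardware uses multi-level tables with permission bits, managed by the operating system.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                            /* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE  ((uint64_t)1 << TOY_PAGE_SHIFT)

/* Hypothetical one-level page table: page_table[i] is the physical frame
   number for virtual page i, or -1 if that page is not mapped.
   Returns 0 and stores the physical address on success, -1 otherwise. */
int translate(const int64_t *page_table, size_t npages,
              uint64_t vaddr, uint64_t *paddr)
{
    uint64_t page   = vaddr >> TOY_PAGE_SHIFT;       /* virtual page number */
    uint64_t offset = vaddr & (TOY_PAGE_SIZE - 1);   /* position inside the page */
    if (page >= npages || page_table[page] < 0)
        return -1;                                   /* a real OS would raise a page fault */
    *paddr = ((uint64_t)page_table[page] << TOY_PAGE_SHIFT) | offset;
    return 0;
}
```

With page 1 mapped to frame 5, virtual address 0x1234 (page 1, offset 0x234) translates to physical address 0x5234: the offset within the page is kept, and only the page number is replaced.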

You may have noticed that in the section listing shown earlier, whose file had not yet been linked or loaded, the virtual address and load address of every section are 0. Loading can be understood as follows: first, virtual addresses are assigned to each part of the program; then a mapping is established between those virtual addresses and physical addresses. The key part is this mapping from virtual to physical addresses. After the program is loaded, the CPU's program counter (PC) is set to the program's entry point, and execution proceeds instruction by instruction from there.
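The last step, in which the program counter starts at the entry point and the program runs in order, is the classic fetch-execute cycle. The toy machine below is an illustration only: its four opcodes are invented and do not correspond to any real instruction set. It shows that "executing in order" simply means advancing pc after each fetch, and that a jump is nothing more than writing a new value into pc.

```c
#include <stddef.h>

/* Invented opcodes for a toy machine (not a real instruction set). */
enum opcode { OP_LOAD, OP_ADD, OP_JUMP, OP_HALT };

struct insn {
    enum opcode op;
    long arg;
};

/* Run the program starting at `entry` and return the accumulator at HALT. */
long run(const struct insn *code, size_t entry)
{
    long acc = 0;        /* single accumulator register */
    size_t pc = entry;   /* program counter, set to the entry point */
    for (;;) {
        struct insn in = code[pc++];    /* fetch, then advance to the next instruction */
        switch (in.op) {
        case OP_LOAD: acc = in.arg; break;
        case OP_ADD:  acc += in.arg; break;
        case OP_JUMP: pc = (size_t)in.arg; break;  /* control flow just rewrites pc */
        case OP_HALT: return acc;
        }
    }
}
```

For instance, the program {LOAD 5, JUMP 3, ADD 100, ADD 1, HALT} started at entry 0 skips the ADD 100 and returns 6.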


The purpose of this article is to sort out how a program runs and to reveal what happens behind the execution of an executable file. Between source code and executable there are usually many intermediate steps, each producing an intermediate file. Today's integrated development environments hide these steps, and as we grow used to them, we gradually lose sight of these important technical details. This article covers only the main line of the process; every detail in it could fill an article of its own. The above reflects my personal understanding and opinions. If you find any problems, I welcome your advice.

See: Computer Systems; Windows Kernel Analysis; The Programmer's Self-Cultivation

If you repost this article, please credit the source:

A fish ~ @ Blog Park Yin Yaling


