C++ program memory layout (1)
Liu Xin, School of Software, Chongqing University
Abstract:
This article covers the basics of C++ program memory layout. It briefly introduces the heap, the stack, the global data zone, and the code zone, and then touches on memory alignment and the process address space (virtual memory).
Early this morning a student from outside the school sent me a C++ interview question. The company develops in C++ on the Windows platform, and one of its interview questions concerns C++ program memory layout. It is a representative question.
The following code is given:
    #include <iostream>
    #include <string>
    using std::string;
    using std::cout;
    using std::endl;
    int global_a = 5;         // global object
    static int global_b = 6;  // global static object
    int main()
    {
        int a = 5;            // local variable initialized to 5
        char b = 'a';
        int c = 8;
        static int d = 7;     // local static object
        cout << &a << endl;
        cout << &c << endl;
        cout << &d << endl;
        cout << &global_a << endl;
        cout << &global_b << endl;
        return 0;
    }
The question is as follows. Running the code prints the addresses of the variables. Take variable a as an example: is the address printed on the first run the same as the one printed on the second run? If the code is recompiled and run again, does the printed address change? Explain why.
I put this question to several students. Their answers fell into two groups. The first group held that the address of a differs on all three runs. The second group held that the address of a is the same on the first two runs and differs on the third.
So what actually happens? I ran the experiment in VC++ 6.0 and in VS2008:
Figure 1: The program running under VC++ 6.0
Figure 2: The program running under VC++ 2008
What does this tell us? First, the answers many of my students gave are wrong. Why? And if the addresses are the same, won't that cause a memory conflict (the memory addresses really do appear to be identical)? To answer these two questions, you must understand the memory model of a C++ program and the memory management mechanism of the Windows platform.
First, let's look at the memory model of a C++ program (this article does not discuss the C++ object memory model). Below is a classic diagram of the C++ memory model:
Figure 3: C++ program memory layout
The figure gives an overall picture of a running C++ program: its memory is divided into the stack, the heap, the global data zone, and the program code zone. In the program above, variable a is a local variable, so it is allocated on the stack. Recall how a stack works: there is a stack-top pointer, the stack grows from high addresses toward low addresses, and its size is limited. So the question becomes: why is the address of a the same on all three runs?
First, in VC++ 6.0, set a breakpoint on the "a = 5" statement, press F5 to enter debugging mode, and then press ALT + 8 to view the assembly code generated by the compiler:
Figure 4: Assembly code
EBP holds the address that was the top of the stack when main's frame was set up, and a is stored just below it. Note that a occupies four bytes and that the printed address of a is the address of its lowest byte (recall your computer organization course). So where was the stack-top pointer before the first variable, a, was placed on the stack? Calculate: 0012FF7C + 4 = 0012FF80. This value depends on the compiler, which is why VC++ 6.0 and VC++ 2008 print different values, as shown above. The starting address of the global data zone can be worked out in the same way.
Let's go one step further and discuss another issue: memory alignment. After a is declared, the character b (one byte) is declared, and then the integer c. What is the address of c? A naive calculation gives 0012FF7C - 1 - 4 = 0012FF77, but the actual output is 0012FF74. Why? The answer is memory alignment.
In modern computer architectures, the CPU can access a variable quickly and efficiently only if the variable's address has a certain property, namely "alignment". Alignment is handled by the compiler. In this example, a four-byte integer variable must start on a four-byte boundary, that is, at an address divisible by four. In assembly terminology this is called a "modulo-four address".
Now let's modify the code from the beginning of the article and add dynamic objects. Note that dynamic objects are allocated on the heap. Add the following lines of code:
    int* pinteger = new int(5);   // allocate memory on the heap
    cout << pinteger << endl;
    int* pinteger2 = new int(5);
    cout << pinteger2 << endl;
    delete pinteger;
    delete pinteger2;
Figure 5: Output after adding heap allocations
Note that the last two addresses printed are heap addresses, and that they do not increase contiguously. By contrast, the addresses allocated on the stack and in the global data zone are contiguous. The heap addresses lie between the stack and the global data zone, which is consistent with the C++ memory layout diagram shown above.
Now for the second question: if the three runs print the same variable address, is there an address conflict? No, because the printed address is not the physical address of the variable but a virtual address, also called a logical address. The three programs (in fact three running instances of the same program) are three independent processes as far as the operating system is concerned, and each has its own address space, so they do not affect one another. For example, the memory at address 0012FF7C may hold different data in different processes (in our example it happens to hold the same data). On Windows, the virtual memory mechanism gives every process a consistent view of memory and separates logical memory from physical memory. I will not go over virtual memory in detail here. The book "Operating System Concepts" gives a complete introduction to it.
Appendix:
The following three paragraphs are taken from Li Xianjing's blog and give an easy-to-understand introduction to how virtual memory is implemented:
Keeping the address space of each process independent is hard to achieve in software alone and usually relies on hardware support. This hardware is called the memory management unit (MMU). In this architecture, memory is divided into physical memory and virtual memory. Physical memory is the real memory: a machine has as much physical memory as is installed in it. Applications use virtual memory. When a program accesses memory, the MMU translates the virtual memory address into the corresponding physical memory address according to the page table.
The MMU maps the virtual memory of each process to different physical memory, which keeps the virtual addresses of each process independent. Because physical memory is much smaller than the combined virtual memory of all processes, the operating system writes memory data that is not currently in use to disk and gives the freed physical memory to a process that needs it. Usually a dedicated partition is created to hold the swapped-out data. This partition is called the swap partition.
The mapping from virtual memory to physical memory is not done byte by byte but in units called pages. The page size depends on the hardware platform and is usually 4 KB. When an application accesses a virtual memory page that is not in physical memory, the MMU raises a page fault and the current process is suspended. The page-fault handler reads the corresponding data from disk into memory and then wakes the suspended process.
Post address: http://blog.csdn.net/liuyimu/article/details/5510374