Reposted article (the text in red is my own commentary; if anything is wrong, please point it out).
A program consists of instructions and data, and a C program is no exception. When writing a program, developers often need to choose a storage method that suits the characteristics of the data and the needs of the program. So what data storage methods does C offer?
A C program's memory can be roughly divided into four data zones: the constant zone, the static zone, the heap zone, and the stack zone.
The constant zone stores string constants that are not used to initialize arrays, and global variables qualified with const. Its data can be read but not written, and its lifetime is the whole run of the program.
The static zone stores all global variables and all variables declared static (both global and local). Its data has a long lifetime (the whole run of the program) and is initialized only once (the initial values are fixed at compile time).
The stack zone stores all automatic local variables (those declared with no storage-class keyword, or with auto). Its data has a short lifetime: it lasts only for one call of the function in which the variable is defined. The space is allocated at run time and reclaimed when the function returns.
The heap is a large memory pool maintained by the operating system. You must request memory from it manually (by calling a function in the malloc family) and release it manually after use. Otherwise the program leaks memory, which is not reclaimed by the operating system until the process exits.
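As a rough sketch of the four zones in one place (the names max_len, call_count, and make_greeting are illustrative and not from the original article; where each object actually lands depends on the compiler and platform):

```c
#include <stdlib.h>
#include <string.h>

const int max_len = 16;   /* const-qualified global: typically the constant zone */
int call_count = 0;       /* ordinary global: static zone */

/* Returns a heap-allocated copy of a greeting, or NULL if malloc fails.
   The caller must free() the result. */
char *make_greeting(void)
{
    const char *src = "hello";            /* src: stack; "hello": constant zone */
    char *buf = malloc((size_t)max_len);  /* buf: stack; the block it points to: heap */

    call_count++;                         /* kept in the static zone across calls */
    if (buf != NULL)
        strcpy(buf, src);                 /* "hello" (6 bytes) fits in max_len bytes */
    return buf;
}
```

A caller would use the result and then call free() on it; forgetting the free() is exactly the leak described above.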
The following describes the features of each type of storage.
I. Constant zone:
As its name suggests, the constant zone stores values that cannot change, such as string constants. In an actual ELF file (Executable and Linkable Format, developed and published by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI); Linux, as a Unix-like system, still follows this format), program data is stored in segments, and the constant zone corresponds to the ".rodata" (read-only data) segment.
Data in the constant zone is marked read-only: the program may read it but has no permission to write it. So if developers have data that they do not want to be changed, they can place it in the constant zone.
C has many kinds of constants, for example:
Character constants: 'a', 'A', '*'.
String constants: "helloworld", "ilovechina", "12345".
Integer constants: 10, 012, 0x0a, 0b00001010 (the binary form is a compiler extension before C23).
Floating-point constants: 3.14, 123.456, 3.0E-23.
But the compiler does not place every constant in the constant zone, as Figure 1-1 shows:
Figure 1-1 Defining a variable and initializing it with a constant
In the figure, the program defines an integer local variable i and initializes it to 10. Here i is a variable and 10 is a constant, but the compiler does not put 10 into the constant zone. Instead, it assigns the value directly as an immediate operand in the instruction (Figure 1-2).
Figure 1-2 Assembly code compiled from Figure 1-1
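The listing from Figure 1-1 is not reproduced here. A minimal sketch of that kind of code (the function name local_init is hypothetical) might look like this:

```c
/* i lives on the stack; the constant 10 is typically encoded directly in the
   instruction that initializes i rather than stored as separate data. */
int local_init(void)
{
    int i = 10;
    return i;
}
```

One way to check this is to compile with gcc -S and look for an instruction that carries the literal 10 as an immediate operand. The exact output depends on the compiler, the target, and the optimization level.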
This is because ordinary integer, floating-point, or character constants can usually be encoded as immediate operands where they are used, so there is no need to store them in a data zone. This saves both storage space and access time at run time. So what kind of data is put into the constant zone?
1. String constants
As shown in Figure 1-3, the C program defines a local character pointer variable p that points to a string constant. Because p is a local variable, it is placed in the stack zone, while the string constant "helloworld" is placed in the .rodata segment in the assembly (Figure 1-4) and ends up in the .rodata segment of the ELF file generated by compilation (Figure 1-5).
Figure 1-3 Example C program that defines a pointer variable pointing to a string constant
Figure 1-4 Assembly generated from the code in Figure 1-3
Figure 1-5 Analysis of the executable compiled from the program in Figure 1-3
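The listing from Figure 1-3 is not reproduced here. A sketch of the same idea, assuming a hypothetical function get_message and using const char * (the original most likely used a plain char *), could be:

```c
/* The pointer variable p is a local (stack) variable, but the characters it
   points to have static storage duration, so the returned pointer stays
   valid after the function returns. */
const char *get_message(void)
{
    const char *p = "helloworld";
    return p;
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes to the string constant at compile time.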
However, when a string constant is used to initialize an array, it is not placed in the constant zone but in the array itself, as shown in Figure 1-6:
Figure 1-6 Defining a character array and initializing it with a string constant
The compiler converts the string into a group of 32-bit integers, four bytes at a time, and uses them to initialize the array. As Figure 1-7 shows, the decimal integer 1819043176 on line 13 is 0x6c6c6568 in hexadecimal, which is exactly the characters 'l', 'l', 'e', and 'h' (the bytes of "hell" in little-endian order):
Figure 1-7 Assembly generated from the C code in Figure 1-6
As a result, the .rodata segment of the ELF file generated by compilation does not contain the string constant:
Figure 1-8 Fragment of the executable generated by compiling the program in Figure 1-6
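To make the two points above concrete, here is a small sketch (the function names modified_copy and pack4_le are hypothetical). The first function shows that an array initialized from a string constant is ordinary writable memory. The second reproduces the arithmetic behind the integer in Figure 1-7 by packing four characters lowest byte first, which is how a little-endian machine reads them from memory.

```c
#include <stdint.h>
#include <string.h>

/* Copies "helloworld" into a local array, modifies the copy, and writes the
   result to out, which must have room for at least 11 bytes. */
void modified_copy(char *out)
{
    char s[] = "helloworld";  /* 11-byte array on the stack, filled from the literal */

    s[0] = 'H';               /* legal: the array is ordinary writable memory */
    memcpy(out, s, sizeof s);
}

/* Packs the first four bytes of s into a 32-bit value, lowest byte first. */
uint32_t pack4_le(const char *s)
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}
```

Here pack4_le("helloworld") evaluates to 1819043176, that is 0x6c6c6568: 'h' is 0x68, 'e' is 0x65, and 'l' is 0x6c.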
2. Global variables modified by const
A)
Besides strings, other constants can also be placed in the constant zone, provided that the data is stored in a global variable and qualified with the const keyword, as shown in Figure 1-9:
Figure 1-9 Line 4 defines a global variable qualified with const
Compare the compiled assembly:
Figure 1-10
Here, value0 is placed in the .rodata segment (the constant zone) because it is qualified with const, while value1 is an ordinary global variable, so it is placed in the .data segment (the static data zone). Analyzing the ELF executable generated by compilation gives the following:
Figure 1-11 Storage location of value0
The data of value0 is placed in the constant zone (.rodata segment). The 0a shown in hexadecimal corresponds to its initial decimal value 10.
Figure 1-12 Storage location of value1
The data of value1 is placed in the static zone (.data segment). The 14 shown in hexadecimal corresponds to its initial decimal value 20.
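The listing from Figure 1-9 is not reproduced here. A sketch using the same variable names and initial values as the text (the function sum_after_update is a hypothetical addition) might be:

```c
const int value0 = 10;  /* const-qualified global: typically placed in .rodata */
int value1 = 20;        /* ordinary global: typically placed in .data */

/* Only value1 may be modified; an assignment to value0 would not compile. */
int sum_after_update(int new_value1)
{
    value1 = new_value1;
    return value0 + value1;
}
```

Inspecting the compiled file with a tool such as objdump or readelf should show the bytes 0a and 14 in the respective segments, although the exact layout depends on the toolchain.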
B)
However, not every variable qualified with const is placed in the constant zone; this holds only for global variables. For an ordinary local variable, const only means that its value cannot be changed explicitly in an expression (the compiler reports an error if you try), and the variable is still stored on the stack.
C++ encourages using const in place of #define here because C++ optimizes const. If the variable's value is a constant expression, C++ performs const folding. Put simply, during compilation every occurrence of the variable is replaced with the value of the constant expression (search online for details). A variable defined with const can therefore be used as an array dimension in C++. C has no such feature, so in C a const variable is not a constant expression and cannot serve as a constant array dimension (note: in C++, when constant folding does not apply, a const variable still cannot be used as an array dimension; for when constant folding applies, see my blog: http://www.cnblogs.com/yanqi0124/p/3795019.html).
Because the nature of the storage does not change, the value of a const local variable can still be changed by other means, such as through a pointer, although the C standard leaves the result of doing so undefined. See Figure 1-13:
Figure 1-13 Defining two local variables, one of which is qualified with const
Save and compile; the result is as follows:
Figure 1-14 Compilation errors reported by the compiler
Because value1 is qualified with const, the statement in the program that assigns to value1 is reported as an error.
Next, modify the program to change the value of value1 through a pointer:
Figure 1-15 Defining a pointer p that points to value1 and assigning through the pointer
Compile and run:
Figure 1-16 Compilation and run results
The type of the pointer variable p does not match the type of the expression &value1 (p is int *, while &value1 is const int *), so the compiler issues a type-mismatch warning at the assignment on line 3. If we ignore the warning and run the program anyway, the value of value1 is changed.
3. Segmentation faults caused by the constant zone
Because the constant zone is read-only, when a program tries to write to an address in it, the operating system, for safety reasons, sends the process a segmentation fault signal and kills it, in order to protect the system.
Figure 1-17 Sample code that writes to the constant zone through a pointer
Lines 10 and 11 each trigger the same segmentation fault, as shown in Figure 1-18.
Figure 1-18 Segmentation fault caused by an illegal write
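The listing from Figure 1-17 is not reproduced here. The sketch below shows one way to observe the fault without crashing the calling program: it performs the illegal write in a child process and reports which signal killed the child. It assumes a POSIX system (fork and waitpid), and the function name signal_from_literal_write is hypothetical. Writing to a string constant is undefined behavior in C, so the outcome is platform-dependent; on typical Linux systems the child is killed by SIGSEGV.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the illegal write in a child process. Returns the number of the signal
   that killed the child, 0 if the child exited normally, or -1 if fork or
   waitpid failed. */
int signal_from_literal_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile keeps the compiler from optimizing the store away */
        volatile char *p = (volatile char *)"helloworld";
        p[0] = 'H';   /* undefined behavior: writes into read-only data */
        _exit(0);     /* reached only if the write did not fault */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Comparing the return value with SIGSEGV tells the parent whether the child hit a segmentation fault; some systems report SIGBUS for this kind of access instead.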
II. Static zone:
The static zone is an abstract concept. An actual Linux C executable has no segment called the static zone; it consists of two segments, the .data segment and the .bss segment. The .data segment is the program's data segment. In segmented memory management, the data segment is usually the memory region that stores global and static variables initialized to nonzero values. By contrast, BSS (Block Started by Symbol) usually refers to the memory region that stores global and static variables that are uninitialized or initialized to 0. The size and contents of the .data segment are determined when the program is compiled. The .bss segment is not allocated space directly; the compiler only records how much space to reserve for it, and the space is actually allocated when the program is loaded into memory. Although the static zone consists of two different segments, they are not distinguished once the program has been linked and loaded into memory, so we do not discuss them separately here.
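A small sketch of the .data and .bss split (all names here are hypothetical, and the actual placement is decided by the toolchain):

```c
#include <stddef.h>

int initialized_counter = 7;   /* nonzero initializer: typically .data */
int zero_counter;              /* no initializer: typically .bss, starts at 0 */
static char scratch[4096];     /* zero-filled static array: typically .bss */

/* Returns 1 if every byte of scratch is zero, 0 otherwise. */
int scratch_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof scratch; i++)
        if (scratch[i] != 0)
            return 0;
    return 1;
}
```

C guarantees that objects with static storage duration start out zeroed when they have no initializer, which is why the loader can provide .bss as zero-filled memory instead of reading its contents from the file.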
Variables in the static zone have the following features:
1) Their lifetime is long: they last until the process ends and are reclaimed together with the process's address space.
2) They are initialized only once. Their initial contents are fixed at compile time, and their logical addresses are fixed at link time.
Which variables will be placed in the static zone?
1. Global variables:
As the name implies, a global variable is global: once a variable is defined as global, any function in the same program can access its data. So in addition to all the features of static-zone variables, a global variable has a wide scope: it is visible throughout the whole program (which may consist of multiple source files).
2. Static variables
Literally, static variables are variables declared with the static keyword. Any variable declared static is allocated by the compiler in the static zone and has all the features of static-zone variables. There are two kinds: global static variables and local static variables. The difference lies only in scope. A global static variable can be used only within its own source file (after compilation its symbol has no external linkage, but it can still be accessed indirectly through a pointer); for a local static variable the scope is unchanged (it can be used only inside its function).
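The point about indirect access through a pointer can be sketched as follows (the names file_private, file_private_address, and file_private_value are hypothetical):

```c
static int file_private = 42;  /* internal linkage: invisible to other source files */

/* Hands out the address so that other code can reach the variable indirectly. */
int *file_private_address(void)
{
    return &file_private;
}

/* Reads the variable directly; only code in this file can refer to it by name. */
int file_private_value(void)
{
    return file_private;
}
```

Code in another source file cannot name file_private, but it can call file_private_address() and read or write the variable through the returned pointer.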
The following sample code illustrates the characteristics of static variables (Figure 2-1):
Figure 2-1
The code defines a global variable gvalue and a local variable lvalue, and calls the function twice. Because gvalue is a global variable, the compiler places it in the static zone, while lvalue is a local variable on the stack. Owing to the characteristics of the static zone, gvalue accumulates across the two calls, whereas the local variable lvalue is reinitialized on every call (Figure 2-2).
Figure 2-2
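The listing from Figure 2-1 is not reproduced here. A sketch that keeps the variable names gvalue and lvalue from the text (the function name count_auto is hypothetical) might be:

```c
int gvalue = 0;            /* global: static zone, keeps its value between calls */

/* Increments both variables and returns lvalue. */
int count_auto(void)
{
    int lvalue = 0;        /* automatic local: re-created and reset on every call */

    gvalue++;
    lvalue++;
    return lvalue;
}
```

Calling count_auto() twice returns 1 both times, while gvalue ends up at 2.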
Next, change lvalue in the program to a static local variable (Figure 2-3):
Figure 2-3
The local variable lvalue is now placed in the static zone and initialized only once, so it accumulates as well (Figure 2-4).
Figure 2-4
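The listing from Figure 2-3 is not reproduced here. A sketch of the static-local variant (the function name count_static is hypothetical) might be:

```c
/* Increments a static local variable and returns its new value. */
int count_static(void)
{
    static int lvalue = 0; /* static local: initialized once, lives in the static zone */

    lvalue++;
    return lvalue;
}
```

Calling count_static() twice returns 1 and then 2, because the initializer is applied only once and the value persists between calls.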
Source:
Data storage in C language (1): http://www.embedu.org/Column/Column540.htm
Data storage in C language (2): http://www.embedu.org/Column/Column558.htm