[Compilation and C-language relations] 3. Storage layout of variables

Source: Internet
Author: User
Tags: function prototype

Take the following C program as an example:

#include <stdio.h>

/* The initial values 20, 30, 40 and 50 below are illustrative. */
const int A = 10;
int a = 20;
static int b = 30;
int c;

int main(void)
{
    static int a = 40;
    char b[] = "Hello world";
    register int c = 50;

    printf("Hello world%d\n", c);

    return 0;
}

We define several variables at file scope and inside the main function, and introduce some new keywords, const, static and register, to modify them. How is storage allocated for these variables? After compiling, we use the readelf command to inspect the symbol table and see where each variable is placed. In the following listing, the author has sorted the symbol table by address from low to high and kept only the lines we care about:

  

The variable A is qualified with const, which marks it as read-only and non-modifiable. It is assigned the address 0x8048540, and the readelf output shows that this address lies in the .rodata section:

  

  

The .rodata section occupies offsets 0x538 to 0x554 in the file. We use the hexdump command to view its contents:

  

The bytes 0a 00 00 00 at offset 0x540 are the variable A. We can also see that the string literal "Hello world%d\n" from the program is placed at the end of the .rodata section. A string literal is read-only, so it is equivalent to defining a const array at file scope:
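The equivalence described above can be sketched as follows. The names hello_fmt and same_as_literal are hypothetical and do not appear in the original program; this is only an illustration of the idea, not what the compiler literally emits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name: a file-scope const array holding the same bytes as
 * the literal "Hello world%d\n", including the terminating '\0'. */
const char hello_fmt[] = {
    'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd',
    '%', 'd', '\n', '\0'
};

/* Returns 1 if the array and the string literal hold identical bytes. */
int same_as_literal(void)
{
    const char *literal = "Hello world%d\n";

    return sizeof hello_fmt == strlen(literal) + 1
        && memcmp(hello_fmt, literal, sizeof hello_fmt) == 0;
}
```

Passing hello_fmt to printf as the format string would behave the same as passing the literal itself.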

  

When the program is loaded, the .rodata and .text sections are typically merged into a single segment, which the operating system protects against accidental writes. This can also be seen in the readelf output:

  

Note that a const variable such as A must be initialized when it is defined. Initialization is the only opportunity to give it a value; once defined, it can no longer be assigned to.
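A minimal sketch of this rule, using the hypothetical names LIMIT and read_limit. The rejected assignment is left in a comment so that the example still compiles.

```c
#include <assert.h>

const int LIMIT = 10;    /* hypothetical name; the value is given at definition */

int read_limit(void)
{
    /* LIMIT = 11; */    /* rejected by the compiler: LIMIT is const-qualified */
    return LIMIT;        /* reading a const variable is always allowed */
}
```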

From the readelf output above, you can see that the .data section starts at address 0x804a010 and is 0x14 bytes long, so it ends at address 0x804a024. Three variables are placed in the .data section: a, b, and a.1589.

a is a global symbol, whereas b is modified by the static keyword, which makes it a local symbol. The role of static here is to declare that the symbol b is local and is not processed by the linker. If several object files are linked together, a local symbol can only be defined and used within one object file; it cannot be defined in one object file and used in another. A function definition can also be modified with static, indicating that the function name is a local symbol.
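The following sketch shows static applied to a file-scope variable and to a function. All names here (shared_total, private_calls, add_private, add_public, call_count) are hypothetical and chosen only for illustration.

```c
#include <assert.h>

int shared_total = 0;            /* global symbol: visible to other object files */
static int private_calls = 0;    /* local symbol: visible in this file only */

/* static function: the name add_private is a local symbol as well. */
static int add_private(int x)
{
    private_calls++;
    return shared_total += x;
}

/* Non-static wrapper that other object files may call. */
int add_public(int x)
{
    return add_private(x);
}

/* Other files cannot name private_calls, but they can call this accessor. */
int call_count(void)
{
    return private_calls;
}
```

Another object file could declare and call add_public or call_count, but any attempt to refer to add_private or private_calls from there would fail at link time.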

And what is a.1589? It is the static int a inside the main function. A static variable in a function differs from an ordinary local variable: it is not allocated when the function is called but is statically allocated like a global variable, hence the word "static". On the other hand, its scope is the same as that of a local variable: the name a declared in main is visible only inside main. The compiler therefore appends a suffix to its symbol name to distinguish it from the global variable a and from any variable a in other functions.
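The difference between a static variable in a function and an ordinary local variable can be sketched as follows. The function names next_id and fresh_value are hypothetical. Each function has its own variable named a; the suffix a compiler appends to the static one is compiler-specific.

```c
#include <assert.h>

/* Static local: allocated once for the whole run, keeps its value between calls. */
int next_id(void)
{
    static int a = 40;    /* initialized once, not on every call */
    return ++a;
}

/* Ordinary local: created and initialized again on every call. */
int fresh_value(void)
{
    int a = 40;
    return ++a;
}
```

Successive calls to next_id return increasing values, while fresh_value returns the same value every time.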

The .bss section starts at address 0x804a024 and is 0xc bytes long, so it ends at address 0x804a030. The variable c is placed in this section. The readelf output above shows that .data and .bss are merged into a single segment at load time, and that this segment is readable and writable. The .bss section differs from the .data section in that it occupies no storage space in the file and is filled with zeros at load time. Therefore a global variable that is not initialized has the initial value 0 and is allocated in the .bss section.
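The zero-fill behavior described above can be observed directly. In this sketch the names counter, buffer, seeded and buffer_is_zero are hypothetical; which section each object ends up in depends on the toolchain, so the comments say "typically".

```c
#include <assert.h>
#include <stddef.h>

int counter;                 /* no initializer: starts at 0, typically in .bss */
static char buffer[4096];    /* no initializer: all bytes 0, typically in .bss */
int seeded = 20;             /* has an initializer: typically in .data */

/* Returns 1 if every byte of buffer is zero. */
int buffer_is_zero(void)
{
    for (size_t i = 0; i < sizeof buffer; i++)
        if (buffer[i] != 0)
            return 0;
    return 1;
}
```

Because buffer has no initializer, its 4096 bytes would normally not be stored in the executable file; only its size is recorded.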

Only two variables remain to be analyzed: b and c inside the function. Function parameters and local variables are allocated on the stack; b is an array and is likewise allocated on the stack. Let us look at the disassembly of the main function:

  

As can be seen, the string "Hello world" used to initialize b is not placed in the .rodata section but is encoded directly in the instructions: three movl instructions write 12 bytes onto the stack, and those bytes are the storage for b.

  

Although the stack grows from high addresses to low addresses, the elements of an array are always laid out from low addresses to high addresses, in the order b[0], b[1], b[2], and so on.

The address of the array element b[n] = the base address of the array (b used as an rvalue denotes this base address) + n × the number of bytes per element. When n = 0, the address of b[0] is the base address itself, which is why array subscripts start at 0 instead of 1.

The variable c is not allocated on the stack but lives directly in the eax register, and the subsequent call to printf pushes the value of c as an argument straight from eax. This is the role of the register keyword: it instructs the compiler to keep the variable in a register if possible. When printf is called, what is pushed for the "Hello world%d\n" argument is the address of its first character in the .rodata section, not the whole string. So a string literal, like an array name, denotes the address of the first element of the array when used as an rvalue.
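The address formula above can be checked with a short sketch. The helper names element_address and literal_char are hypothetical; the first computes base + n × element size by hand, the second indexes a string literal directly to show that it behaves like an array name.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element n, computed by hand as base + n * (bytes per element). */
int *element_address(int *base, int n)
{
    return (int *)((char *)base + (size_t)n * sizeof *base);
}

/* A string literal used as an rvalue is the address of its first character,
 * so it can be subscripted like any array. */
char literal_char(int n)
{
    return "Hello world"[n];
}
```

For any int array v and valid index n, element_address(v, n) yields the same pointer as &v[n].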

So far we have distinguished global variables from local variables mainly by scope. It now appears that these two concepts are too coarse to classify variables, and a finer subdivision is needed. Let us summarize the relevant C syntax:

The concept of scope applies to all identifiers, not just variables. Scope in C is divided into the following categories:

    • Function scope: the identifier is valid throughout the function. Only statement labels have function scope. A label does not need to be declared before it is used in the function, so a goto statement can jump to a label that appears later, but only within the same function.
    • File scope: the identifier is valid from the point of its declaration to the end of the program file. In the example above, A, a, b and c declared outside the main function have file scope, and so does main itself. printf is declared in stdio.h, which is included into the program file, so it has file scope as well.
    • Block scope: the identifier is declared inside a pair of braces {} (a function body or a statement block) and is valid from the point of its declaration to the closing brace }. Examples are a, b and c inside the main function in the example above. The formal parameters in a function definition also have block scope and are valid from the point of declaration to the end of the function.
    • Function prototype scope: the identifier appears in a function prototype that is a declaration rather than a definition (there is no function body), and it is valid from the point of its declaration to the end of the prototype. An example is a and b in int foo(int a, int b);.
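The four kinds of scope listed above can be seen together in one small sketch. The functions scale and count_down are hypothetical and exist only to give each kind of scope a place to appear.

```c
#include <assert.h>

int scale(int a, int b);    /* prototype scope: this a and b end at the semicolon */

int a = 20;                 /* file scope: valid from here to the end of the file */

int scale(int x, int y)     /* parameters have block scope in the function body */
{
    return x * y + a;       /* the file-scope a is visible here */
}

int count_down(int n)
{
    int steps = 0;          /* block scope: valid until the closing brace */

    if (n <= 0)
        goto done;          /* forward jump to a label that appears later */
    while (n-- > 0)
        steps++;
done:                       /* function scope: valid anywhere in count_down */
    return steps;
}
```

Note that the parameter named a in the prototype does not conflict with the file-scope a declared on the next line, because the prototype's scope has already ended.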

For identifiers with the same name that belong to the same namespace, the identifier in the inner scope hides the one in the outer scope. Namespaces fall into the following categories:

    • Statement labels form a namespace of their own. Within a function, for example, a local variable and a statement label can have the same name without affecting each other. Because the syntax for using a label differs from the syntax for using other identifiers, the compiler does not confuse them.
    • The tags of struct, enum and union types share one namespace. Because a tag is always preceded by the struct, enum or union keyword, the compiler does not confuse it with other identifiers.
    • The member names of a struct or union form a namespace of their own. Because member names are always accessed through the . or -> operator rather than used alone, the compiler does not confuse them with other identifiers.
    • All other identifiers, such as variable names, function names, macro names, typedef names and enum members, belong to one namespace. If names clash, a macro definition overrides all other identifiers, because macros are processed in the preprocessing phase rather than during compilation. Apart from macros, the other kinds of identifier follow the rule above: the inner scope hides the outer scope.
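The following sketch reuses one name, node, in four different namespaces and also shows an inner scope hiding an outer one. The names node, value and namespaces_demo are hypothetical. Reusing a name this heavily is legal C but is done here only to illustrate the rules, not as recommended style.

```c
#include <assert.h>

struct node { int node; };      /* tag "node" and member "node" do not clash */

int value = 1;                  /* ordinary identifier at file scope */

int namespaces_demo(void)
{
    struct node node = { 5 };   /* ordinary identifier "node" */
    int value = 2;              /* hides the file-scope value */

    {
        int value = 3;          /* hides the value of the enclosing block */
        node.node += value;     /* 5 + 3 = 8 */
    }
    goto node;                  /* label "node": yet another namespace */
node:
    return node.node + value;   /* 8 + 2 = 10 */
}
```

The file-scope value is never modified by the function, because every use of the name inside it refers to one of the inner declarations.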

An identifier has one of three kinds of linkage:

    • External linkage: when the final executable is linked from several program files, an identifier that denotes the same variable or function in every program file, even if it is declared several times, has external linkage. Identifiers with external linkage appear in the symbol table after compilation as global symbols. In the example above, a and c declared outside the main function have external linkage, as do main and printf.
    • Internal linkage: an identifier that denotes the same variable or function within one program file, even if it is declared several times, has internal linkage. An example is b declared outside the main function in the example above. An identifier with internal linkage is a local symbol in the symbol table after compilation. The static variable a inside the main function, however, does not have internal linkage: even within the same program file, declarations of it in different functions would not denote the same variable.
    • No linkage: all other identifiers have no linkage, such as the local variables of functions and identifiers that do not denote variables or functions.
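A single-file sketch of the three kinds of linkage. The function names sum_linked and other are hypothetical; the variables mirror the a and b of the example program. A full demonstration of external linkage needs two source files, so this only shows a repeated declaration within one file.

```c
#include <assert.h>

int a = 20;            /* external linkage */
static int b = 30;     /* internal linkage */

int sum_linked(void)
{
    extern int a;      /* a second declaration of the same file-scope a */
    int c = 1;         /* no linkage: belongs to this function only */

    return a + b + c;
}

int other(void)
{
    int c = 100;       /* no linkage: unrelated to the c in sum_linked */

    return c;
}
```

The two variables named c never interfere with each other, while both declarations of a refer to one and the same object.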

The storage class specifiers are the following keywords, which can modify a variable or function declaration:

    • static: a variable modified by static has statically allocated storage, and a file-scope variable or function modified by static has internal linkage.
    • auto: a variable modified by auto has its storage allocated automatically on the stack when the function is called and released automatically when the function returns. In the example above, b inside the main function is in effect modified by auto, since auto can be omitted. auto cannot modify a file-scope variable.
    • register: the compiler tries to allocate a dedicated register for a variable modified by register; if no register can be allocated, the compiler treats it as an auto variable. register cannot modify a file-scope variable. Modern compilers optimize very well and find ways to use the CPU registers effectively on their own, so the register keyword is now seldom used.
    • extern: as mentioned above, linkage is classified by whether an identifier that is declared several times denotes the same variable or function. The extern keyword is used to declare the same identifier several times; the next chapter covers its usage in detail.
    • typedef: it does not modify a variable but defines a type name. typedef appears in the same syntactic position as the preceding keywords and, like them, looks as though it modifies a variable declaration, so it is grouped with them on syntactic rather than semantic grounds.
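A short sketch of auto, register and typedef in use. The names counter_t and sum_to are hypothetical. Writing auto explicitly is shown only for illustration, since it is the default for local variables.

```c
#include <assert.h>

typedef unsigned int counter_t;    /* typedef: names a type, declares no object */

/* Returns 1 + 2 + ... + n. */
counter_t sum_to(counter_t n)
{
    register counter_t total = 0;  /* a hint only; taking &total is not allowed */
    auto counter_t i;              /* same as writing just "counter_t i;" */

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Whether total actually ends up in a register is decided by the compiler; with optimization enabled it would most likely do so even without the keyword.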

The const keyword described above is not a storage class specifier. It also appears to modify a variable declaration, but in the more complex declarations introduced later, the positions where const may appear in the syntax are not exactly the same as those of a storage class specifier. const, together with the restrict and volatile keywords to be introduced later, belongs to a class of syntax elements called type qualifiers.

The storage duration (or lifetime) of a variable is divided into the following categories:

    • Static storage duration: a variable that has external or internal linkage, or that is modified by static, is allocated and initialized once when the program begins execution and then persists until the program ends. Such a variable is usually placed in the .rodata, .data or .bss section. In the example above, A, a, b and c declared outside the main function, and the static a inside the main function, have static storage duration.
    • Automatic storage duration: a variable that has no linkage and is not modified by static is allocated on the stack or in a register when execution enters its block scope and is released when execution leaves that block scope. Examples are b and c inside the main function in the example above.
    • Allocated storage duration: later chapters cover calling the malloc function to allocate memory in the heap space of the process, and calling the free function to release that storage.
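The three storage durations can appear side by side in one function. In this sketch the names calls and make_sequence are hypothetical; malloc and free are the standard library functions declared in stdlib.h.

```c
#include <assert.h>
#include <stdlib.h>

static int calls;    /* static storage duration: exists for the whole run */

/* Returns a heap array holding 0, 1, ..., n-1, or NULL if allocation fails.
 * The caller owns the result and must pass it to free(). */
int *make_sequence(size_t n)
{
    int *seq = malloc(n * sizeof *seq);   /* allocated storage duration */

    if (seq == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)        /* i and seq: automatic storage duration */
        seq[i] = (int)i;
    calls++;
    return seq;
}
```

The array outlives the call that created it and remains valid until free is called on it, whereas i and the pointer variable seq disappear as soon as make_sequence returns.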
