[Memory] virtual address space distribution

Source: Internet
Author: User

I. Foreword

I have been in the embedded software industry for nearly two years, starting from scratch and muddling my way through the C language. Since I had little exposure to this area as an undergraduate, learning was slow going, but with the help of a group of generous friends I gradually got one foot inside the embedded door.

I have read quite a few books on learning C, starting, like most beginners, with the well-known "god book" by Tan Haoqiang. Books like these mostly cover C syntax and coding conventions; paired with a ready-made IDE on the PC, you copy a code sample from the book, click the build button like a fool, and "Hello World" appears as if by magic. I suspect almost everyone has wondered at that moment: how did that actually happen? At least I did. Later I realized that knowing how to write code is not enough, because all code runs on some platform and lives in memory, so you also need to understand the intermediate steps of compilation and linking, and how the final executable is laid out in memory. Most important of all is understanding the basics of memory: once you do, you carry a mental map of where your code and data live, and the benefits are enormous. Along the way I was lucky to find two excellent books for your reference: "In-Depth Understanding of Computer Systems" and "The Self-Cultivation of Programmers" (no, the authors did not pay me, haha), plus the blogs of two experts I admire: clover_toeic and liutao_1977, which make me feel hopelessly behind. Enough rambling; on to the main course.


III. Linux virtual address space distribution

In a multitasking operating system, each process runs in its own memory sandbox: its virtual address space, which in 32-bit mode is a 4GB block of addresses. On Linux, the kernel and user processes split this virtual address space 1:3 (1GB for the kernel, 3GB for user space), while on Windows the default split is 2:2 (and with the large-address-aware flag set on the executable it can become 1:3 as well). This does not mean the kernel uses that much physical memory, only that it owns that portion of the address space and maps it to physical memory as needed.

Virtual addresses are mapped to physical memory through page tables, which are maintained by the operating system and consulted by the processor. Kernel space is marked with a higher privilege level in the page tables, so any attempt by user-mode code to access those pages causes a page fault. In Linux, kernel space is persistent and is mapped to the same physical memory in all processes, so kernel code and data are always addressable, ready to handle interrupts and system calls. In contrast, the user-mode portion of the address space changes whenever a process switch occurs.
The standard memory segment layout of a Linux process in virtual memory is shown below (see Chapter 10 of "The Self-Cultivation of Programmers"):


In the figure, the blue stripes in the user address space correspond to distinct memory segments mapped to physical memory, and the gray areas are unmapped. These segments are simply ranges of memory addresses and have nothing to do with the hardware segments of Intel processors.
Random values such as the random stack offset and random mmap offset are there to thwart malicious programs. Linux randomizes the layout by adding random offsets to the start addresses of the stack, the memory-mapped segment, and the heap, so that an attacker cannot easily compute the addresses of the stack, library functions, and so on. execve(2) is responsible for mapping the process's code segment and data segment; actually reading their contents into memory is done on demand by the system's page fault handler. In addition, execve(2) also zeroes the BSS segment.

The segments of a user process hold the following contents (in descending order of address):

Name                 Stored content
Stack                Local variables, function parameters, return addresses, etc.
Heap                 Dynamically allocated memory
BSS segment          Uninitialized global and static local variables, and those whose initial value is 0
Data segment         Global and static local variables initialized to non-zero values
Code (text) segment  Executable code, string literals, read-only constants

When an application is loaded into memory for execution, the operating system is responsible for loading the code segment, data segment, and BSS segment, and for allocating space for them. The stack is also allocated and managed by the operating system, while the heap is managed by the programmer, who explicitly requests and frees space.
The BSS segment, data segment, and code segment are produced when the executable program is compiled, whereas the stack and heap exist only at run time.

The meanings of each segment are described in detail below.
1) Kernel space
The kernel always resides in memory and is part of the operating system. Kernel space is reserved for the kernel; applications are not allowed to read or write this region or to call functions defined in kernel code directly.
2) Stack
The stack is allocated and released automatically by the compiler and behaves like the stack in data structures (last in, first out).

There are three main uses of the stack:
① The return address of a function call (where the callee returns to) and the arguments

② Temporary variables, including the non-static local variables of functions and other temporaries generated automatically by the compiler

③ Saved context, including registers whose values must be preserved across a function call

A typical push order is: arguments (in C they are pushed right to left, in Pascal left to right), return address, old EBP, saved registers, local variables, and other data.

This does not hold on ARM CPUs: by default the ARM calling convention passes the first few arguments in registers (r0-r3) and only falls back to the stack for the rest. Some readers point out that it is not always true on Intel either; indeed, some compilers support passing parameters in registers, but the function must be declared with a special keyword. (I suspect many people are also unclear about the difference between a declaration and a definition.)

And the return value? No, it is not returned on the stack; it comes back in a register: EAX on Intel, r0 on ARM.

For a clearer picture, the frame layout can be sketched as follows:
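As a minimal sketch (assuming 32-bit x86, the cdecl convention, and an unoptimized build; frame details vary with compiler and flags):

    /* frame.c - hypothetical example; build with: gcc -m32 -O0 frame.c */
    #include <stdio.h>

    int add(int a, int b)      /* caller pushes b, then a, then 'call' pushes the return address */
    {                          /* the prologue saves the old EBP and reserves room for locals    */
        int sum = a + b;       /* 'sum' lives in this frame, below the saved EBP                 */
        return sum;            /* the result comes back in EAX, not on the stack                 */
    }

    int main(void)
    {
        int r = add(1, 2);     /* when add() returns, its whole frame is popped and reused       */
        printf("%d\n", r);
        return 0;
    }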


The continuous reuse of stack space helps keep active stack memory in the CPU cache, making access faster. Each thread in a process has its own stack. As data is pushed onto the stack and exceeds the currently mapped region, a page fault is triggered; at that point, if the stack is still below its maximum size RLIMIT_STACK (usually 8MB), the kernel grows it dynamically and the program keeps running. Once the mapped stack has expanded to a given size, it does not shrink back.
The ulimit -s command in Linux shows and sets the maximum stack size. When a program uses more stack than this limit, a stack overflow occurs and the program receives a segmentation fault. Note that increasing the stack limit increases memory overhead and startup time.
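A small sketch, assuming Linux/glibc, that reads the same limit ulimit -s reports and then exhausts it with unbounded recursion (build without optimization so the recursion is not turned into a loop):

    /* stack_limit.c - assumed example; build with: gcc -O0 stack_limit.c */
    #include <stdio.h>
    #include <sys/resource.h>

    static void recurse(void)
    {
        char pad[4096];            /* burn roughly one page of stack per call        */
        pad[0] = 0;
        recurse();                 /* never returns: the stack eventually hits
                                      RLIMIT_STACK and the process gets SIGSEGV      */
    }

    int main(void)
    {
        struct rlimit rl;
        getrlimit(RLIMIT_STACK, &rl);            /* in bytes; 'ulimit -s' shows it in KB */
        printf("stack limit: %ld bytes\n", (long)rl.rlim_cur);
        recurse();
        return 0;
    }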
The stack can grow either downward (to low memory addresses) or upward, depending on the implementation. The stack that is described in this article grows downward.
The size of the stack is dynamically adjusted by the kernel at run time.

The specific details of how the stack is used during C function calls can be found in the post: C-language function call stack.
3) Memory-mapped segment (mmap)
Here the kernel maps the contents of files on disk directly into memory; any application can request such a mapping through the Linux mmap() system call or Windows CreateFileMapping()/MapViewOfFile(). Memory mapping is a convenient, efficient way to do file I/O and is used to load dynamic shared libraries. The user can also create an anonymous memory mapping, which has no backing file and can be used to hold program data. In Linux, if you request a large block of memory through malloc(), the C runtime creates an anonymous memory mapping instead of using heap memory. "Large" means bigger than the threshold MMAP_THRESHOLD, which defaults to 128KB and can be adjusted with mallopt().
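A small sketch (glibc assumed; the tunable is glibc's M_MMAP_THRESHOLD) contrasting a small allocation served from the heap with a large one served by an anonymous mmap:

    /* mmap_threshold.c - assumed glibc behavior */
    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>

    int main(void)
    {
        mallopt(M_MMAP_THRESHOLD, 128 * 1024);  /* explicitly set the default: 128KB      */

        void *small = malloc(64 * 1024);        /* below the threshold: carved from heap  */
        void *big   = malloc(512 * 1024);       /* above the threshold: anonymous mmap    */

        /* Heap addresses sit just above the data segment; mmap'd blocks land much
           higher, near the shared libraries, so printing both usually shows the gap.    */
        printf("small: %p\nbig:   %p\n", small, big);

        free(small);
        free(big);                              /* the mmap'd block is munmap'd           */
        return 0;
    }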
This area is used to map the dynamic-link libraries used by the executable. In the Linux 2.4 kernel, if the executable depends on shared libraries, the system allocates space for them starting at address 0x40000000 and loads them there when the program loads. In the Linux 2.6 kernel, the starting address of shared libraries was moved up, closer to the stack.
From the layout of the process address space you can see that, with shared libraries present, two regions are left for the heap: one from the end of the BSS up to 0x40000000, about 1GB, and another from the shared libraries up to the stack, about 2GB. The sizes of these two regions depend on the stack and on the size and number of shared libraries. Does this mean an application can request at most about 2GB of heap? In fact it depends on the Linux kernel version. The classic layout above, with shared libraries loaded at 0x40000000, describes kernels before 2.6. In version 2.6 the shared library load address was moved near the stack, around 0xBFxxxxxx, so the heap is no longer split into two "fragments" by the shared libraries; on a 32-bit Linux system with a 2.6 kernel, the theoretical maximum that malloc can provide is around 2.9GB (and tests do reach about 2.9GB).
4) Heap
The heap holds memory that is dynamically allocated while the process runs, and it can grow or shrink dynamically. The contents of the heap are anonymous: they cannot be accessed by name, only indirectly through pointers. When a process calls malloc() (C) or new (C++), the newly allocated memory is added to the heap (it expands); when memory is released with free() (C) or delete (C++), it is removed from the heap (it shrinks).
Allocated heap memory is byte-aligned so that it is suitable for atomic operations. The heap manager tracks each allocation through linked lists, and because requests and releases happen in no particular order, the heap eventually becomes fragmented. Heap memory is normally allocated and released by the application, and reclaimed memory can be reused. If the programmer fails to release it, the operating system will usually reclaim it automatically when the program exits.

What we use is the user heap: each process has one, every thread in the process allocates from it, and it lives in user space. A frequent problem is that a heap allocation fails, and there are two cases. The first is that you ask malloc for more than the total amount remaining, in which case you obviously get nothing. The second is that enough memory remains in total, but it is not contiguous, and the largest free block is smaller than what you asked malloc for, so again you get nothing. How does that happen? For example: p1 = malloc(...); p2 = malloc(3); p3 = malloc(...); free(p2). Easy to see: fragmentation appears. If you then free(p1), the blocks at p1 and p2 merge; free(p3) and everything merges back together. So if your program is memory-hungry, say a server that holds a lot of memory, runs for a long time, and is unlucky, this situation will show up. The usual fix is to allocate one large block up front and manage it yourself.
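A minimal sketch of that scenario (the sizes are invented for illustration; the exact hole size depends on the allocator's bookkeeping):

    /* fragmentation.c - sizes are hypothetical */
    #include <stdlib.h>

    int main(void)
    {
        char *p1 = malloc(100);    /* typically laid out next to each other: [p1][p2][p3]   */
        char *p2 = malloc(3);
        char *p3 = malloc(100);

        free(p2);                  /* a small hole opens between two live blocks; a later
                                      malloc(100) cannot use it - that is fragmentation     */

        free(p1);                  /* p1's block and the hole are adjacent and coalesce      */
        free(p3);                  /* now all the space merges back into one free region     */
        return 0;
    }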
One more piece of common sense: unless specially designed, the first address you get back from an allocation is an aligned (even) address. In other words, if you ask the heap for one byte, it will give you at least 4 or 8 bytes, partly because that is easier to manage and partly because it is faster. An extreme example: some ARM CPUs cannot access an odd address directly in hardware; to read data at an odd address they perform two accesses at neighboring aligned addresses and stitch the result together. If you are interested, the ARM programming manuals explain this.
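A quick sketch (glibc assumed; malloc_usable_size() is a glibc extension) showing that even a one-byte request comes back aligned and with extra usable space:

    /* alignment.c - assumes glibc for malloc_usable_size() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <malloc.h>

    int main(void)
    {
        char *p = malloc(1);                                  /* ask for a single byte     */
        printf("address     : %p\n", (void *)p);
        printf("addr mod 8  : %lu\n", (unsigned long)((uintptr_t)p % 8));  /* 0 = aligned  */
        printf("usable bytes: %zu\n", malloc_usable_size(p)); /* >= 1, often 24 on 64-bit  */
        free(p);
        return 0;
    }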

The end of the heap is marked by a pointer called the program break. When the heap manager needs more memory, it can move the break with brk() and sbrk() to expand the heap; these calls are normally made automatically by the allocator, not by the application.
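A small sketch (Linux/glibc assumed; sbrk(0) just reads the current break) showing the break move when the allocator first grows the heap:

    /* brk_demo.c - assumes Linux/glibc */
    #define _DEFAULT_SOURCE            /* for sbrk() under strict feature-test settings */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *before = sbrk(0);            /* current program break (end of the heap)   */
        void *p      = malloc(64 * 1024);  /* small enough to come from the heap; the
                                              first malloc forces the break upward      */
        void *after  = sbrk(0);

        printf("break before: %p\n", before);
        printf("malloc      : %p\n", p);
        printf("break after : %p\n", after);   /* normally higher than 'before'         */

        free(p);
        return 0;
    }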
There are two common problems when using the heap: 1) freeing or overwriting memory that is still in use (memory corruption), and 2) not freeing memory that is no longer in use (a memory leak). When the number of frees is smaller than the number of allocations, a memory leak has probably occurred. Leaked memory tends to be larger than the data structure you forgot to free, because the allocator usually rounds the request up to a larger block size (for example, a 212-byte request may become 256 bytes).
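Two tiny sketches of these failure modes (deliberately wrong code, for illustration only; tools such as Valgrind or AddressSanitizer catch both):

    /* heap_bugs.c - intentionally broken, do not copy into real code */
    #include <stdlib.h>
    #include <string.h>

    static void leak(void)
    {
        char *buf = malloc(256);
        strcpy(buf, "never freed");        /* pointer is lost on return -> memory leak    */
    }

    static void corrupt(void)
    {
        char *buf = malloc(16);
        free(buf);
        strcpy(buf, "use after free");     /* writing freed memory corrupts the heap and
                                              may crash much later, in an unrelated call  */
    }

    int main(void)
    {
        leak();
        corrupt();
        return 0;
    }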
Note that this heap is different from the "heap" in data structures; its behavior is closer to a linked list.

[Extended reading] Differences between the stack and the heap
① Management: the stack is managed automatically by the compiler; the heap is controlled by the programmer, which is convenient but makes memory leaks easy.
② Growth direction: the stack grows toward lower addresses ("grows down") and is a contiguous region of memory; the heap grows toward higher addresses ("grows up") and is a discontinuous region, because the system keeps free blocks in a linked list that is naturally discontinuous and is traversed from low addresses to high addresses.
③ Size: the top address and maximum size of the stack are set in advance by the system (commonly 2MB or 10MB by default); the heap is limited only by the valid virtual memory in the system, and on a 32-bit Linux system heap memory can reach about 2.9GB.
④ Contents: during a function call the arguments are pushed first (right to left in C), then the address of the instruction following the call in the caller (the return address), then the saved frame pointer and the callee's local variables. When the call ends, the local variables are popped first, execution resumes at the saved return address, and the arguments are removed. The heap usually stores the size of each block in a header at its start; beyond that it holds whatever the programmer puts there, data whose lifetime is not tied to any particular function call.
⑤ Allocation: the stack can be allocated statically or dynamically. Static allocation is done by the compiler, as for local variables; dynamic allocation uses the alloca() function, and that space is released automatically when the function returns. The heap can only be allocated dynamically and must be released manually (a sketch contrasting the two follows this list).
⑥ Allocation efficiency: the stack has direct hardware support: a dedicated register holds the stack pointer and push/pop are single machine instructions, so it is fast. The heap is provided by library functions whose algorithms are complex, so it is much slower than the stack. (On Windows, VirtualAlloc can reserve a block of memory directly in the process address space, which is fast and flexible.)
⑦ System response to an allocation: for the stack, as long as the remaining space is larger than the request, the system provides the memory; otherwise it reports a stack overflow exception. For the heap, the operating system keeps a linked list of free blocks; when an allocation request arrives it walks the list, finds the first block larger than the request, removes that block from the free list, and hands it to the program. If no block is large enough (possibly because of too much fragmentation), the allocator may call into the system to enlarge the program's data segment and try again. Most allocators record the block's size at its first address so that a later free/delete can release it correctly, and if the block found is larger than the request, the excess is returned to the free list.
⑧ Fragmentation: the stack never fragments, because it is strictly last in, first out: a block cannot be popped until everything above it has been popped. Frequent allocation and release on the heap, by contrast, leaves the free space discontinuous, creating a lot of fragmentation and lowering efficiency.
Because the heap lacks dedicated hardware support and a request may involve switches between user mode and kernel mode, heap allocation is the more expensive of the two. The stack is therefore the most heavily used region in a program: function calls are carried out on it, and arguments, return addresses, frame pointers, and local variables are all pushed and popped there. Prefer the stack whenever possible, and turn to the heap only for large blocks or for data that must outlive the current function. With both stacks and heaps, avoid out-of-bounds accesses, or the program may crash or corrupt its heap and stack structures with unpredictable consequences.
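A small sketch contrasting the two kinds of allocation (alloca() is non-standard but commonly available on Linux via <alloca.h>; it allocates from the current stack frame and needs no free):

    /* stack_vs_heap.c - assumes a glibc/Linux toolchain for alloca() */
    #include <stdlib.h>
    #include <string.h>
    #include <alloca.h>

    static void demo(size_t n)
    {
        char local[64];            /* fixed-size stack allocation, laid out by the compiler   */
        char *a = alloca(n);       /* dynamic stack allocation, gone when demo() returns      */
        char *h = malloc(n);       /* heap allocation, lives until free() is called           */

        memset(local, 0, sizeof local);
        memset(a, 0, n);
        memset(h, 0, n);

        free(h);                   /* forgetting this is a leak; 'a' and 'local' need nothing */
    }

    int main(void)
    {
        demo(128);
        return 0;
    }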
5) BSS segment
The following symbols are typically stored in a BSS (Block Started by Symbol) segment:
Uninitialized global variables and static local variables
Global variables and static local variables whose initial value is 0 (this depends on the compiler implementation)
Symbols that are declared but not defined and carry a non-zero "value" (that value being the size of the COMMON block)
In C, statically allocated variables that are not explicitly initialized are initialized to 0 (arithmetic types) or to a null pointer (pointer types). Because the BSS is zeroed by the operating system when the program is loaded, global variables that are given no initial value, or an initial value of 0, end up in the BSS. The BSS segment merely reserves space for uninitialized statically allocated variables and occupies no space in the object file, which keeps the object file small. At run time, however, memory must still be allocated for these variables, so the object file records the total size of all uninitialized statically allocated variables (via the start and end addresses of the BSS recorded in the binary). When the loader loads the program, the memory allocated for the BSS segment is initialized to 0. In embedded software, before main() is entered, the C runtime maps the BSS segment onto memory that has been zero-filled (which is efficient).
Note that although both are placed in the BSS segment, a global variable explicitly initialized to 0 is a strong symbol, while an uninitialized global variable is a weak (common) symbol. If a strong symbol with the same name is defined elsewhere (possibly with a non-zero initial value), the weak symbol does not cause a redefinition error at link time, but at run time its value may not be what you expect (it is overridden by the strong symbol). Therefore, when defining a global variable, if it is only used within one file, prefer to mark it with the static keyword; otherwise give it an explicit initial value (even if it is 0) so that it becomes a strong symbol and any name collision is caught at link time, instead of the variable being silently overridden by an unexpected value.
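A minimal two-file sketch of the pitfall (the file names and the variable are invented for illustration):

    /* a.c (hypothetical) */
    int flag;                /* uninitialized -> weak/common symbol, destined for the BSS   */

    /* b.c (hypothetical) */
    int flag = 5;            /* initialized   -> strong symbol, stored in the data segment  */

    /* With the traditional -fcommon behavior, linking a.o and b.o succeeds: the weak
       'flag' from a.c is folded into the strong one from b.c, so code in a.c that
       expected flag == 0 silently sees 5. Writing 'int flag = 0;' in a.c makes both
       symbols strong, and the linker then reports a multiple-definition error,
       exposing the conflict at link time instead of at run time.                     */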
Some compilers place uninitialized global variables in a COMMON section, which is then merged into the BSS segment at link time. The -fno-common option can be used during compilation to forbid placing uninitialized globals in the COMMON section.
In addition, since the object file contains no content for the BSS segment, the contents of the BSS address space are unknown after the program is burned into memory (Flash). During U-Boot startup, after the stage 2 code (usually located in the Lib_xxxx/board.c file) has been moved (copied) into SDRAM, code must be added to explicitly zero the BSS section; the stage 2 code must not rely on variables that were defined with an initial value of 0 actually being 0.
The word "extended reading" BSS History     BSS (block Started by symbol, blocks starting with symbols) was originally a pseudo-instruction in the UA-SAP assembler (Aircraft Symbolic Assembly program), Used to reserve a piece of memory space for the symbol. The assembler was developed by United Airlines in the mid 1950s for the IBM 704 mainframe.     The term was later introduced as a keyword to the standard assembler FAP (Fortran Assembly Program) on the IBM 709 and 7090/94 models, which defines the symbol and reserves the uninitialized space block of the specified number of words for the symbol. In an     architecture that uses segment memory management, such as an Intel 80x86 system, a BSS segment usually refers to an area of memory that is used to hold a global variable that is not initialized in the program, and that the variable has no value but the name and size. When the program starts, it is zeroed by the system initialization. The     BSS segment does not contain data, and only the start and end addresses are maintained so that the memory can be effectively zeroed at run time. The runtime space required for BSS is recorded by the target file, but BSS does not occupy the actual space within the target file, which is not present in the binary image file of the BSS segment application.

6) Data segment
The data segment typically holds global variables and static local variables that are initialized in the program to non-zero values. The data segment is statically allocated (static storage) and is readable and writable.
The data segment is saved in the object file (and in embedded systems is usually burned into the image file); its contents are the initial values set by the program. For example, for a global variable int gVar = 10, the value 10 must be stored in the data segment of the object file and then copied to the appropriate memory location when the program loads.
The differences between the data segment and the BSS segment are as follows:
1) The BSS segment takes up no space in the file on disk but does occupy memory at run time; the data segment occupies both file space and memory.
For large arrays such as int ar0[10000] = {1, 2, 3, ...} and int ar1[10000], ar1 is placed in the BSS segment, where only the fact that 10000 * 4 bytes must be zeroed needs to be recorded, instead of storing every element 1, 2, 3, ... as for ar0; the disk space the BSS saves in the object file is therefore considerable (see the sketch after this list).
2) When the program reads data from the data segment, a page fault prompts the system to allocate the corresponding physical memory; when the program reads data from the BSS segment, the kernel can point it at an all-zero page, without a page fault and without allocating physical memory for it.
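A small sketch (the 10000-element arrays follow the text; compare the output of the size command, or the .o file size, to see the difference):

    /* bss_vs_data.c - build and inspect with: gcc -c bss_vs_data.c && size bss_vs_data.o */
    int ar0[10000] = {1, 2, 3};   /* explicitly initialized -> data segment: all 40000
                                     bytes of initial values are stored in the object file */
    int ar1[10000];               /* uninitialized          -> BSS segment: only the size
                                     is recorded, no bytes are stored on disk              */

    int main(void)
    {
        /* 'size' typically reports data and bss of roughly 40000 bytes each, yet the
           .o file itself is only a little larger than 40000 bytes, not 80000.        */
        return ar0[0] + ar1[0];
    }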
At run time, the data segment and the BSS segment together are often referred to as the data area. In some materials, "data segment" refers broadly to the data segment + BSS segment + heap.
7) Code segment (text)
The code segment, also called the text segment, typically holds the program's executable code (the machine instructions executed by the CPU). Ordinary C statements are compiled into machine code stored in the code segment. The code segment is usually sharable, so for frequently executed programs only one copy needs to be kept in memory. It is usually read-only, to prevent a program from accidentally modifying its own instructions (writing to this segment causes a segmentation fault); some architectures also allow the code segment to be writable, which makes self-modifying programs possible.
Instructions in the code segment are executed in order according to the program's flow: straight-line instructions execute only once (per process run), repetition requires jump instructions, and recursion requires the help of the stack.
A code-segment instruction consists of an opcode and operands (or references to operand addresses). If an operand is an immediate (a literal value), it is embedded directly in the instruction; local data has space allocated on the stack and is referenced by address; data in the BSS and data segments is likewise referenced by address.
The code segment is also the part of the program most affected by compiler optimizations.
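A minimal sketch of the read-only property (string literals normally live in a read-only section mapped alongside the code; modifying one is undefined behavior and on Linux usually dies with SIGSEGV):

    /* readonly.c - intentionally faulting example */
    #include <string.h>

    int main(void)
    {
        char *lit  = "hello";     /* points into the read-only (text/rodata) mapping */
        char  arr[] = "hello";    /* a writable copy placed on the stack             */

        arr[0] = 'H';             /* fine: the array is writable                     */
        lit[0] = 'H';             /* segmentation fault expected: page is read-only  */
        return (int)strlen(lit);
    }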
8) Reserved area
This region sits at the lowest part of the virtual address space and is not mapped to any physical address. Any reference to it is illegal; its purpose is to catch the exceptions raised when memory is referenced through a null pointer or a small-integer-valued pointer.
It is not a single area of memory but a general term for the address ranges that the operating system protects and forbids user processes from accessing. In most operating systems, very low addresses are simply not allowed to be mapped, NULL being the obvious example. The C language uses 0 for the invalid (null) pointer precisely because address 0 does not normally hold valid, accessible data.
On a 32-bit x86 Linux system, a user process's executable is typically loaded starting at virtual address 0x08048000. The load address is determined by the ELF file header and can be changed by writing a custom linker script that overrides the linker's default. The address space below 0x08048000 is typically occupied by the C dynamic-link libraries, the dynamic loader ld.so, and the kernel vDSO (a virtual shared library provided by the kernel); a program can also access the space below 0x08048000 through the mmap system call.
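A one-line sketch of why the reserved area matters: dereferencing a null pointer lands in this unmapped region and traps immediately instead of silently corrupting data.

    /* null_deref.c - intentionally faulting example */
    int main(void)
    {
        int *p = 0;       /* null pointer: address 0 lies in the reserved, unmapped area */
        *p = 42;          /* page fault -> SIGSEGV, so the bug is caught right here      */
        return 0;
    }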

[Extended reading] Benefits of separating segments
While a program runs, its instructions are executed in sequence and each one is visited essentially once (jumps and recursion aside, which can make code execute many times), whereas its data (the data and BSS segments) usually needs to be accessed repeatedly; separating them makes access convenient and saves space. In more detail:
When the program is loaded, data and instructions are mapped into two separate virtual memory regions. The data region is readable and writable for the process, while the instruction region is read-only; giving the two regions different permissions prevents the program's instructions from being rewritten, intentionally or otherwise.
Modern CPUs have very powerful cache systems, and programs should maximize the cache hit rate. Separating the instruction region from the data region improves the program's locality, and since modern CPUs generally have separate instruction and data caches, storing instructions and data apart improves the hit rate of both.
When several copies of a program run on the system, their instructions are identical, so only one copy of the instructions needs to be kept in memory. If hundreds of processes are running, sharing instructions saves a great deal of space (especially on dynamically linked systems). Other read-only data, such as icons, images, and text resources, can also be shared, while each process copy keeps its own private data region.
Finally, temporary data and code-reuse bookkeeping are placed on the stack at run time and have short lifetimes, while global and static data may be needed throughout execution, so they are stored and managed separately; the heap region is left for the user to allocate and manage freely.
IV. Summary

I do not produce knowledge; I am merely nature's porter. Everything about memory described above can be found in the two books "In-Depth Understanding of Computer Systems" and "The Self-Cultivation of Programmers".

Next, I plan to write two posts on "the production process of an executable file" and "the loading process of an executable file", so stay tuned.

Much of this knowledge comes from the blogs of the two experts mentioned above: clover_toeic and liutao_1977.




