Memory Management: Program Layout in Memory

Source: Internet
Author: User

Every process in a multitasking operating system runs in its own memory sandbox. This sandbox is the virtual address space.

1 32-bit virtual memory Layout

In 32-bit mode, the virtual address space is always a 4 GB block of memory addresses. These virtual addresses are mapped to physical memory through page tables, which are maintained by the operating system and consulted by the processor. Each process has its own set of page tables, but there is a catch: once virtual addressing is enabled, it applies to all software running on the machine, including the kernel itself. Therefore, part of the virtual address space must be reserved for the kernel:

Figure 1

This does not mean that the kernel uses that much physical memory. It only means that the kernel has that much address space available and can map it to physical memory as it sees fit. Kernel space is flagged in the page tables as accessible only to privileged code (ring 2 or lower), so a page fault is triggered whenever a user-mode program tries to touch these pages; the user program cannot access kernel pages. In Linux, kernel space is always present and maps the same physical memory in every process. Kernel code and data are always addressable, ready to handle interrupts and system calls at any time. In contrast, the mapping of the user-mode portion of the address space changes whenever a process switch happens:

Figure 2

In Figure 2, the blue regions represent virtual addresses that are mapped to physical memory, while the white regions are unmapped. In this example, Firefox uses far more of its virtual address space because it is a notorious memory hog. The distinct bands in the address space correspond to memory segments such as the heap and the stack. Keep in mind that these segments are simply ranges of memory addresses and have nothing to do with Intel-style processor segments.

1.1 32-bit classic memory Layout

Figure 3

In the classic 32-bit layout, the top 1 GB of the address space belongs to the kernel. Below it is the stack, which grows downward, and the mmap area, which grows upward from 0x40000000. The heap starts near the bottom, just above the regions loaded from the ELF file (code segment, data segment, and constants), and grows upward. This layout has several problems. First, it is vulnerable to overflow attacks. Second, the heap has less than 1 GB of address space to grow into. Third, if the program uses little mmap memory, much of the address range reserved for it is wasted. These problems led to another memory layout.

1.2 32-bit default memory Layout

Figure 4

When the computer is running happily and sanely, the starting virtual addresses of the segments are exactly the same for almost every process, as in Figure 4. This opens the door to remote exploitation of security vulnerabilities. An exploit often needs to reference absolute memory addresses, such as an address on the stack or the address of a library function. Remote attackers must pick these addresses blindly, counting on the address space layout being the same everywhere. If they guess right, the victim is compromised. As a result, address space layout randomization has become popular. Linux randomizes the layout by adding random offsets to the start addresses of the stack, the memory mapping segment, and the heap. Unfortunately, the 32-bit address space is rather cramped, which leaves little room for randomization and weakens its effect.

Stack

The topmost segment in the process address space is the stack, which most programming languages use to store local variables and function parameters. Calling a method or function pushes a new stack frame onto the stack, and the frame is destroyed when the function returns. Because the data obeys strict LIFO order, no complex data structure is needed to track the stack contents; a simple pointer to the top of the stack is enough. Pushing and popping are therefore fast and deterministic. In addition, the constant reuse of stack regions tends to keep active stack memory in the CPU caches, which speeds up access. Every thread in a process has its own stack.

If you keep pushing data onto the stack, you can exhaust the memory area currently mapped for it. This triggers a page fault that is handled by Linux's expand_stack(), which in turn calls acct_stack_growth() to check whether it is appropriate to grow the stack. If the stack size is below RLIMIT_STACK (usually 8 MB), the stack normally grows and the program continues happily, unaware of what just happened. This is the normal mechanism by which the stack adjusts to demand. However, if the maximum stack size has been reached, the stack overflows and the program receives a segmentation fault. Once the mapped stack area has grown to meet demand, it does not shrink back when the stack gets smaller. Like the federal budget, it only expands.

Dynamic stack growth is the only situation in which access to an unmapped memory region (the white areas in the figure) may be valid. Any other access to unmapped memory triggers a page fault that results in a segmentation fault. Some mapped areas are read-only, so attempts to write to them also lead to segmentation faults.

Memory mapping segment

Below the stack is the memory mapping segment. Here the kernel maps the contents of files directly into memory. Any application can request such a mapping through the Linux mmap() system call or through CreateFileMapping()/MapViewOfFile() on Windows. Memory mapping is a convenient and efficient way to do file I/O, so it is used to load dynamic libraries. It is also possible to create an anonymous memory mapping that does not correspond to any file and is used for program data instead. In Linux, if you request a large block of memory through malloc(), the C runtime creates such an anonymous mapping instead of using heap memory. "Large" here means larger than MMAP_THRESHOLD bytes, which is 128 KB by default and can be adjusted with mallopt().

Heap

Below the memory mapping segment comes the heap. Like the stack, the heap provides memory allocation at runtime; unlike the stack, it holds data whose lifetime is independent of the function that allocated it. Most languages provide heap management to programs, so satisfying memory requests is a joint effort of the language runtime and the kernel. In C, the interface to heap allocation is malloc() and friends, whereas in a garbage-collected language such as C#, the interface is the new keyword.

If the heap has enough free space to satisfy a memory request, the language runtime can handle it without involving the kernel. Otherwise, the heap is enlarged through the brk() system call to make room for the requested block. Heap management is complex and requires sophisticated algorithms that strive for speed and efficient memory usage in the face of our programs' chaotic allocation patterns. The time needed to service a heap request can vary substantially. Real-time systems deal with this problem by using special-purpose allocators. Heaps also become fragmented, as shown in Figure 5:

Figure 5

BSS, data segment, and code segment

Finally, we reach the lowest segments of memory: BSS, the data segment, and the code segment. In C, both BSS and the data segment store the contents of static (global) variables. The difference is that BSS stores the contents of uninitialized static variables, whose values are not set by the programmer in source code. The BSS memory area is anonymous: it does not map any file. If you write static int cntactiveusers, the contents of cntactiveusers live in BSS.

The data segment, on the other hand, holds the contents of static variables that are initialized in source code. This memory area is not anonymous. It maps the part of the program's binary image that contains the initial static values given in source code. So if you write static int cntworkerbees = 10, the contents of cntworkerbees live in the data segment and start out as 10. Even though the data segment maps a file, it is a private memory mapping, which means that updates to this memory are not reflected in the underlying file. This has to be the case; otherwise, assigning to a global variable would change the binary image on disk, which would be unthinkable.

The data segment example in Figure 6 is trickier because it uses a pointer. In that case, the contents of the pointer gonzo (a 4-byte memory address) live in the data segment. The string it points to does not, however. The string lives in the code segment, which is read-only and stores all of your code plus assorted other items such as string literals. The code segment also maps your binary file in memory, but writes to this area earn your program a segmentation fault. This helps guard against pointer bugs, although it is no substitute for careful programming in C. Figure 6 shows these segments and the example variables:

Figure 6

You can examine the memory areas of a Linux process by reading the file /proc/pid_of_process/maps. Keep in mind that a segment may contain many areas. For example, each memory-mapped file normally has its own area in the mmap segment, and dynamic libraries have extra areas similar to BSS and data. The next article describes what these areas really mean. Also, people sometimes say "data segment" to mean all of data + BSS + heap.

2 64-bit virtual memory Layout

The 64-bit address space is enormous, so the classic 32-bit layout is still used, with a randomized mmap start address added to guard against overflow attacks. The address space will not be exhausted any time soon, so this layout is unlikely to change for many years.

First of all, most operating systems and applications have no need for an address space as huge as 16 EB (2^64 bytes). Implementing full 64-bit addresses would only increase system complexity and the cost of address translation without bringing any benefit. Current x86-64 CPUs therefore all follow AMD's canonical form: only the low 48 bits of a virtual address are used in address translation, and bits 48 through 63 of any virtual address must be copies of bit 47 (sign extension). In other words, the total virtual address space is 256 TB (2^48 bytes).


Figure 7

Within this 256 TB virtual address space, 0000000000000000-00007fffffffffff (128 TB) is user space and ffff800000000000-ffffffffffffffff (128 TB) is kernel space. Note that the kernel space contains many holes. After the first hole, ffff880000000000-ffffc7ffffffffff (64 TB) is the region that directly maps physical memory; in other words, the default PAGE_OFFSET is ffff880000000000. A direct mapping region this large is enough to map all physical memory, so the current x86-64 architecture has no high memory, that is, no ZONE_HIGHMEM (see the previous article).
