Original title: Anatomy of a Program in Memory
Original address: http://duartes.org/gustavo/blog/
[Translator's note: My own skills are limited, so I have translated this excellent article by a foreign expert. Please review it yourself; I am sharing it here.]
Memory management is the heart of an operating system; it is crucial for both programming and system administration. In the next few articles I will look at memory from a practical angle, without shying away from the internals. Since many of the concepts are generic, most examples are taken from Linux and Windows on the 32-bit x86 platform. This first article in the series covers the memory layout of a program.
Each process in a multitasking operating system runs in its own memory sandbox. This sandbox is the virtual address space, which in 32-bit mode is always a 4GB block of memory addresses. These virtual addresses are mapped to physical memory through page tables, which are maintained by the operating system and consulted by the processor. Each process has its own set of page tables, but there is a catch: once virtual addresses are enabled, they apply to all software running on the machine, including the kernel itself. Therefore, a portion of the virtual address space must be reserved for the kernel:
This does not mean the kernel uses that much physical memory, only that it has that much address space at its disposal, which it can map to physical memory as it sees fit. Kernel space is flagged in the page tables as exclusive to privileged code (ring 2 or lower), so a page fault is triggered whenever a user-mode program tries to touch those pages. In Linux, kernel space is constant and mapped to the same physical memory in every process. Kernel code and data are always addressable, ready to handle interrupts and system calls at any time. In contrast, the mapping of the user-mode portion of the address space changes whenever a process switch happens:
The blue regions represent virtual addresses that are mapped to physical memory, while the white regions are unmapped. In the example above, Firefox has used far more of its virtual address space, being legendary for its memory appetite. The distinct bands in the address space correspond to memory segments such as the heap, the stack, and so on. Keep in mind that these segments are simply ranges of memory addresses and have nothing to do with Intel-style processor segments. In any case, here is the standard memory segment layout for a Linux process:
When computing is happy, safe, and well-behaved, the starting virtual address of nearly every segment is exactly the same in every process on a machine, which makes it easy to exploit security vulnerabilities remotely. An exploit often needs to reference absolute memory locations: an address on the stack, the address of a library function, and so on. Remote attackers must pick these locations blindly, counting on the fact that address space layouts are all the same; when they are, people get exploited. That is why address space layout randomization (ASLR) has become popular: Linux randomizes the layout by adding a random offset to the start of the stack, the memory mapping segment, and the heap. Unfortunately, the 32-bit address space is pretty tight, leaving little room for randomization and weakening the effectiveness of the technique.
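To see this layout on your own machine, here is a minimal sketch of my own (not from the original article) that prints one sample address from each major region. On 32-bit Linux with the default split, these addresses all fall below the kernel boundary, and with ASLR enabled the stack, heap, and mmap addresses change from run to run.

    /* aslr_peek.c - illustrative sketch: print sample addresses from each
     * major segment. Compiling with gcc -m32 is assumed here so the layout
     * matches the 32-bit diagrams in the article. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int initialized_global = 42;   /* lives in the data segment */
    int uninitialized_global;      /* lives in the BSS          */

    int main(void)
    {
        int   local  = 0;                              /* stack        */
        void *heap   = malloc(16);                     /* heap (small) */
        void *mapped = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); /* mmap segment */

        printf("stack  ~ %p\n", (void *)&local);
        printf("mmap   ~ %p\n", mapped);
        printf("heap   ~ %p\n", heap);
        printf("bss    ~ %p\n", (void *)&uninitialized_global);
        printf("data   ~ %p\n", (void *)&initialized_global);
        printf("text   ~ %p\n", (void *)main);

        /* Run the program twice: with ASLR on, the stack, heap and mmap
         * addresses differ between runs, while the relative order stays fixed. */
        free(heap);
        munmap(mapped, 4096);
        return 0;
    }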
The topmost segment in the process address space is the stack, which stores local variables and function parameters in most programming languages. Calling a method or function pushes a new stack frame onto the stack, and the frame is destroyed when the function returns. Because the data obeys strict LIFO order, this simple design means that no complex data structure is needed to track the stack's contents; a simple pointer to the top of the stack is enough. Pushing and popping are therefore very fast and deterministic. In addition, the constant reuse of stack memory tends to keep active stack data in the CPU caches, speeding up access. Each thread in a process gets its own stack.
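A tiny sketch of my own (not part of the original article) makes the stack-frame behavior visible: each nested call gets a fresh frame, and on x86 Linux the locals of deeper frames sit at successively lower addresses because the stack grows downward.

    /* stack_frames.c - each call pushes a new frame; deeper frames appear
     * at lower addresses on x86, and popping is just moving the stack pointer. */
    #include <stdio.h>

    static void recurse(int depth)
    {
        int local;  /* one local variable per frame */
        printf("depth %d: frame local at %p\n", depth, (void *)&local);
        if (depth < 3)
            recurse(depth + 1);
        /* when this function returns, its frame is popped simply by
         * adjusting the stack pointer - nothing is explicitly freed */
    }

    int main(void)
    {
        recurse(0);
        return 0;
    }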
By pushing more data onto the stack than fits in its mapped area, a program can exhaust the stack region. This triggers a page fault that is handled in Linux by expand_stack(), which in turn calls acct_stack_growth() to check whether it is appropriate to grow the stack. If the stack size is below RLIMIT_STACK (usually 8MB), the stack normally grows and the program continues merrily, unaware of what just happened. This is the usual mechanism whereby the stack grows to the required size. However, if the maximum stack size has been reached, we have a stack overflow and the program receives a segmentation fault. Note that while the mapped stack area grows to meet demand, it never shrinks back, even when the stack empties out. Like the federal budget, it only grows.
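The limit mentioned above can be inspected from user space with the standard getrlimit() call; a quick sketch (the 8MB figure in the text is only the common default, so your numbers may differ, and the same value is visible via ulimit -s in a shell):

    /* stack_limit.c - query the current stack size limit (RLIMIT_STACK). */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            /* rlim_cur is the soft limit the kernel enforces when growing the stack */
            printf("soft stack limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
            printf("hard stack limit: %llu bytes\n", (unsigned long long)rl.rlim_max);
        }
        return 0;
    }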
Dynamic stack growth is the only situation in which access to an unmapped memory region (the white regions in the diagram) might be allowed. Any other access to unmapped memory triggers a page fault that results in a segmentation fault. Some mapped regions are read-only, so attempting to write to them also leads to a segmentation fault.
Below the stack is the memory mapping segment. Here the kernel maps the contents of files directly into memory. Any application can request such a mapping through the Linux mmap() system call or through CreateFileMapping()/MapViewOfFile() on Windows. Memory mapping is a convenient, high-performance way to do file I/O, so it is used for loading dynamic libraries. It is also possible to create an anonymous memory mapping that does not correspond to any file; this is used for program data instead. In Linux, if you request a large block of memory via malloc(), the C runtime creates such an anonymous mapping instead of using heap memory. "Large" means larger than MMAP_THRESHOLD, which is 128KB by default and adjustable via mallopt().
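As an illustration of the file-backed and anonymous mappings described above, here is a minimal sketch using the POSIX mmap() interface ("example.txt" is just a placeholder for any readable, non-empty file):

    /* mmap_demo.c - map a file read-only and create an anonymous mapping. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void)
    {
        int fd = open("example.txt", O_RDONLY);   /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* File-backed, private mapping: the file's contents appear directly
         * in our address space, inside the memory mapping segment. */
        char *file_mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (file_mem == MAP_FAILED) { perror("mmap file"); return 1; }
        printf("first byte of file: %c\n", file_mem[0]);

        /* Anonymous mapping: not backed by any file, zero-filled by the kernel.
         * glibc's malloc uses mappings like this for requests above MMAP_THRESHOLD. */
        char *anon_mem = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (anon_mem == MAP_FAILED) { perror("mmap anon"); return 1; }
        anon_mem[0] = 'x';

        munmap(file_mem, st.st_size);
        munmap(anon_mem, 1 << 20);
        close(fd);
        return 0;
    }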
Speaking of the heap, it comes next in our plunge down the address space. Like the stack, the heap provides runtime memory allocation, but unlike the stack it is meant for data whose lifetime is not tied to the function that allocated it. Most languages provide heap management, so satisfying memory requests is a joint effort between the language runtime library and the kernel. In C, the interface to heap allocation is the malloc() family of functions, whereas in a garbage-collected language such as C# the interface is the new keyword.
If there is enough space in the heap to satisfy a memory request, it can be handled by the language runtime without kernel involvement. Otherwise the heap is enlarged via the brk() system call to make room for the requested block. Heap management is complex, requiring sophisticated algorithms that strive for speed and efficient memory usage in the face of our programs' chaotic allocation patterns. The time needed to service a heap request can therefore vary substantially; real-time systems use special-purpose allocators to deal with this problem. Heaps can also become fragmented, as shown below:
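One way to watch the heap being extended is to observe the program break with sbrk(0) before and after an allocation. This is only a sketch of typical glibc behavior; the allocator is free to satisfy a request from memory it already obtained, so the break does not necessarily move on every call.

    /* brk_demo.c - observe the program break moving as the heap grows.
     * Behavior depends on the allocator; with glibc, small allocations
     * typically come from the brk-managed heap, large ones from mmap. */
    #define _DEFAULT_SOURCE   /* for sbrk() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *before = sbrk(0);                 /* current program break */
        void *small  = malloc(64 * 1024);       /* likely served from the heap */
        void *after  = sbrk(0);

        printf("break before malloc: %p\n", before);
        printf("break after  malloc: %p\n", after);
        printf("64 KB block at:      %p\n", small);

        void *big = malloc(1 << 20);            /* above MMAP_THRESHOLD: likely mmap'd */
        printf("1 MB block at:       %p (may lie outside the heap)\n", big);

        free(small);
        free(big);
        return 0;
    }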
Finally, we get to the lowest segments of memory: BSS, the data segment, and the text (code) segment. In C, both BSS and the data segment store the contents of static (global) variables. The difference is that BSS stores the contents of uninitialized static variables, whose values are not set by the programmer in source code. The BSS memory area is anonymous: it does not map any file. If you write static int cntActiveUsers, the contents of cntActiveUsers live in the BSS.
The data segment, on the other hand, holds the contents of static variables that are initialized in source code. This memory area is not anonymous: it maps the part of the program's binary image that contains the initial values given in source code. So if you write static int cntWorkerBees = 10, the contents of cntWorkerBees live in the data segment and start out as 10. Even though the data segment maps a file, it is a private memory mapping, which means that changes to this memory are not reflected in the underlying file. This must be the case, otherwise assignments to global variables would change the binary image on your disk, which is inconceivable.
The data segment example in the diagram is trickier because it uses a pointer. In that case the contents of the pointer gonzo, a 4-byte memory address, live in the data segment. The actual string it points to does not: the string lives in the text segment, which is read-only and stores all of your code in addition to tidbits such as string literals. The text segment maps your binary file into memory, but writes to this area earn your program a segmentation fault. This helps prevent pointer bugs, though not as effectively as avoiding C in the first place would. Here is a diagram showing these segments together with our example variables:
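Putting the three cases together, here is a sketch modeled on the article's variables (the string literal itself is my own placeholder, not the one from the original figure):

    /* segments.c - where static data lives, mirroring the article's examples. */
    #include <stdio.h>

    static int   cntActiveUsers;            /* uninitialized -> BSS          */
    static int   cntWorkerBees = 10;        /* initialized   -> data segment */
    static char *gonzo = "hello, segment";  /* pointer in the data segment,
                                               string literal in the read-only
                                               text segment                  */

    int main(void)
    {
        printf("cntActiveUsers at %p (BSS)\n",  (void *)&cntActiveUsers);
        printf("cntWorkerBees  at %p (data)\n", (void *)&cntWorkerBees);
        printf("gonzo (the pointer) at %p (data)\n", (void *)&gonzo);
        printf("the string it points to at %p (text/rodata)\n", (void *)gonzo);

        /* gonzo[0] = 'H';   <- writing to the literal would segfault,
         *                      because that page is mapped read-only */
        return 0;
    }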
You can examine the memory areas of a Linux process by reading the file /proc/pid_of_process/maps. Keep in mind that a segment may contain many areas. For example, each memory-mapped file normally has its own area in the mmap segment, and dynamic libraries have extra areas similar to BSS and the data segment. The next article will clarify what an "area" really means. Also, sometimes people say "data segment" to mean the whole of data segment + BSS + heap.
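For example, a few lines of C (or simply reading /proc/self/maps in a shell) will dump the regions of the current process, one mapped region per line:

    /* show_maps.c - print the memory regions of the current process by
     * reading /proc/self/maps. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) { perror("fopen"); return 1; }

        char line[512];
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);   /* address range, permissions, offset, file */

        fclose(f);
        return 0;
    }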
You can examine binary images with the nm and objdump commands to display their symbols, addresses, segments, and so on. Finally, note that the virtual address layout described in this article is the "flexible layout" in Linux, which has been the default for a few years. It assumes that we have a value for RLIMIT_STACK; when that is not the case, Linux reverts to the so-called classic layout, shown below:
That is all there is to the layout of the virtual address space. The next article will discuss how the kernel keeps track of these memory areas. We will look at memory mapping, how file reads and writes relate to it, and what memory usage figures really mean.
Translated version: http://www.cnblogs.com/lancidie/archive/2011/06/26/2090547.html