Linux Memory Bit by Bit: The User Process Memory Space

Source: Internet
Author: User

The top command is often used to understand process information, including memory information. The command top help document explains each field in this way.
VIRT, Virtual Image (kb)
RES, Resident size (kb)
SHR, Shared Mem size (kb)
%MEM, Memory usage (RES as a percentage of physical RAM)
SWAP, Swapped size (kb)
CODE, Code size (kb)
DATA, Data + Stack size (kb)
nFLT, Page Fault count
nDRT, Dirty Pages count
Even with these one-line descriptions, the fields still feel obscure. What do they actually mean?

Process memory space

A running program is called a process. Each process has its own independent memory space that does not interfere with any other process's. This space is divided into several segments: Text, Data, BSS, Heap, and Stack. A process's memory space is the VM (virtual memory) the kernel allocates to it, which does not mean the process occupies that much RAM (physical memory). How big is this space? The VIRT value in top's output tells us the virtual memory size of each process (it grows and shrinks as the program runs). You can also read /proc/<pid>/maps or run pmap -d <pid> to see how a process's memory space is laid out, for example:

# cat /proc/1449/maps
...
0012e000-002a4000 r-xp 00000000 08:07 3539877    /lib/i386-linux-gnu/libc-2.13.so
002a4000-002a6000 r--p 00176000 08:07 3539877    /lib/i386-linux-gnu/libc-2.13.so
002a6000-002a7000 rw-p 00178000 08:07 3539877    /lib/i386-linux-gnu/libc-2.13.so
...
08048000-0875b000 r-xp 00000000 08:07 4072287    /usr/local/mysql/libexec/mysqld
0875b000-0875d000 r--p 00712000 08:07 4072287    /usr/local/mysql/libexec/mysqld
0875d000-087aa000 rw-p 00714000 08:07 4072287    /usr/local/mysql/libexec/mysqld
...
PS: the columns are linear address range, access permissions, offset, device number, inode, and mapped file.

VM allocation and release

"The memory is always occupied by the process." In other words, we can understand that the process always requires memory. When fork () or exec () is a process, the system kernel allocates a certain number of VMS to the process, as the memory space of the process, the size of the BSS segment, the defined global variables, static variables, the number of characters in the Text segment, the memory image of the program, and the local variables of the Stack segment are also determined. Of course, you can also use functions such as malloc () to dynamically allocate memory and expand heap upwards.

The biggest differences between dynamic and static allocation are: 1. static allocation is fixed at compile time, when the sizes of Text + Data + BSS + Stack to be allocated are already determined, while dynamic allocation does not happen until run time; 2. memory dynamically allocated with malloc() must be released manually with free(), otherwise it leaks, whereas statically allocated memory (Text, Data) is released when the process exits. Data in the Stack segment is short-lived: it is destroyed as soon as its function returns.

Let's use a few small sample programs to deepen our understanding.

/* @ Filename: example-2.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    char arr[] = "hello world";   /* Stack segment, rw- */
    char *p = "hello world";      /* Text segment, string literal, r-x */
    arr[1] = 'l';
    *(++p) = 'l';                 /* error: the Text segment is not writable */
    return 0;
}

PS: The variable p itself lives in the Stack segment, but it points to the string literal "hello world", which is placed in the Text segment.

/* @ Filename: example_2_2.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *get_str_1()
{
    char str[] = "hello world";
    return str;
}

char *get_str_2()
{
    char *str = "hello world";
    return str;
}

char *get_str_3()
{
    char tmp[] = "hello world";
    char *str;
    str = (char *)malloc(12 * sizeof(char));
    memcpy(str, tmp, 12);
    return str;
}

int main(int argc, char *argv[])
{
    char *str_1 = get_str_1();  /* wrong: Stack segment data is destroyed when the function exits */
    char *str_2 = get_str_2();  /* correct: points to a string literal in the Text segment, which lives until the program exits */
    char *str_3 = get_str_3();  /* correct: points to Heap segment data that has not yet been free()d */
    printf("%s\n", str_1);
    printf("%s\n", str_2);
    printf("%s\n", str_3);
    if (str_3 != NULL) {
        free(str_3);
        str_3 = NULL;
    }
    return 0;
}

PS: get_str_1() returns Stack segment data; the compiler warns about this. If Heap data is no longer needed, free() it as soon as possible.

/* @ Filename: example_2_3.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

char data_var = '1';

char *mem_killer()
{
    char *p;
    p = (char *)malloc(1024 * 1024 * 4);
    memset(p, ' ', 1024 * 1024 * 4);
    p = &data_var;        /* dangerous: memory leak */
    return p;
}

int main(int argc, char *argv[])
{
    char *p;
    for (;;) {
        p = mem_killer(); /* the memory malloc()ed inside can no longer be free()d */
        printf("%c\n", *p);
        sleep(20);
    }
    return 0;
}

PS: When using malloc(), pay special attention to calling free() manually, and as early as possible, once the Heap memory is no longer needed. The VIRT and RES values in top's output can be used to watch the VM and RAM occupied by the process.

Before ending this section, let's introduce the tool size. Because the sizes of the Text, Data, and BSS segments are fixed at compile time, size can report how much VM they will occupy:

# gcc example_2_3.c -o example_2_3
# size example_2_3
   text    data     bss     dec     hex filename
   1403     272       8    1683     693 example_2_3

malloc()

When programming, coders often have to handle data whose size cannot be predicted in advance (PHPers may find this hard to relate to), so besides ordinary variables they also need to allocate memory dynamically. The GNU libc library provides two memory allocation functions: malloc() and calloc(). A successful call to malloc(size_t size) always allocates size bytes of VM (again, not RAM) and returns a pointer to the start of the allocated area. The allocated memory is reserved for the process until you explicitly release it with free() (of course, when the process exits, both statically and dynamically allocated memory are reclaimed by the system). The developer is responsible for returning dynamically allocated memory to the system as soon as possible. Remember one sentence: free() as soon as possible!

Let's look at a small malloc() example.

/* @ Filename: example_2_4.c */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char *p_4kb, *p_128kb, *p_300kb;

    if ((p_4kb = malloc(4 * 1024)) != NULL)
        free(p_4kb);
    if ((p_128kb = malloc(128 * 1024)) != NULL)
        free(p_128kb);
    if ((p_300kb = malloc(300 * 1024)) != NULL)
        free(p_300kb);
    return 0;
}

# gcc example_2_4.c -o example_2_4
# strace -t ./example_2_4
...
00:02:53 brk(0)                         = 0x8f58000
00:02:53 brk(0x8f7a000)                 = 0x8f7a000
00:02:53 brk(0x8f79000)                 = 0x8f79000
00:02:53 mmap2(NULL, 311296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb772d000
00:02:53 munmap(0xb772d000, 311296)     = 0
...

PS: the system call brk(0) returns the current top of the heap, also known as the program break.

By tracing system calls we can see that the glibc function malloc() always satisfies allocation requests through the brk() or mmap() system calls. malloc() chooses between brk() and mmap() based on the requested size, with 128Kbytes as the threshold. For small requests (<= 128Kbytes) it calls brk(), which pushes the highest address of the data segment (the program break) upward (the heap grows from the bottom up). For large requests it uses mmap() with an anonymous mapping (flag MAP_ANONYMOUS) to allocate memory outside of, and unrelated to, the heap. Doing it this way makes sense. Imagine the opposite: if large blocks also went through brk(), a large block could easily be pinned in place by small blocks allocated above it, and large allocations are not very frequent anyway; conversely, small allocations are much more frequent, and if they all went through mmap(), constantly creating memory mappings would cost more. One more point: the size of a memory mapping must be a multiple of a page (the memory page size, typically 4Kbytes or 8Kbytes), so mapping a whole page just for a small piece of data like "hello world" would be wasteful.

Correspondingly, the release function free() either uses brk() to push the program break back down, or calls munmap() to remove the mapping, depending on the size of the block. Note that brk() is not called immediately on every free() of a small block; that is, the heap does not shrink after every release. Instead, glibc keeps the memory for the next malloc() (small blocks are allocated frequently), and only calls brk() when it finds that the free space at the top of the heap is significantly larger than what allocations need. However, every free() of a large block calls munmap() to remove the mapping immediately. Below are two examples: malloc() of a small block and of a large block.


[Figure] malloc(100000), less than 128Kbytes: the break is pushed upward (the heap grows). Note the purple circle.


[Figure] malloc(1024*1024), greater than 128Kbytes: the mapping lands between heap and stack. Note the purple circle. PS: The Data Segment in the figure covers BSS, Data, and Heap; some documents describe the Data segment as having three sub-areas: BSS, Data, and Heap.

Page missing exception (Page Fault)

Each time malloc() is called, the system only assigns linear addresses (VM) to the process; it does not immediately assign page frames (RAM). The system postpones assigning a page frame until the last possible moment: the frame is allocated only when the page is actually used, through page-fault handling. The biggest advantage of this on-demand, deferred page-frame allocation policy is that it makes full and efficient use of the system's scarce RAM.

When the memory page referenced through a pointer is not resident in RAM, i.e. no corresponding page frame can be found in RAM, a page fault (transparent to the process) occurs, and the kernel enters page-fault handling. A page fault can occur in the following situations: 1. only a linear address has been assigned, with no page frame yet; this happens on the first access to a memory page; 2. a page frame had been assigned, but it has since been reclaimed and swapped out to disk (the swap area); 3. the referenced memory page lies outside the process space; it does not belong to the process and may already have been free()d. Let's use a piece of pseudo-code to get a general idea of page faults.

/* @ Filename: example_2_5.c */
...
demo()
{
    char *p;
    if ((p = malloc(1024 * 100)) != NULL)   /* L0: 100Kbytes of linear addresses allocated */
    {
        *p = 't';                           /* L1 */
        ...
        /* after a long time, busy or not, page frames unused for long may be reclaimed */
        *p = 'm';                           /* L2 */
        p[4096] = 'p';                      /* L3 */
        ...
        free(p);                            /* L4 */
        if (p != NULL)
        {
            *p = 'l';                       /* L5 */
        }
    }
}
...
  • L0: malloc() assigns the process a 100Kbytes linear address area (VM) through brk(), but the system does not immediately assign page frames (RAM). At this point the process does not yet occupy 100Kbytes of physical memory. This also explains why, while watching top, you can see the VIRT value grow while the RES value stays unchanged.
  • L1: the first page (4Kbytes) of the 100Kbytes is referenced through *p. Since this is the first reference to this page, no corresponding page frame can be found in RAM. A page fault occurs (transparent to the process); the system catches the exception and enters page-fault handling. The kernel then assigns a page frame (RAM) and maps it in. This case (the accessed page is not in any page frame, and the kernel allocates a new frame and initializes it suitably to satisfy the access) is called Demand Paging.
  • L2: after a long time, the first page of the 100Kbytes is referenced again through *p, and the system cannot find the mapped page in RAM (it may have been swapped out to disk). A page fault occurs and is caught for handling. The kernel then allocates a page frame (RAM), finds the backup "page" on disk, and swaps it back into memory (because swap operations are expensive, not just one page but several are read ahead; this also explains why some documents say that a burst of si in vmstat does not necessarily mean physical memory is insufficient). Filling the page frame (RAM) with data from disk takes time, which blocks the process and puts it to sleep; a fault handled this way is called a major fault. By contrast, a page fault that does not block the process is called a minor fault.
  • L3: the second page of the 100Kbytes is referenced. See the Demand Paging discussion for the first page under L1.
  • L4: the memory is released: the linear address area is deleted, and the page frames are freed as well.
  • L5: the memory page is referenced again through *p after it has been free()d (the user process itself does not know this). A page fault occurs; the page-fault handler checks and finds that the faulting page is not in the process's memory space. For page faults caused by this kind of programming error, the system kills the process and reports the well-known segmentation fault (Segmentation fault).


For details about how to handle Page faults, see Page Fault Handling.

Page frame reclaiming (PFRA)

As the number of concurrent network users grows, processes multiply (for example, a daemon fork()s child processes to handle user requests), page faults become more frequent, more disk data needs to be cached (see OS Page Cache in the next article), and RAM gets increasingly tight. To guarantee enough page frames for page-fault handling, Linux has its own mechanism: the PFRA (Page Frame Reclaiming Algorithm). The PFRA "steals" page frames from user-process memory space and from the page cache. "Stealing" means swapping the data in page frames occupied by process memory space out to disk (the swap area), or flushing (synchronizing, fsync()) page-cache data back to the disk device, and then reclaiming the frames. PS: if you observe the system crawling because RAM is insufficient, it is usually page-fault handling plus the PFRA busy "stealing" page frames. Let's understand the PFRA from the following aspects.

    Candidate pages: which page frames can be reclaimed?


  • Page frames occupied by process memory space, such as pages in the Data segment (Heap, Data) and anonymously mapped pages (for example, large blocks malloc()ed between Heap and Stack). Pages in the Stack segment are not included.
  • Pages of a process's mmap() space that map a file, i.e. non-anonymous mappings.
  • Pages occupied by Buffer/Cache in the page cache, also called the OS Page Cache.
    Reclaim policy: of the candidate pages, decide which to reclaim first


  • Reclaim Buffer/Cache pages in the page cache first; only then reclaim pages occupied by process memory space.
  • Pages occupied by process space can be reclaimed as long as they are not locked. So when a process has been sleeping for a long time, its page frames are gradually swapped out to the swap area.
  • An LRU replacement algorithm reclaims rarely used page frames first: pages sitting on the LRU inactive list are considered unlikely to be referenced again soon.
  • Reclaiming a process memory page costs much more than reclaiming Buffer/Cache. So by default, Linux swaps out process-occupied RES only when swap_tendency (the swap tendency value) reaches at least 100. In effect, the swap tendency value captures how busy the system is, how much RES processes occupy, and how little room Buffer/Cache has left before process page frames get taken. PS: this is why some DBAs suggest setting vm.swappiness to 0 on a MySQL InnoDB server, so that InnoDB Buffer Pool data stays in RES longer.
  • If there are no page frames left to reclaim, the PFRA takes its worst-case action: kill a user-state process and release its page frames. Of course, the process to be killed is not chosen at random; it should at least occupy many page frames, run at a low priority, and not belong to root.
    Activating reclaim: when does page-frame reclaiming run?


  • Emergency reclaiming. When the kernel finds it does not have enough page frames to allocate, for example while handling a page fault or reading a file, it starts "emergency reclaiming": it wakes the pdflush kernel thread, first writes 1024 dirty pages from the page cache back to disk, and then tries to reclaim 32 page frames at a time; if 13 consecutive attempts fail to reclaim 32 frames, it kills a process.
  • Periodic reclaiming. Before things ever reach emergency reclaiming, the PFRA also wakes the kernel thread kswapd. To avoid more "emergency reclaiming", whenever the number of free page frames falls below a configured warning level, kswapd is woken up to reclaim page frames until the count reaches a configured safe level. PS: when RAM is tight, you can see the awakened kswapd threads with the ps command.
  • OOM. At peak time, when pressure on RAM is extreme and even the frames kswapd continuously reclaims are not enough, "emergency reclaiming" kicks in and, ultimately, the OOM killer.
Paging and Swapping

These two keywords appear in many places and deserve a clear distinction: Paging and Swapping. See Figure 2.

Most of the time spent on swapping goes to data transfer: the more data is exchanged, the greater the time cost. It is transparent to the process. When RAM is short, the PFRA writes part of the data in anonymous page frames out to the swap area as backup; this action is called so (swap out). Later, when a page fault occurs on such a page, the page-fault handler reads it from the swap area (disk) back into physical memory; this action is called si (swap in). Each swap operation, whether si or so, may transfer more than one page of data. Swapping means disk I/O, page-table updates, and other costly operations that block the user-state process. So sustained high si/so in the memory columns means physical memory is the performance bottleneck.

Paging is what we discussed earlier as Demand Paging: translating a linear address to a physical address and locating the page. This process can be considered paging, and it is transparent to the process. Paging implies page faults, possibly major faults, which means more CPU time is spent.

Summary

1. A user process's memory space is divided into five segments: Text, Data, BSS, Heap, and Stack. Text is read-only and executable; Data holds global and static variables; Heap memory should be free()d as soon as it is no longer needed; Data in the Stack is temporary, destroyed when its function exits.
2. glibc's malloc() allocates memory dynamically via brk() or mmap(), with 128Kbytes as the threshold. Avoid memory leaks and avoid wild pointers.
3. The kernel delays page-frame allocation as long as it can (Demand Paging). Major faults are expensive.
4. The PFRA first reclaims page frames occupied by Buffer/Cache, then applies an LRU replacement algorithm among the frames occupied by programs. Tuning vm.swappiness can reduce swapping.
5. Less paging and swapping is better.
6. fork() inherits the parent's address space, shared read-only using COW (copy-on-write) technology; fork() is called once but returns twice.
