Basic linux memory knowledge and related tuning Solutions

Memory is one of the most important components in a computer and serves as the bridge between the CPU and everything else. All programs run in memory, so memory performance has a great impact on the whole machine. Memory temporarily holds the data the CPU is computing on, as well as the data exchanged with external storage such as the hard disk. While the computer is running, the CPU transfers the data to be computed into memory, performs the computation, and sends the result back out, so how memory behaves also determines how stably the computer runs. Memory may be the most troublesome device in the entire operating system, and its performance directly affects everything else.

We know that the CPU cannot deal with the hard disk directly; data can be used by the CPU only after it has been loaded into memory. When the CPU accesses memory, it must first send a request to a memory-monitoring component that controls and allocates memory read/write requests. This component is called the MMU (Memory Management Unit). The following uses a 32-bit system to describe the memory access process:

In a 32-bit system, each process sees 4 GB of memory as available to it. This is called virtual memory (the virtual address space), and virtual addresses are translated into physical addresses by the MMU. To perform this translation, a page table is kept in memory and loaded into the MMU. If linear addresses were mapped to physical addresses byte by byte, the table would be enormous, so memory is instead divided into fixed-size storage units called pages, usually 4 KB. Page sizes vary across hardware platforms: x86 32-bit uses 4 KB pages, while 64-bit platforms support 4 KB pages as well as larger ones such as 2 MB, 4 MB, or 8 MB; the default is 4 KB. Each process generally has its own page table and its own mapping mechanism, with the kernel loading the appropriate page table, and each process can see only its own linear address space. To obtain new memory, a process can only request it within its own linear address space. After the request, the operating system kernel finds free space in the physical address space, adds a mapping entry to the process's page table, and the linear address is then ready for access; through that mapping the process reaches physical memory. This is what we call memory allocation.
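To see a process's linear address space in practice, the kernel exposes each process's mappings under /proc; a quick look, assuming the procps pmap tool is installed:

# Each line is one mapped region of the current shell's virtual
# address space, with its permissions and any backing file:
head /proc/self/maps
# pmap presents the same table in a friendlier format:
pmap -x $$ | head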

The figure (omitted here) illustrates the process described above: each user program has its own page table, which maps its pages to the corresponding locations in primary storage.


Two problems can be found based on the preceding text and chart descriptions:
1. If every process must walk the page table on every memory access, server performance will inevitably suffer.
2. What happens if primary storage is full and an application still needs more memory?

For the first problem we use the TLB (Translation Lookaside Buffer). The TLB is a cache in the memory management unit that speeds up virtual-to-physical address translation: on each lookup the MMU first searches the TLB, and if the mapping is there it is returned directly without walking the page table; otherwise the page table is walked and the result is cached in the TLB. Even with the TLB as a cache, finding a mapping in one flat page table is still slow, so page tables are organized as a hierarchy of directories: a linear address is split into a level-1 directory index, a level-2 directory index, and an offset.
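Because larger pages consume fewer TLB entries per gigabyte mapped, the page size itself is a tuning lever. A minimal check, assuming a kernel built with transparent hugepage support:

# The base page size (typically 4096 bytes on x86):
getconf PAGE_SIZE
# Whether transparent hugepages (2 MB pages on x86-64) are enabled:
cat /sys/kernel/mm/transparent_hugepage/enabled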

However, a process frequently opens and closes files while it runs, which means frequent memory allocation and release; processes that cache data in memory allocate and reclaim even more. Each allocation creates a corresponding page table entry, so even though memory itself is very fast, heavy and frequent allocation and release still reduce the overall performance of the server. When memory runs out entirely, the condition is called OOM (out of memory), and if nothing intervenes the whole operating system can grind to a halt. This is where swap partitions come in. Swap is, after all, memory emulated on the hard disk, so its performance is much worse than real memory; avoid using swap as far as possible, and make sure all physical memory is in use before swap is touched. The CPU cannot deal with swap directly. It can only deal with physical memory, and its addressable space is physical memory, so when real physical memory runs short the kernel uses an LRU algorithm to move the least recently used pages into swap, freeing physical pages for new programs. This leads to another problem: when the original process next looks up its page table, the data at that mapping no longer belongs to it. At that moment the CPU raises a notification that the page is not where the process expects it, and there are two possible situations:

1. Physical memory has free space: the page is brought back from the swap partition into physical memory according to the usual translation policy. The new physical address may differ from the old one, because the old address may meanwhile have been given to someone else.

2. Physical memory has no free space: the kernel again uses LRU to push the least recently used pages of the current physical address space out to swap, brings the pages the current process needs in from swap, and re-establishes the mapping.

The preceding notifications or exceptions are usually called page faults, and they come in two kinds: major faults and minor faults. A major fault means the data being accessed is not in memory and must be loaded from the hard disk, whether from swap or directly from a file system on disk; this kind of fault takes a long time to service. A minor fault typically occurs with shared memory: when a second process accesses a page and finds it missing from its own mapping table, but another process already has that page in memory, the kernel can simply map the existing page in; this usually takes very little time.
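Both kinds of fault are easy to observe per process, since the kernel counts them; a quick look, assuming the procps version of ps:

# min_flt = minor faults, maj_flt = major faults for one process:
ps -o pid,min_flt,maj_flt,comm -p $$
# System-wide per-second rates appear later under `sar -B`
# (fault/s and majflt/s).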

When the operating system starts, every I/O device registers a series of ports through which the CPU can reach it; these are called I/O ports. In the IBM PC architecture, the I/O address space provides a total of 65,536 8-bit I/O ports. These ports are what allow the CPU to perform read/write interaction with an I/O device: during an operation the CPU uses the address bus to select the requested I/O port and the data bus to move data between CPU registers and the port. I/O ports can also be mapped into the physical address space, so that the processor can communicate with a device using ordinary memory-access instructions (mov, and, or, and so on). Modern hardware tends toward memory-mapped I/O because it is faster and works well with DMA. With DMA, the CPU hands control of the bus to the DMA controller rather than moving every byte itself; the DMA controller performs the transfer and interrupts the CPU once when it finishes, which frees the CPU for other work. While DMA is running it controls the whole bus, and if the CPU finds that another process needs the bus, the two compete for it: in bus arbitration the CPU and the DMA controller have equal standing, and once the bus has been delegated to DMA the CPU cannot simply revoke it, but must wait until DMA is done.
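The registered ranges can be inspected directly; root is needed to see real addresses on recent kernels:

# Legacy port-mapped I/O regions and the drivers that own them:
sudo head /proc/ioports
# Memory-mapped I/O regions and RAM, as the kernel sees them:
sudo head /proc/iomem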

If no other process can run, or the others run only very briefly, and the CPU finds that our I/O has still not completed, the CPU can only wait for the I/O. The CPU's time accounting includes an iowait value: the time the CPU spends waiting for I/O. Part of this is unavoidable in synchronous calls, where the CPU must wait for the I/O to complete; otherwise the CPU can hand the transfer off to complete on its own and handle other work in the meantime, and when the hard disk has finished transferring the data it only needs to send the CPU a notification. On the CPU's periphery sits a device called the programmable interrupt controller. So that it can communicate with the CPU, each hardware device registers a so-called interrupt number with the programmable interrupt controller when the BIOS probes it at startup. This number identifies the device; a host usually has many pieces of hardware, each with its own number. When the CPU receives an interrupt number, it can look the interrupting device up in the interrupt vector table and dispatch to the corresponding I/O port for handling.
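Both halves of this story are visible from user space: iowait shows up in the "wa" column of vmstat, and per-device interrupt counts in /proc/interrupts:

# Three one-second samples; the "wa" column is iowait:
vmstat 1 3
# Interrupt numbers and per-CPU counts for each registered device:
head /proc/interrupts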

When the CPU is running some process and an interrupt request arrives, it immediately suspends the process being handled and services the interrupt; this is also called an interrupt switch. Such a switch is cheaper than a process switch, and an interrupt usually has higher priority than any process, because here we mean hardware interrupts. Interrupt handling is further divided into a top half and a bottom half. The top half is the part the CPU handles immediately, such as getting the data into memory; if the rest of the work is not urgent (the CPU or the kernel judges this itself), the CPU returns to the suspended process and continues executing it, and only afterwards comes back to execute the bottom half of the interrupt.
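On Linux much of the bottom-half work runs as softirqs, and the kernel exports per-CPU counters for them; for a quick look:

# Top halves are tallied in /proc/interrupts; deferred bottom-half
# (softirq) work is tallied per CPU here:
head -5 /proc/softirqs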

In a 32-bit system, the memory (linear address) space is generally split so that the low 1 GB is for the kernel and the upper 3 GB is used by the process. It should be clear, though, that physical memory is not divided that way, and 32-bit and 64-bit systems can differ. In a 32-bit system the lowest stretch of physical memory is set aside for DMA. The DMA controller's bus is very narrow, perhaps only a few bits wide, so its addressing capability is very limited and it can reach only a small window of memory. If DMA is to copy data into memory on its own, it must be able to address that memory, so a segment within DMA's addressing range is allocated to it. From this perspective, our memory management is divided into zones.

On a 32-bit system, the first 16 MB of physical memory is ZONE_DMA (the physical address space used by DMA), and the range from 16 MB to 896 MB is ZONE_NORMAL (normal physical address space), which for Linux is the address space the kernel can access directly. The stretch from 896 MB to 1 GB is called "Reserved" (reserved physical address space). The physical address space from 1 GB to 4 GB cannot be accessed by the kernel directly: to reach it, the kernel must first map part of that memory into the Reserved window, and only when an address for the memory segment has been established there can the kernel access it. So the kernel never directly accesses physical address space above 1 GB, and on a 32-bit system reaching that memory costs an extra step.

On a 64-bit system, ZONE_DMA covers the low-end DMA address space, ZONE_DMA32 covers the space below 4 GB (greatly extending the range reachable by 32-bit-capable DMA devices), and ZONE_NORMAL is defined for everything above that, all of it directly accessible by the kernel. In 64-bit mode, therefore, the kernel needs no extra step to access memory addresses above 1 GB, and efficiency and performance increase greatly. That is a major reason to use a 64-bit system.
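The zones the running kernel actually created can be listed from /proc; on a 64-bit machine the output should include DMA, DMA32 and Normal:

# One line per zone per NUMA node, e.g. "Node 0, zone   DMA32":
grep -E '^Node' /proc/zoneinfo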


In the current PC architecture, both AMD and Intel support a mechanism called PAE (Physical Address Extension). PAE adds four bits to the address bus of a 32-bit system, so the physical address space of a 32-bit system can reach 64 GB. It is still a 32-bit system, though: no matter how large the physical memory, the space usable by a single process does not grow. In a 32-bit system the linear address space is only 4 GB, and a single process can use only 3 GB of it.
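Whether a processor advertises PAE can be checked from its flags in /proc/cpuinfo:

# Prints one flags line containing "pae" if the CPU supports it:
grep -w pae /proc/cpuinfo | head -1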

The Linux virtual memory subsystem includes the following functional modules:

Slab allocator, zoned buddy allocator, MMU, kswapd, bdflush
The slab allocator is the allocator for small kernel objects (slab caches).
The buddy allocator, also called the buddy system, is the page-level memory allocator.
The buddy system works on top of the MMU, while the slab allocator works on top of the buddy system.
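Both allocators expose their state under /proc, which makes a quick health check easy (root is needed for slab details):

# Free-page counts per order for each zone: the buddy free lists.
cat /proc/buddyinfo
# Kernel object caches managed by the slab allocator:
sudo slabtop -o | head     # or: sudo head /proc/slabinfo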
Guidelines for sizing swap: on a database server, set swap to 1 GB or less, and avoid using swap there at all if possible; on an application server, swap can be set to RAM * 0.5, though this is only a theoretical value.

If swap cannot be avoided, place it on the outermost tracks of the disk, because the outermost tracks have the fastest access speed. If there are multiple hard disks, take a small slice of the outermost tracks of each as a swap partition. Swap partitions can be assigned a priority, so giving the swap areas on these disks the same priority lets the kernel balance the load across them. Edit /etc/fstab:
/dev/sda1 swap swap pri=5 0 0
/dev/sdb1 swap swap pri=5 0 0
/dev/sdc1 swap swap pri=5 0 0
/dev/sdd1 swap swap pri=5 0 0
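After enabling the partitions with swapon -a, the priorities can be verified, and the kernel's willingness to swap can be tuned; a short sketch, assuming the util-linux swapon and standard sysctl tools:

# List active swap areas with size, usage and priority; equal
# priorities make the kernel stripe writes across the partitions:
swapon --show              # on older util-linux: swapon -s
# vm.swappiness biases reclaim between dropping page cache (low
# values) and swapping process pages out (high values); default 60:
sysctl -w vm.swappiness=10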

4. Optimization parameters for memory exhaustion
When Linux memory is exhausted, the system kills the process occupying the most memory. It does so in the following three situations:
1. All processes are active; there is nothing idle to swap out.
2. No pages are available in ZONE_NORMAL.
3. A newly started process requests memory and needs free memory to map, but none can be found.
Once memory is exhausted, the operating system invokes the oom-kill mechanism.
In the /proc/PID/ directory there is a file named oom_score, which holds the process's OOM score, its "badness" index.

To trigger the oom-kill mechanism manually, execute echo f > /proc/sysrq-trigger, which automatically kills the process with the highest badness score.
echo n > /proc/PID/oom_adj adjusts a process's badness rating. The final score is scaled by 2 raised to the power of the oom_adj value: if a process's oom_adj is 5, its badness score is multiplied by 2^5.
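For illustration, the score can be read and biased directly; note that newer kernels prefer oom_score_adj over the deprecated oom_adj:

# Read the current badness score of this shell:
cat /proc/$$/oom_score
# Bias it downward so the OOM killer is less likely to choose it
# (lowering requires root; oom_adj of -17 exempts the process):
echo -5 > /proc/$$/oom_adj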

To disable oom-kill, set vm.panic_on_oom = 1, which makes the kernel panic on OOM instead of killing processes.
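As a sysctl that looks like the following; persist it in /etc/sysctl.conf to survive reboots:

# Panic on OOM instead of invoking the killer:
sysctl -w vm.panic_on_oom=1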

5. Capacity-related memory optimization parameters:
overcommit_memory: three values are available, specifying whether memory can be overcommitted:
0: the default; the kernel applies a heuristic when deciding whether to allow an overcommit.
1: the kernel always allows overcommit without checking. This value increases the risk of memory overload.
2: the total committed memory is limited to swap size + RAM * overcommit_ratio. This is the safest value if you want to reduce overcommit.
overcommit_ratio:
When overcommit_memory is set to 2, the percentage of physical RAM counted toward the commit limit. The default is 50.
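A sketch of the strict mode; the ratio of 80 is only an example and must be sized to the workload:

# Strict accounting: commit limit = swap + RAM * overcommit_ratio/100
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
# Compare the kernel's own accounting:
grep -i commit /proc/meminfo       # CommitLimit vs Committed_AS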

6. Communication-related optimization parameters
Common means of inter-process communication on the same host:
1. messages (message queues); 2. semaphores; 3. shared memory. The common cross-host communication mechanism is RPC.

Tuning parameters for message-based process communication:
msgmax: the maximum allowed size of any single message in a message queue, in bytes. It must not exceed the queue size (msgmnb). Default: 65536.
msgmnb: the maximum size (total length) of a single message queue, in bytes. Default: 65536.
msgmni: the maximum number of message queue identifiers (and thus the maximum number of queues). Default: 1985 on 64-bit machines, 1736 on 32-bit machines.
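These parameters live under the kernel.* sysctl namespace; for example:

# Show the current message queue limits:
sysctl kernel.msgmax kernel.msgmnb kernel.msgmni
# Example: allow a single queue to grow to 128 KB:
sysctl -w kernel.msgmnb=131072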

Tuning parameters for shared-memory process communication:
shmall: the system-wide upper limit on shared memory in use at one time, measured in pages.
shmmax: the maximum size of a single shared memory segment, in bytes.
shmmni: the maximum number of shared memory segments in the system. Default: 4096 on both 64-bit and 32-bit systems.
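These are also sysctl values, and ipcs shows what is actually allocated:

# Current shared memory limits:
sysctl kernel.shmmax kernel.shmall kernel.shmmni
# Shared memory segments currently present in the system:
ipcs -m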

7. Capacity-related file system tuning parameters:
file-max: the maximum number of file handles the kernel will allocate.
dirty_ratio: a percentage; when dirty data reaches this share of total system memory, pdflush writeback is forced. Default: 20.
dirty_background_ratio: a percentage; when dirty pages reach this share of total system memory, pdflush begins writing them back in the background. Default: 10.
dirty_expire_centisecs: the age, in hundredths of a second, after which a dirty page is written back. The default is 3000, so dirty pages are flushed once they are 30 seconds old.
dirty_writeback_centisecs: the interval, in hundredths of a second, at which pdflush wakes up to write dirty pages back. The default is 500, so the flusher runs every 5 seconds.
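The four knobs and the kernel's current dirty-page totals can be read side by side:

# Current writeback tuning:
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs
# How much memory is dirty or under writeback right now:
grep -E '^(Dirty|Writeback):' /proc/meminfo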

8. Common commands for observing memory metrics in Linux:
Memory activity
vmstat [interval] [count]
sar -r [interval] [count]
Rate of change in memory
sar -R [interval] [count]
frmpg/s: memory pages freed or allocated per second; a positive number means pages freed, a negative number means pages allocated.
bufpg/s: memory pages gained or released in buffers per second; a positive number means pages gained, a negative number means pages released.
campg/s: memory pages gained or released in cache per second; a positive number means pages gained, a negative number means pages released.
Swap activity
sar -W [interval] [count]
All I/O
sar -B [interval] [count]
pgpgin/s: blocks paged in from disk to the kernel per second
pgpgout/s: blocks paged out from the kernel to disk per second
fault/s: page faults per second
majflt/s: major page faults per second
pgfree/s: pages placed on the free list per second
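Typical invocations, assuming the sysstat package provides sar:

# Five one-second samples of paging statistics:
sar -B 1 5
# Memory, swap and iowait overview at the same cadence:
vmstat 1 5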

