Linux memory basics and related tuning solutions

Source: Internet
Author: User

Memory is one of the most important components of a computer and serves as the bridge between the CPU and the rest of the system. All programs run in memory, so memory performance has a large impact on the whole computer. Memory temporarily holds the data the CPU is computing on, as well as data exchanged with external storage such as hard disks. While the computer is running, the CPU loads the data to be processed into memory, performs the computation, and writes the result back when it is done, so memory also determines how stably the computer runs. Memory is probably the most complex resource the operating system has to manage, and its performance directly affects the performance of the entire system.

The CPU cannot work with data on the hard disk directly; data can only be used by the CPU after it has been loaded into memory. When the CPU accesses memory, the request first goes through a hardware unit that controls and translates memory read/write requests. This unit is the MMU (Memory Management Unit). The following uses a 32-bit system to describe how memory is accessed:

In a 32-bit system, every process behaves as if it has 4 GB of memory available to it. This is its virtual (linear) address space, and the MMU translates virtual addresses into physical addresses. The translation relies on page tables, which are kept in memory and used by the MMU. Mapping memory byte by byte would require an enormous table, so memory is divided into fixed-size units called pages, usually 4 KB. Page sizes depend on the hardware platform: on x86, both 32-bit and 64-bit systems use 4 KB pages by default, and larger pages are also supported (for example 2 MB and 1 GB on x86-64). Each process has its own page tables, and each process can only see its own linear address space. When a process needs more memory, it requests it within its own linear address space. The kernel then finds free space in physical memory and adds an entry to the process's page table that maps the linear address to that physical memory, after which the process can access it. This is what memory allocation means.
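A minimal Python sketch of this translation, assuming 4 KB pages and a single-level page table modeled as a dict that maps virtual page numbers to physical frame numbers. The function and variable names are illustrative; a real MMU walks hardware-defined table structures.

```python
PAGE_SIZE = 4096          # 4 KB pages
OFFSET_BITS = 12          # log2(PAGE_SIZE): the low 12 bits are the offset inside a page


class PageFault(Exception):
    """Raised when a virtual page has no entry in the page table."""


def translate(vaddr, page_table):
    """Translate a virtual address using a per-process {virtual page: physical frame} table."""
    vpn = vaddr >> OFFSET_BITS             # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)       # byte offset within the page
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn:#x} is not mapped")
    pfn = page_table[vpn]                  # physical frame number
    return (pfn << OFFSET_BITS) | offset


# Two processes use the same virtual address, but their page tables point to different frames.
proc_a = {0x400: 0x1A2}
proc_b = {0x400: 0x3F0}
print(hex(translate(0x00400123, proc_a)))  # 0x1a2123
print(hex(translate(0x00400123, proc_b)))  # 0x3f0123
```

The same virtual address resolves to different physical frames in the two processes, which is why each process only ever sees its own address space.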

The accompanying figure (not reproduced here) shows the general process described above: each user program has its own page table, which maps its pages onto the corresponding regions of main memory.

The description above raises two problems:
1. If every memory access requires a page-table lookup, server performance will inevitably suffer.
2. What happens if main memory is full and an application still needs more memory?

The first problem is addressed by the TLB (Translation Lookaside Buffer), a cache in the MMU that speeds up the translation of virtual addresses to physical addresses. On each access the TLB is checked first: if it already holds the translation (a TLB hit), the physical address is returned directly without consulting the page table. On a miss, the page table is searched and the result is cached in the TLB. The TLB only helps with repeated translations, however: a miss still requires a page-table lookup, and a single flat table covering the whole address space would be very large. Page tables are therefore organized hierarchically. On 32-bit x86 without PAE, a linear address is split into a first-level (page directory) index, a second-level (page table) index, and an offset within the page, of 10, 10, and 12 bits respectively.
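The sketch below combines a small TLB with the two-level 10/10/12 split described above. It is an illustration, not a model of real hardware: the TLB size, its LRU replacement policy, and all names (TLB, walk, translate_with_tlb) are assumptions.

```python
from collections import OrderedDict


class TLB:
    """Tiny TLB caching virtual page number -> physical frame number, with LRU eviction."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)        # mark as most recently used
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        return None

    def insert(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used entry


def walk(vaddr, page_directory):
    """Two-level walk for a 32-bit address: 10-bit directory, 10-bit table, 12-bit offset."""
    dir_index = (vaddr >> 22) & 0x3FF
    table_index = (vaddr >> 12) & 0x3FF
    page_table = page_directory[dir_index]       # a missing key plays the role of a page fault
    return page_table[table_index]               # physical frame number


def translate_with_tlb(vaddr, tlb, page_directory):
    vpn = vaddr >> 12
    pfn = tlb.lookup(vpn)
    if pfn is None:                              # TLB miss: walk the tables and cache the result
        pfn = walk(vaddr, page_directory)
        tlb.insert(vpn, pfn)
    return (pfn << 12) | (vaddr & 0xFFF)


# Only directory entry 1 has a second-level table; the other 1023 slots need none.
page_directory = {1: {0: 0x200, 1: 0x201}}
tlb = TLB()
base = 1 << 22                                   # first address covered by directory entry 1
for addr in (base, base + 8, base + 4096, base + 16):
    translate_with_tlb(addr, tlb, page_directory)
print(tlb.hits, tlb.misses)                      # 2 2
```

Only directory entries that are actually used need a second-level table, which is what keeps hierarchical page tables small.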

There is also a cost on the allocation side. A running process frequently opens and closes files, which means memory is frequently allocated and released. Processes that cache data in memory allocate and reclaim even more, and every allocation creates a corresponding page-table entry. So even though memory itself is very fast, a large volume of frequent allocation and release still reduces overall server performance.

When memory runs out, we speak of being out of memory (OOM), and the system can no longer satisfy new allocations. In this situation, swap can be considered. Swap is disk space used as an extension of memory, so it is far slower than real RAM; avoid relying on it as much as possible, and make sure physical memory is used first while it is available. In any case, the CPU cannot operate on data in swap directly: it can only address physical memory. So when physical memory runs short, the kernel uses an LRU (least recently used) policy to move the least recently used pages to swap, freeing physical memory for new allocations. This leads to another problem: when the original process later accesses those pages through its page table, the data is no longer in physical memory. The CPU then raises an exception telling the kernel that the page is not present, and there are two possible situations:

1. Physical memory has free space: the kernel reads the page from swap back into physical memory and rebuilds the mapping. The page may not return to its previous physical address, because that location may already be in use by something else.

2. Physical memory has no free space: LRU is applied again to move the least recently used page in physical memory out to swap. The page the current process needs is then read from swap into the freed space, and the mapping is re-established.
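The two cases above can be sketched as a toy model of physical memory with a fixed number of frames and exact LRU eviction. This is a simplification: Linux approximates LRU with active and inactive page lists, and the PhysicalMemory class here is hypothetical.

```python
from collections import OrderedDict


class PhysicalMemory:
    """Toy model: a fixed number of frames, with LRU eviction of pages to a swap area."""

    def __init__(self, nframes):
        self.free_frames = list(range(nframes))
        self.resident = OrderedDict()   # page -> frame, from least to most recently used
        self.swap = set()               # pages currently stored in swap
        self.faults = 0

    def access(self, page):
        """Access a page and return the frame that now holds it."""
        if page in self.resident:                 # already in RAM: refresh its LRU position
            self.resident.move_to_end(page)
            return self.resident[page]
        self.faults += 1                          # page fault: page not in physical memory
        if self.free_frames:                      # case 1: a free frame is available
            frame = self.free_frames.pop(0)
        else:                                     # case 2: evict the least recently used page
            victim, frame = self.resident.popitem(last=False)
            self.swap.add(victim)                 # swap-out
        self.swap.discard(page)                   # swap-in (no-op on a first-time access)
        self.resident[page] = frame               # rebuild the mapping; the frame may differ
        return frame


mem = PhysicalMemory(nframes=2)
mem.access("A")
mem.access("B")
mem.access("C")              # RAM is full: A, the least recently used page, is swapped out
print(sorted(mem.swap))      # ['A']
print(mem.access("A"))       # 1: A comes back, but into B's old frame
```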

The notifications or exceptions described above are called page faults. There are two kinds: major faults and minor faults. A major fault occurs when the accessed data is not in memory at all and must be loaded from disk, whether from swap or directly from a file on a filesystem; handling it takes a long time. A minor fault typically occurs with shared memory: when a second process accesses a page, its own page table has no mapping for it, but another process has already brought the page into memory, so the kernel can simply map the existing page. Minor faults are handled quickly.
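A hypothetical classifier that mirrors the distinction between major and minor faults. Sets stand in for one process's page table and for the pages already resident in memory; nothing here corresponds to a real kernel interface.

```python
def classify_fault(page, process_table, resident_pages):
    """Return 'hit', 'minor', or 'major' for one access to `page`.

    process_table: pages already mapped in this process's page table.
    resident_pages: pages already in physical memory (for example loaded by another process).
    """
    if page in process_table:
        return "hit"                  # mapping exists: no fault at all
    process_table.add(page)           # in both fault cases the kernel installs a mapping
    if page in resident_pages:
        return "minor"                # page is already in RAM: just map it
    resident_pages.add(page)          # page must first be read from disk (a file or swap)
    return "major"


resident = set()
p1, p2 = set(), set()
print(classify_fault("libc.so:0", p1, resident))  # major: first load from disk
print(classify_fault("libc.so:0", p2, resident))  # minor: already in memory via p1
print(classify_fault("libc.so:0", p2, resident))  # hit
```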

When the operating system starts, each I/O device is assigned a range of ports through which the CPU communicates with it; these are called I/O ports. In the IBM PC architecture, the I/O address space provides a total of 65,536 8-bit I/O ports. These ports allow the CPU to read from and write to I/O devices: during an I/O operation, the CPU uses the address bus to select the port and the data bus to transfer data between CPU registers and the port. I/O ports can also be mapped into the physical address space (memory-mapped I/O), so the processor can communicate with the device using ordinary memory instructions (for example mov, and, or). Modern hardware tends to use memory-mapped I/O because it is faster and can be combined with DMA. With DMA, the CPU does not need to move each piece of data itself during an I/O transfer: it hands the transfer to the DMA controller once, which frees the CPU, and when the transfer completes the DMA controller raises an interrupt to notify the CPU. While DMA is running, it controls the bus. If the CPU needs the bus at the same time, the two compete for it, and the CPU and the DMA controller have equal standing. Once the CPU has handed the bus to DMA, it cannot take it back at will and must wait until the DMA transfer finishes.

If no other process can run, or other processes run only briefly, and the I/O has still not completed, the CPU has nothing to do but wait for the I/O. CPU time accounting therefore includes an iowait value: the time the CPU spends idle waiting for I/O. With synchronous calls, the process must wait for the I/O to complete; otherwise, the I/O transfer can proceed in the background while the CPU handles other work. When the disk finishes transferring data, it only needs to notify the CPU. For this, the CPU has a peripheral called the programmable interrupt controller. To communicate with the CPU, each hardware device is assigned an interrupt number on the interrupt controller during device detection at boot. A host may have many hardware devices, and each has its own number. When the CPU receives an interrupt number, it uses the interrupt vector table to find the device that raised the interrupt and handles it through the corresponding I/O port.
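The iowait share mentioned above can be computed from two samples of the aggregate cpu line in /proc/stat, whose first fields are user, nice, system, idle, iowait, irq, softirq, and steal time in clock ticks. A minimal sketch; the sample lines are made up for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Map the aggregate 'cpu' line of /proc/stat to {field: ticks}."""
    values = [int(v) for v in line.split()[1:]]
    # guest and guest_nice (if present) are skipped: they are already counted in user and nice
    return dict(zip(FIELDS, values))


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: b[name] - a[name] for name in a}
    total = sum(deltas.values())
    return 100.0 * deltas["iowait"] / total if total else 0.0


s1 = "cpu  1000 0 500 8000 100 0 0 0 0 0"
s2 = "cpu  1100 0 550 8300 150 0 0 0 0 0"
print(iowait_percent(s1, s2))   # 10.0
```

On a live system, read open("/proc/stat").readline() twice, a second or so apart, and pass both lines to iowait_percent.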

Suppose the CPU is running a process when an interrupt request arrives. The CPU immediately suspends the process it is running and handles the interrupt; this is called an interrupt switch. Such a switch is much lighter-weight than a process switch, and interrupts generally have higher priority than any process, since we are talking about hardware interrupts. Interrupt handling is split into a top half and a bottom half. The top half runs immediately and does the urgent work, such as moving data from the device into memory. If the remaining work is not urgent (the kernel decides this), the CPU returns to the suspended process, and the bottom half of the interrupt is deferred and executed later.

In a 32-bit system, the linear address space is normally split so that the top 1 GB is reserved for the kernel and the lower 3 GB is used by the process. Physical memory, however, is not one undivided pool from the kernel's point of view, and its layout differs between 32-bit and 64-bit systems. On 32-bit systems, the lowest part of physical memory is set aside for DMA. Older DMA controllers have a narrow address bus (ISA DMA, for example, can address only 24 bits, or 16 MB), so they can reach only a limited range of memory. If a DMA controller is to copy data directly into memory, the destination must lie within the range it can address, so that range is reserved for DMA. From this perspective, memory management is divided into zones.

On a 32-bit system, the first 16 MB of physical memory is ZONE_DMA (used by DMA), and 16 MB to 896 MB is ZONE_NORMAL, which the Linux kernel can access directly. Physical memory above 896 MB is ZONE_HIGHMEM, which the kernel cannot access directly. The last 128 MB of the kernel's 1 GB address window (896 MB to 1 GB) is reserved for temporary mappings: to access a high-memory page, the kernel must first map it into this reserved window. Therefore, on a 32-bit system the kernel needs an extra step to access data in high memory.

On x86-64, ZONE_DMA still covers the first 16 MB for legacy devices, ZONE_DMA32 covers memory up to 4 GB, so devices capable of 32-bit DMA have a much larger range available, and ZONE_NORMAL covers everything above 4 GB. Because the 64-bit kernel address space is large enough to map all physical memory directly, there is no high-memory zone: the kernel can access memory above 1 GB without any extra step, which improves efficiency and performance considerably. This is one of the main reasons to use 64-bit systems. The original article illustrated this layout with a figure that is not included here.

In the current PC architecture, both AMD and Intel processors support PAE (Physical Address Extension). PAE widens the physical address bus of a 32-bit system by 4 bits (from 32 to 36), so a 32-bit system can use up to 64 GB of physical memory. However, no matter how much physical memory a 32-bit system has, the space a single process can use does not grow: the linear address space is still only 4 GB, of which a single process can use 3 GB.
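The zone boundaries and the PAE arithmetic above can be summarized in a short sketch. It assumes the typical x86 defaults (a 3G/1G split on 32-bit kernels and ZONE_DMA enabled on x86-64); the real layout depends on kernel configuration, and cat /proc/zoneinfo shows the one actually in use.

```python
MB = 1 << 20
GB = 1 << 30


def zone_x86_32(paddr):
    """Zone of a physical address on 32-bit x86 (default 3G/1G split)."""
    if paddr < 16 * MB:
        return "ZONE_DMA"        # reachable by legacy 24-bit ISA DMA
    if paddr < 896 * MB:
        return "ZONE_NORMAL"     # permanently mapped by the kernel
    return "ZONE_HIGHMEM"        # needs a temporary mapping before the kernel can use it


def zone_x86_64(paddr):
    """Zone of a physical address on x86-64."""
    if paddr < 16 * MB:
        return "ZONE_DMA"
    if paddr < 4 * GB:
        return "ZONE_DMA32"      # reachable by devices with 32-bit DMA
    return "ZONE_NORMAL"         # directly mapped; there is no highmem on 64-bit


# PAE widens physical addresses from 32 to 36 bits; process addresses stay 32-bit.
PAE_PHYSICAL_LIMIT = 1 << 36     # 64 GB of physical memory
PROCESS_LIMIT_32 = 3 * GB        # user part of the 4 GB linear address space

print(zone_x86_32(1 * GB), zone_x86_64(1 * GB))   # ZONE_HIGHMEM ZONE_DMA32
print(PAE_PHYSICAL_LIMIT // GB)                   # 64
```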

The Linux virtual memory subsystem consists of the following functional modules:

slab allocator, zoned buddy allocator, MMU, kswapd, bdflush (later pdflush)
The slab allocator hands out small objects.
The buddy allocator, also called the buddy system, hands out physical pages.
The buddy system works on top of the MMU, and the slab allocator works on top of the buddy system.

Frequent allocation and release of memory inevitably causes fragmentation. To minimize it, the buddy system divides memory in advance into blocks of different sizes (powers of two pages). When allocating, it looks for the block that fits the request best, splitting a larger block if necessary; when memory is freed, a block is merged with its adjacent free "buddy" into a larger contiguous block. Some memory that the kernel and processes need must be physically contiguous, so if no large contiguous area is available, the request cannot be satisfied. The point of the buddy system is therefore to allocate the best-fitting block and to merge freed blocks back into large contiguous areas, avoiding external fragmentation, that is, a situation where the free pages in memory are scattered rather than contiguous. The buddy allocator hands out only whole pages or runs of contiguous pages, which are relatively large units. Sometimes, however, much smaller pieces are needed, for example an inode when a file is opened. Storing one inode in its own page would waste most of that page, so a page should hold many such objects, and those objects should be quick to allocate. This is feasible because an inode is a specific data structure with a fixed layout.
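The page-level splitting and merging described above can be sketched as a toy buddy allocator over 2**max_order pages, with one free list per block size. This illustrates the idea under simplifying assumptions (a single zone, page numbers instead of addresses, sets as free lists) and is not the kernel's implementation; on a real system, /proc/buddyinfo shows the number of free blocks of each order.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order pages; blocks are powers of two pages."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the first page number of each free block of 2**k pages
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)          # start with one big free block

    def alloc(self, order):
        """Allocate 2**order contiguous pages; return the first page number, or None."""
        for k in range(order, self.max_order + 1): # smallest available block that fits
            if self.free_lists[k]:
                start = min(self.free_lists[k])    # deterministic choice for the sketch
                self.free_lists[k].remove(start)
                while k > order:                   # split, keeping the lower half
                    k -= 1
                    self.free_lists[k].add(start + (1 << k))  # upper half is a free buddy
                return start
        return None                                # no contiguous block is large enough

    def free(self, start, order):
        """Free a block and merge it with its buddy for as long as the buddy is free."""
        while order < self.max_order:
            buddy = start ^ (1 << order)           # buddies differ in exactly one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)              # merged block starts at the lower address
            order += 1
        self.free_lists[order].add(start)


buddy = BuddyAllocator(max_order=3)   # 8 pages
a = buddy.alloc(0)    # 1 page: splits 8 -> 4 + 2 + 1 + 1
b = buddy.alloc(1)    # 2 pages: taken from the free 2-page block
print(a, b)           # 0 2
buddy.free(a, 0)
buddy.free(b, 1)      # frees pages 2-3; they merge with 0-1, then with 4-7
print(buddy.free_lists[3])   # {0}
```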
This is the purpose of the slab allocator. It obtains several pages from the buddy system and divides them into slots that match the layout of one particular kind of object, so the structure is prepared in advance. When an inode needs to be stored, its information is simply filled into a free slot. When a file is closed and its inode is released, the slab allocator keeps the freed slot so that it can be reused when another process opens a file. This avoids memory fragmentation and handles the allocation of small pieces of memory efficiently.
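A toy slab cache in the same spirit: it takes whole pages, carves each one into fixed-size slots for a single object type, and recycles freed slots. The class name, the 600-byte object size, and the 4 KB page size are assumptions chosen for illustration; the real slab caches on a system are listed in /proc/slabinfo.

```python
class SlabCache:
    """Toy slab cache: carves whole pages into fixed-size slots for one object type."""

    def __init__(self, name, obj_size, page_size=4096):
        self.name = name
        self.obj_size = obj_size
        self.per_page = page_size // obj_size    # objects that fit in one page
        self.pages = 0                           # pages obtained from the page allocator
        self.free_slots = []                     # (page, slot) pairs ready for reuse

    def _grow(self):
        """Take one more page (from the buddy system, in Linux) and split it into slots."""
        page = self.pages
        self.pages += 1
        self.free_slots.extend((page, i) for i in range(self.per_page))

    def alloc(self):
        if not self.free_slots:
            self._grow()
        return self.free_slots.pop()

    def free(self, slot):
        self.free_slots.append(slot)             # keep the slot for the next object of this type


inodes = SlabCache("inode_cache", obj_size=600)
objs = [inodes.alloc() for _ in range(6)]
print(inodes.per_page, inodes.pages)   # 6 1
inodes.free(objs[0])
print(inodes.alloc() == objs[0])       # True: the freed slot is reused
```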

When the buddy system allocates memory and physical memory is insufficient, swap may be used. kswapd performs swap-out (moving data from memory to swap) and swap-in (loading data from swap back into memory). In addition, when a process modifies file data, the data must eventually be written from memory back to disk: processes work on data in memory, but permanent storage requires writing it to backing storage. pdflush performs this write-back. Data written to memory is not written to disk immediately, since that would perform far too poorly; write-back is asynchronous rather than synchronous, and the kernel periodically synchronizes modified data to disk. pdflush is a kernel thread (newer kernels replaced it with per-device flusher threads, typically one per disk). It looks for pages in memory that have been modified but not yet synchronized to disk, called dirty pages, and writes them out. Besides this periodic write-back, it also starts synchronizing on its own once dirty pages reach a certain percentage of physical memory.

Dirty file pages cannot simply be swapped out: they must be written back to their files on disk before their memory can be freed. Swap is used for data that has no file behind it (anonymous memory, such as a process's heap and stack), while clean file pages can simply be dropped and read again later. In other words, modified file data that needs to be released can only be written back to the hard disk.

With this basic knowledge of memory, we can look at some common memory tuning options:
1. Hugepage-related parameters
cat /proc/zoneinfo shows how memory is divided into zones on the current system.

HugePage: huge (large) pages
64-bit CentOS supports not only huge pages but also transparent huge pages (THP).
Transparent huge pages are used for anonymous memory. The kernel manages them automatically behind the scenes, backing anonymous memory with huge pages without any user involvement. What is anonymous memory? Roughly, a process's RSS minus its shared memory. On 64-bit CentOS (x86-64), huge pages come in two sizes, 2 MB and 1 GB; transparent huge pages use 2 MB pages, while 1 GB pages must be reserved explicitly. 1 GB pages are usually worthwhile on systems with terabytes of memory; with tens to hundreds of GB, 2 MB is usually a good choice. THP is generally only worth enabling on machines with more than 4 GB of physical memory. It is called transparent because it works quietly in the background and users do not need to participate.

The nr_anon_transparent_hugepages field in /proc/zoneinfo shows transparent huge page usage; each such page is normally 2 MB.
The following fields in /proc/meminfo describe huge pages:
AnonHugePages: total size of the transparent huge pages in use
Hugepagesize: size of one huge page
HugePages_Total: total number of huge pages in the pool
HugePages_Free: number of huge pages not yet allocated
HugePages_Rsvd: number of huge pages reserved (committed to an allocation but not yet used)
HugePages_Total counts only huge pages configured by the user; transparent huge pages are not included.
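A small sketch that parses /proc/meminfo-style text and converts the huge page fields above into megabytes. The sample values are invented for illustration; on a live system, pass open("/proc/meminfo").read() instead.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; sizes stay in kB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        info[key.strip()] = int(rest.split()[0])   # drop the optional 'kB' unit
    return info


def hugepage_summary(info):
    size_kb = info["Hugepagesize"]
    return {
        "thp_mb": info.get("AnonHugePages", 0) // 1024,            # transparent huge pages
        "static_total_mb": info["HugePages_Total"] * size_kb // 1024,
        "static_free_mb": info["HugePages_Free"] * size_kb // 1024,
    }


sample = """AnonHugePages:    409600 kB
HugePages_Total:     512
HugePages_Free:      500
HugePages_Rsvd:        0
Hugepagesize:       2048 kB"""
print(hugepage_summary(parse_meminfo(sample)))
# {'thp_mb': 400, 'static_total_mb': 1024, 'static_free_mb': 1000}
```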

You can set the number of huge pages manually with vm.nr_hugepages = n. Such huge pages are typically used as shared memory: they can be mounted as a hugetlbfs filesystem on a directory, as follows:
mount -t hugetlbfs none /hugepages
After that, /hugepages can be used like a memory-backed disk. Usually we do not need to set this up ourselves; for example, MySQL has a variable that, once set, makes the server use huge pages automatically.
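Sizing vm.nr_hugepages is simple arithmetic: divide the memory you want backed by huge pages by the huge page size and round up. A hypothetical helper, assuming the common 2 MB huge page size (check Hugepagesize in /proc/meminfo), that also emits the mount command from the text:

```python
def nr_hugepages_for(pool_bytes, hugepage_kb=2048):
    """Huge pages needed to back pool_bytes, rounded up to whole pages."""
    page_bytes = hugepage_kb * 1024
    return (pool_bytes + page_bytes - 1) // page_bytes


def hugepage_config(pool_bytes, hugepage_kb=2048, mount_point="/hugepages"):
    """Return a sysctl line and the hugetlbfs mount command."""
    n = nr_hugepages_for(pool_bytes, hugepage_kb)
    return [f"vm.nr_hugepages = {n}", f"mount -t hugetlbfs none {mount_point}"]


for line in hugepage_config(8 * 1024**3):   # back an 8 GB pool
    print(line)
# vm.nr_hugepages = 4096
# mount -t hugetlbfs none /hugepages
```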

2. Buffer- and cache-related parameters
/proc/sys/vm/drop_caches forces the kernel to release buffers and cache. It accepts three values:
1: release the page cache.
2: release the dentry and inode caches.
3: release the page cache, dentries, and inodes.

Buffers and cache fall into two categories:
1. pagecache: caches page data, usually file data such as the contents of opened files.
2. buffers: cache file metadata (inodes and dentries) and are sometimes used to cache write requests.

echo 1 > /proc/sys/vm/drop_caches releases the first category above.
echo 2 > /proc/sys/vm/drop_caches releases the second category above.
echo 3 > /proc/sys/vm/drop_caches releases both.
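Dirty pages are not released by drop_caches, so it is common to run sync first (sync; echo 3 > /proc/sys/vm/drop_caches). The same sequence as a Python sketch. Writing the real file requires root, so the path is a parameter; that parameter is an assumption added here so the function can be exercised against an ordinary file.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"
RELEASED = {
    1: "page cache",
    2: "dentries and inodes",
    3: "page cache, dentries and inodes",
}


def drop_caches(level, path=DROP_CACHES):
    """Flush dirty data, then ask the kernel to drop clean caches (root needed for the real file)."""
    if level not in RELEASED:
        raise ValueError("level must be 1, 2 or 3")
    os.sync()                          # write dirty pages back first so they can be dropped too
    with open(path, "w") as f:         # equivalent to: echo LEVEL > /proc/sys/vm/drop_caches
        f.write(f"{level}\n")
    return RELEASED[level]
```

Run as root, drop_caches(3) is equivalent to the echo 3 command above.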

3. Swap-related parameters
/proc/sys/vm/swappiness sets how strongly the kernel tends to use swap.
The larger the value, the more readily the kernel uses swap; the smaller the value, the less inclined it is to use swap (which does not mean swap will never be used). The default is 60 and the range is 0 to 100 (kernels since 5.8 accept values up to 200). On servers, a small value, even 0, is recommended. As a rough rule of thumb, swapping starts when the percentage of physical memory mapped into process page tables plus vm.swappiness reaches 100.
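The rule of thumb above, written out as two small functions. They only encode the simplified rule stated in this article, not the reclaim logic of any particular kernel version, and both function names are hypothetical.

```python
def swap_threshold(swappiness=60):
    """Mapped-memory percentage at which the simplified rule starts swapping."""
    return max(0, 100 - swappiness)


def should_swap(mapped_percent, swappiness=60):
    """Simplified rule from the text: swap once mapped% + swappiness >= 100."""
    return mapped_percent + swappiness >= 100


print(swap_threshold(60), swap_threshold(10))   # 40 90
print(should_swap(50, swappiness=60))           # True
print(should_swap(50, swappiness=10))           # False
```

With the default swappiness of 60, swapping would start at 40% mapped memory; lowering swappiness to 10 pushes that point to 90%.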

Recommended swap sizes:
1. On servers that run batch (scientific) computing, swap can be relatively large.
2. On database servers, swap can be 1 GB or less; database servers should avoid using swap as much as possible.
3. On application servers, swap can be set to RAM * 0.5. This is, of course, a theoretical value.

If you must use swap, place it on the outermost tracks of the disk, because on a spinning disk the outermost tracks have the fastest access speed. With multiple hard disks, you can use a small partition on the outermost tracks of each disk as swap. Swap areas can be given priorities, and giving the swap partitions on all of these disks the same priority spreads the load across them. Edit /etc/fstab:
/dev/sda1  swap  swap  pri=5  0 0
/dev/sdb1  swap  swap  pri=5  0 0
/dev/sdc1  swap  swap  pri=5  0 0
/dev/sdd1  swap  swap  pri=5  0 0

4. Parameters for memory exhaustion (OOM)
When Linux runs out of memory, it kills the process with the highest OOM score, which is usually the process using the most memory. It does so in the following three situations:
1. All processes are active, so there are no idle processes whose memory could be swapped out.
2. There are no free pages left in ZONE_NORMAL.
3. A new process starts and needs memory, but no free memory can be found to map for it.
Once memory is exhausted, the kernel invokes the OOM killer (oom-kill).
Each /proc/PID/ directory contains a file named oom_score that holds the process's OOM score, its "badness" index.

To trigger the OOM killer manually, run echo f > /proc/sysrq-trigger; it kills the process with the highest badness score.
echo n > /proc/PID/oom_adj adjusts a process's badness score. In older kernels, the score is multiplied by 2 raised to the power of oom_adj: if a process's oom_adj is 5, its badness score is multiplied by 2^5 = 32. A negative value divides the score instead, and -17 exempts the process from the OOM killer entirely. Newer kernels replace oom_adj with oom_score_adj, which ranges from -1000 to 1000 (oom_adj is kept only for compatibility).
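The oom_adj scaling described above, as a small sketch of how older (pre-2.6.36) kernels applied it. The base points value is an input here; computing it from a process's memory usage is outside this sketch, and the function name is hypothetical.

```python
OOM_DISABLE = -17   # special oom_adj value: never select this process


def adjusted_badness(points, oom_adj):
    """Scale a badness score by oom_adj: multiply by 2**oom_adj, or divide for negative values."""
    if oom_adj == OOM_DISABLE:
        return 0                       # exempt from the OOM killer
    if oom_adj > 0:
        return points << oom_adj       # points * 2**oom_adj
    return points >> -oom_adj          # points // 2**(-oom_adj)


print(adjusted_badness(100, 5))    # 3200
print(adjusted_badness(100, -2))   # 25
```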

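Setting vm.panic_on_oom = 1 makes the kernel panic instead of running the OOM killer when memory is exhausted. This effectively disables oom-kill, at the cost of crashing the system.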

5. Capacity-related memory parameters
vm.overcommit_memory controls whether memory may be overcommitted. It accepts three values:
0: the default; the kernel uses a heuristic to decide how much overcommitment to allow.
1: the kernel always overcommits without checking. This increases the risk of memory overload.
2: strict accounting; the total memory that can be committed is limited to swap + RAM * overcommit_ratio / 100. This is the safest setting if you want to limit overcommitment.
vm.overcommit_ratio: the percentage of physical RAM counted when overcommit_memory is set to 2. The default is 50.
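The strict-mode limit from overcommit_memory = 2 as arithmetic. A sketch that ignores refinements such as memory reserved for huge pages; on a real system, compare its result with the CommitLimit and Committed_AS fields in /proc/meminfo. Both function names are hypothetical.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Commit limit under vm.overcommit_memory = 2: swap + RAM * overcommit_ratio / 100."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def can_commit(committed_kb, request_kb, limit_kb):
    """In strict mode, a request fails if it would push committed memory past the limit."""
    return committed_kb + request_kb <= limit_kb


GIB_IN_KB = 1024 * 1024                   # 1 GB expressed in kB
limit = commit_limit_kb(16 * GIB_IN_KB, 2 * GIB_IN_KB)
print(limit // GIB_IN_KB)                 # 10  (2 GB swap + 50% of 16 GB RAM)
print(can_commit(9 * GIB_IN_KB, 2 * GIB_IN_KB, limit))   # False
```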

6. Inter-process communication (IPC) parameters
Common methods of inter-process communication on the same host:
1. message queues; 2. semaphores; 3. shared memory. A common method for communication across hosts is RPC.

Tuning for message-queue IPC:
msgmax: the maximum size, in bytes, of a single message in a message queue. It must not exceed the queue size (msgmnb). The default is 65536.
msgmnb: the maximum size, in bytes, of a single message queue. The default is 65536.
msgmni: the maximum number of message queue identifiers (and thus the maximum number of queues). The default is 1985 on 64-bit machines and 1736 on 32-bit machines. These defaults vary with kernel version, distribution, and installed memory.

Tuning for shared-memory IPC:
shmall: the total amount of shared memory, in pages, that can be in use system-wide at one time.
shmmax: the maximum size, in bytes, of a single shared memory segment.
shmmni: the maximum number of shared memory segments system-wide. The default is 4096 on both 64-bit and 32-bit systems.
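Because shmall is counted in pages while shmmax is counted in bytes, a planned shared-memory budget has to be converted before it is written to sysctl. A hypothetical helper; by default it reads the page size from the system with os.sysconf("SC_PAGE_SIZE"), and the returned keys match the kernel.shmall and kernel.shmmax sysctl names.

```python
import os


def shm_settings(total_shm_bytes, largest_segment_bytes, page_size=None):
    """Convert a shared-memory budget into kernel.shmall (pages) and kernel.shmmax (bytes)."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")    # usually 4096 on x86
    shmall = -(-total_shm_bytes // page_size)      # ceiling division: shmall counts pages
    return {"kernel.shmall": shmall, "kernel.shmmax": largest_segment_bytes}


print(shm_settings(8 * 1024**3, 4 * 1024**3, page_size=4096))
# {'kernel.shmall': 2097152, 'kernel.shmmax': 4294967296}
```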

7. Capacity-related file system tuning parameters:
file-max: the maximum number of file handles the kernel will allocate.
dirty_ratio: a percentage. When dirty data reaches this percentage of available memory (free plus reclaimable pages), processes that write data are forced to write dirty data back to disk themselves. The default is 20.
dirty_background_ratio: a percentage. When dirty pages reach this percentage of available memory, the flusher threads (pdflush) start writing them back in the background. The default is 10.
dirty_expire_centisecs: how long, in hundredths of a second, data can stay dirty before it becomes eligible for write-back. The default is 3000, so dirty data older than 30 seconds is written back.
dirty_writeback_centisecs: how often, in hundredths of a second, the flusher threads wake up to write back dirty data. The default is 500, i.e. every 5 seconds.
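How the thresholds above interact can be sketched as a decision function. This is a deliberately simplified model, not the kernel's write-back algorithm; the function name and the order of the checks are assumptions.

```python
def writeback_action(dirty_kb, available_kb, dirty_age_cs,
                     dirty_ratio=20, dirty_background_ratio=10,
                     dirty_expire_centisecs=3000):
    """Simplified view of what happens to dirty data under the vm.dirty_* settings."""
    if dirty_kb >= available_kb * dirty_ratio // 100:
        return "throttle writers"        # writing processes must flush dirty data themselves
    if dirty_kb >= available_kb * dirty_background_ratio // 100:
        return "background writeback"    # flusher threads start writing in the background
    if dirty_age_cs >= dirty_expire_centisecs:
        return "expire writeback"        # old enough to go out on the next periodic wakeup
    return "keep in memory"


avail = 1_000_000   # kB of free plus reclaimable memory (made-up figure)
print(writeback_action(250_000, avail, dirty_age_cs=100))   # throttle writers
print(writeback_action(150_000, avail, dirty_age_cs=100))   # background writeback
print(writeback_action(50_000, avail, dirty_age_cs=3500))   # expire writeback
print(500 / 100)    # default dirty_writeback_centisecs: flusher wakes every 5.0 s
```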

8. Common commands for observing Linux memory:
Memory activity
vmstat [interval] [count]
sar -r [interval] [count]
Rate of change in memory
sar -R [interval] [count]
frmpg/s: memory pages freed by the system per second; a negative value means pages were allocated.
bufpg/s: additional memory pages used as buffers per second; a negative value means fewer pages are used as buffers.
campg/s: additional memory pages cached per second; a negative value means fewer pages are cached.
Swap and paging activity
sar -W [interval] [count]: pswpin/s and pswpout/s, the pages swapped in and out per second
sar -B [interval] [count]: paging statistics, including the following fields
pgpgin/s: kilobytes paged in from disk per second
pgpgout/s: kilobytes paged out to disk per second
fault/s: page faults (major and minor) per second
majflt/s: major faults per second
pgfree/s: pages placed on the free list per second
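On 2.6 and later kernels, sar obtains these paging statistics from counters in /proc/vmstat. A sketch that computes similar per-second rates from two samples taken interval_s seconds apart; the counter names (pgpgin, pgpgout, pgfault, pgmajfault, pgfree, pswpin, pswpout) are standard /proc/vmstat fields, while the sample values below are invented.

```python
COUNTERS = ("pgpgin", "pgpgout", "pgfault", "pgmajfault", "pgfree", "pswpin", "pswpout")


def parse_vmstat(text):
    """Parse /proc/vmstat ('name value' per line) into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def paging_rates(before, after, interval_s):
    """Per-second deltas of selected counters between two /proc/vmstat samples."""
    a, b = parse_vmstat(before), parse_vmstat(after)
    return {name + "/s": (b[name] - a[name]) / interval_s for name in COUNTERS}


t0 = "pgpgin 1000\npgpgout 2000\npgfault 50000\npgmajfault 10\npgfree 9000\npswpin 0\npswpout 0"
t1 = "pgpgin 1100\npgpgout 2200\npgfault 55000\npgmajfault 12\npgfree 10000\npswpin 0\npswpout 4"
rates = paging_rates(t0, t1, interval_s=10)
print(rates["pgmajfault/s"], rates["pswpout/s"])   # 0.2 0.4
```

On a live system, read open("/proc/vmstat").read() twice, interval_s seconds apart, and pass both texts to paging_rates.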
