Case study: on one Apache server, MaxClients was set too high. When traffic suddenly surged, memory ran out, the system started swapping, load climbed, and the server went down.
As Sun Tzu might have put it: swap is a matter of life and death for performance, the road to survival or ruin, and it must not be ignored.
Which tools can monitor swap?
The free command indicates the current usage of swap:
shell> free -m
             total       used       free
Swap:        34175      11374      22801
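free takes these figures from /proc/meminfo, where the kernel reports SwapTotal and SwapFree in kB. The sketch below recomputes the Swap row from those two fields, in roughly the way free -m does. The function name swap_usage is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print swap total/used/free in MB from a meminfo file.
# used = SwapTotal - SwapFree; dividing kB by 1024 gives MB.
# Usage: swap_usage [meminfo_file]   (defaults to /proc/meminfo)
swap_usage() {
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free  = $2 }
        END {
            used = total - free
            # %d truncates, so values are whole MB
            printf "total=%d used=%d free=%d\n", total / 1024, used / 1024, free / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```

On a live system, call swap_usage with no argument. The optional argument lets you point it at a saved copy of the file.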
Another commonly used tool is sar, which lists the system's swap usage at different points in time:
shell> sar -r
kbswpfree kbswpused  %swpused  kbswpcad
 23345644  11650572     33.29   4656908
 23346452  11649764     33.29   4656216
 23346556  11649660     33.29   4650308
 23346932  11649284     33.29   4649888
 23346992  11649224     33.29   4648848
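The %swpused column is simply used swap as a share of total swap, where total swap is kbswpfree + kbswpused. Here is a check against the first row above:

$$
\%\text{swpused} = \frac{\text{kbswpused}}{\text{kbswpfree} + \text{kbswpused}} \times 100 = \frac{11650572}{23345644 + 11650572} \times 100 = \frac{11650572}{34996216} \times 100 \approx 33.29
$$

The denominator, 34996216 kB (about 34176 MB), is the total swap size. It is in line with the total shown by free -m.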
However, free and sar only give a snapshot or periodically sampled history, not a live view. If you need one, use vmstat:
shell> vmstat 1
-----------memory------------- ---swap--
    swpd   free   buff   cache   si   so
11647532 123664 305064 7193168    0    0
11647532 123672 305064 7193172    0    0
11647532 125728 305064 7193468    0    0
11647532 125376 305064 7193476    0    0
11647532 124508 305068 7193624    0    0
This refreshes the output once per second. The relevant data is in the swap columns, where si and so mean:
- si: amount of memory swapped in from disk (per second).
- so: amount of memory swapped out to disk (per second).
Ideally both stay at zero, and an occasional nonzero value is nothing to worry about. What you do not want to see is them staying nonzero all the time.
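Reading si and so by eye works for a few seconds of output. For a longer sample, a small filter can count how many intervals actually swapped. This is a minimal sketch, and swap_activity is a hypothetical name. It finds the si and so columns from the header line instead of assuming fixed positions, because the output above is trimmed to the memory and swap columns.

```shell
#!/bin/sh
# Hypothetical helper: read vmstat output on stdin and count the samples
# with nonzero swap-in (si) or swap-out (so) activity.
swap_activity() {
    awk '
        # Remember which columns are labelled si and so in any header line
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si_col = i
                if ($i == "so") so_col = i
            }
        }
        # Data lines start with a number; header lines are skipped
        si_col && so_col && $1 ~ /^[0-9]+$/ {
            samples++
            if ($si_col > 0 || $so_col > 0) busy++
        }
        END { printf "samples=%d swapping=%d\n", samples, busy }
    '
}
```

For example, vmstat 1 60 | swap_activity summarizes one minute. vmstat's first data line reports averages since boot rather than the last interval, so you may want to discount it.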
The methods above show overall swap usage. But what if you want to know which processes are using swap? That is a bit trickier, so let's take a look:
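The good news is that top can provide this information, although it is not displayed by default. To enable it in the classic procps version of top (newer procps-ng versions instead use the arrow keys to select SWAP and d or Space to toggle it):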
- Open top;
- Press f to open the field selection screen;
- Press p to toggle the SWAP field;
- Press Enter to confirm.
The bad news is that the SWAP value top reports is only a theoretical estimate. To put it bluntly, it cannot be trusted: in the classic procps top, SWAP is simply computed as VIRT - RES.
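To see how far off that estimate can be, compare it with VmSwap in /proc/<pid>/status. VmSwap is the kernel's own count of a process's swapped-out anonymous memory and is available since Linux 2.6.34. Below is a minimal sketch, where swap_estimate_vs_actual is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare the classic top estimate (VIRT - RES) with
# the kernel's VmSwap figure from /proc/<pid>/status. All values are in kB.
# Usage: swap_estimate_vs_actual /proc/<pid>/status
swap_estimate_vs_actual() {
    awk '
        /^VmSize:/ { virt = $2 }   # shown as VIRT in top
        /^VmRSS:/  { res  = $2 }   # shown as RES in top
        /^VmSwap:/ { swap = $2 }   # anonymous memory actually in swap
        END { printf "estimate=%d actual=%d\n", virt - res, swap }
    ' "$1"
}
```

The estimate is typically much larger than the actual value. VIRT also counts address space that was never touched and file mappings that are never written to swap.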
BTW: top's nFLT field is more useful than its SWAP field. It shows the number of major page faults the process has incurred.
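The same counters are also exposed in /proc/<pid>/stat, where fields 10 and 12 are minflt and majflt (see proc(5)). ps can print them as well, with -o min_flt,maj_flt. Below is a minimal sketch that reads them directly; page_faults is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print minor and major page-fault counts from a
# /proc/<pid>/stat file. Field 2 (the command name) is in parentheses and
# may contain spaces, so everything up to the last ") " is stripped first.
# After stripping, field N of the original line becomes $(N-2) in awk.
page_faults() {
    sed 's/^.*) //' "$1" | awk '{ printf "minflt=%s majflt=%s\n", $8, $10 }'
}
```

A steadily rising majflt means the process keeps reading pages back from disk. Swapped-out pages coming back in are one source of these faults.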
So can we get a process's actual swap usage? Don't worry, here is the code. It sums the Swap entries in each process's /proc/<pid>/smaps:
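#!/bin/bash
# List processes with pages in swap (kB, summed from /proc/<pid>/smaps)
cd /proc || exit 1
for pid in [0-9]*; do
    # Processes may exit while we loop; skip ones we cannot read
    [[ -r /proc/$pid/smaps ]] || continue
    # cmdline is NUL-separated; turn the separators into spaces
    command=$(tr '\0' ' ' 2>/dev/null < /proc/$pid/cmdline)
    # Match only "Swap:" lines; newer kernels also print "SwapPss:"
    swap=$(awk '/^Swap:/ { total += $2 } END { print total + 0 }' /proc/$pid/smaps 2>/dev/null)
    if (( ${swap:-0} > 0 )); then
        if [[ "${head}" != "yes" ]]; then
            echo -e "PID\tSWAP\tCOMMAND"
            head="yes"
        fi
        echo -e "${pid}\t${swap}\t${command}"
    fi
done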
Note: run this script as root. Otherwise, other users' smaps files cannot be read.
Which factors may affect swap?
Running out of memory obviously causes swapping, but swap can sometimes occur even when there seems to be plenty of memory. This phenomenon is known as swap insanity, and the main culprits are the following:
Improper swappiness
When free memory runs short, the system has two options: free memory by swapping pages out, or free it by dropping pages from the page cache. A common example is that swapping often happens while copying large files. During the copy, the system caches the file contents page by page, and once free memory runs short, it tends to free memory by swapping.
The swappiness parameter in the kernel can be used to control this behavior. By default, the value of swappiness is 60:
shell> sysctl -a | grep swappiness
vm.swappiness = 60
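This value is often read as "swap with 60% probability whenever memory is needed", but that is not how it works. swappiness is a relative weight that tells the kernel how readily to reclaim memory by swapping out anonymous pages instead of dropping page cache. The higher the value, the more willing it is to swap. So the natural way to make swapping less likely is to lower it: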
shell> echo "vm.swappiness = 0" >> /etc/sysctl.conf
shell> sysctl -p
This does make swapping less likely, but it does not mean swap will never be used.
NUMA curse
NUMA has been discussed extensively in the MySQL community, so I won't go over it again here. This section is only about the troubled relationship between NUMA and swap.
The core NUMA tool is the numactl command:
shell> numactl --hardware
available: 2 nodes (0-1)
node 0 size: 16131 MB
node 0 free: 100 MB
node 1 size: 16160 MB
node 1 free: 10 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
We can see that the system has two nodes (here, two physical CPUs), each with about 16 GB of memory. Node 0 has 100 MB free and node 1 has 10 MB free. Suppose a process that needs 11 MB of memory is started and the system assigns it to node 1. The system as a whole has more free memory than the process needs, yet node 1 may still trigger swap because its own free memory is insufficient.
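The arithmetic in this example can be checked mechanically. The sketch below reads numactl --hardware output and reports the total free memory, plus any nodes whose own free memory is below a given request. node_fit is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `numactl --hardware` output on stdin and, for a
# request of N MB, print total free memory and the nodes that could not
# satisfy the request from their own free memory.
# Usage: numactl --hardware | node_fit 11
node_fit() {
    awk -v need="$1" '
        # Matches lines such as "node 1 free: 10 MB"
        $1 == "node" && $3 == "free:" {
            total += $4
            if ($4 < need + 0) short = short (short == "" ? "" : ",") $2
        }
        END {
            if (short == "") short = "none"
            printf "total_free=%d need=%d short_nodes=%s\n", total, need, short
        }
    '
}
```

Given the output above and a request of 11 MB, it prints total_free=110 need=11 short_nodes=1. There is plenty of memory overall, but not on node 1.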
Note that the free memory numactl reports for each node does not include page cache. If you need to see how much would be free without the cache, release it first with the drop_caches parameter:
shell> sysctl vm.drop_caches=1
Note: this operation may cause system load fluctuations.
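If dropping caches on a busy server is too disruptive, you can inspect per-node page cache instead. The kernel exposes per-node counters, including MemFree and FilePages (roughly, the page cache), in /sys/devices/system/node/node*/meminfo. numastat -m from the numactl package also prints a similar per-node view. Below is a minimal sketch; node_mem is a hypothetical name, and the sketch assumes node numbers are contiguous from 0.

```shell
#!/bin/sh
# Hypothetical helper: per-node MemFree and FilePages in MB, read from the
# sysfs node meminfo files. Lines look like "Node 0 MemFree:  102400 kB".
# Usage: node_mem [sysfs_node_dir]   (defaults to /sys/devices/system/node)
node_mem() {
    cat "${1:-/sys/devices/system/node}"/node[0-9]*/meminfo | awk '
        $3 == "MemFree:"   { memfree[$2]   = $4 }
        $3 == "FilePages:" { filepages[$2] = $4 }
        END {
            # Walk nodes 0, 1, 2, ... until one is missing
            for (n = 0; n in memfree; n++)
                printf "node %d free=%d cache=%d\n", n, memfree[n] / 1024, filepages[n] / 1024
        }
    '
}
```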
How can you find out which nodes a process's memory is allocated on? Ready-made scripts for this are available online.
To keep NUMA from triggering swap, the simplest approach is to start the process with its memory interleaved across all nodes. This effectively turns off node-local allocation for that process:
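One way to get this information is from /proc/<pid>/numa_maps. Each mapping line lists how many pages sit on each node as N<node>=<pages>, along with kernelpagesize_kB. Below is a minimal sketch that totals these per node; proc_node_mem is a hypothetical name. Recent numactl packages also include numastat, whose -p option reports a similar per-node breakdown.

```shell
#!/bin/sh
# Hypothetical helper: memory (kB) a process has on each NUMA node, summed
# from the N<node>=<pages> fields of /proc/<pid>/numa_maps and weighted by
# each mapping's kernelpagesize_kB (huge-page mappings use larger pages).
# Usage: proc_node_mem /proc/<pid>/numa_maps
proc_node_mem() {
    awk '
        {
            psize = 4   # assume 4 kB pages if kernelpagesize_kB is absent
            for (i = 1; i <= NF; i++)
                if ($i ~ /^kernelpagesize_kB=/) psize = substr($i, 19)
            for (i = 1; i <= NF; i++)
                if ($i ~ /^N[0-9]+=/) {
                    split($i, kv, "=")
                    kb[substr(kv[1], 2)] += kv[2] * psize
                }
        }
        END { for (n in kb) printf "node %s %d kB\n", n, kb[n] }
    ' "$1" | sort -k2,2n
}
```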
shell> numactl --interleave=all ...
The kernel parameter zone_reclaim_mode also matters a great deal. When a node runs short of free memory, a value of 0 makes the system tend to allocate from a remote node, while a value of 1 makes it tend to reclaim page cache on the local node. Cache is usually very important for performance, so 0 is generally the better choice.
shell> echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf
shell> sysctl -p
There are also some online discussions of MySQL and swap that are helpful for understanding swap. Recommended reading:
- How to avoid using swap in MySQL (1)
- How to avoid using swap in MySQL (2)
- How to avoid using swap in MySQL (3)
Note: starting memcached with the -k option, which locks down all of its paged memory, keeps it out of swap. Use it with caution.
...
A few years ago, YouTube was plagued by swap problems, and their solution at the time was extreme: they removed swap altogether! That was a brave move. Unfortunately, it is too dangerous for us. Without swap as a buffer, the system goes straight to the OOM killer once memory is exhausted, and the results can make the problem even more complicated. So the honest approach is to deal with swap properly rather than remove it.
This entry was posted in Technical and tagged Linux, swap by Lao Wang.