1. Symptoms
The service (implemented in Go) is running at full CPU and logging a large number of "too many open files" errors. It is started with systemd and deployed on an Alibaba Cloud ECS instance.
2. Analysis
From the logs, the CPU spike is caused mainly by hitting the open-file limit. However, the system-wide and per-user file limits had already been raised, so in theory this problem should not occur. Further reading showed that the open-file limit can be imposed at three levels: the operating-system level, the user level, and the process level, and the smallest of the three takes effect. The system was then examined level by level.
Start by looking at how many files are currently open system-wide and how many each process holds:
lsof -n | awk '{print $2}' | sort | uniq -c | sort -nr | more
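To count the descriptors held by a single process, lsof -n -p <pid> | wc -l can also be used (the pid here is whichever process you are investigating).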
Then check the system-level file limit. Run:
cat /etc/sysctl.conf
which gives:
fs.file-max = 1000000
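Note that /etc/sysctl.conf only shows the configured value; the value currently in effect can be read from /proc/sys/fs/file-max (or with sysctl fs.file-max).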
Next, check the user-level file limit:
cat /etc/security/limits.conf
which gives:
* soft nofile 655350
* hard nofile 655350
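For the current login shell, the effective values can be confirmed with ulimit -n (soft limit) and ulimit -Hn (hard limit).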
Finally, check the per-process limit (9928 is the process ID):
cat /proc/9928/limits | grep 'open files'
which gives:
Max open files            1024                 4096                 files
As you can see, although the system-level and user-level limits are high, the process is still running with a very low value. A process normally inherits the user-level limit by default, but this one did not. I initially suspected a problem with the systemd startup, but a hand-written test service started the same way did inherit the user-level limit.
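The same check can be made from inside the process itself. Below is a minimal Go sketch (not the service's actual code) that prints the open-file limit the running process sees, using syscall.Getrlimit:

package main

import (
    "fmt"
    "syscall"
)

func main() {
    var rl syscall.Rlimit
    // RLIMIT_NOFILE is the per-process limit on open file descriptors.
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        fmt.Println("Getrlimit failed:", err)
        return
    }
    fmt.Printf("soft limit: %d, hard limit: %d\n", rl.Cur, rl.Max)
}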
Still puzzled, I worked around the issue by setting the file limit explicitly in the systemd unit file, as follows:
[Service]
Type=simple
LimitNOFILE=40960
LimitNPROC=40960
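Note that systemd directive names are case-sensitive (LimitNOFILE, LimitNPROC), and the change only takes effect after running systemctl daemon-reload and restarting the service.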
After restarting, check the process limit again (9928 is the process ID):
cat /proc/9928/limits | grep 'open files'
which gives:
Max open files            40960                40960                files
The open-file limit now matches the value set at startup. As for why the process does not inherit the user-level value, my suspicion is that the program itself adjusts the limit; if anyone knows the specifics of how Go handles this, please enlighten me.
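For illustration, a program can indeed change its own limit at runtime. The following is a hypothetical Go sketch of such a call using syscall.Setrlimit, not the service's actual code:

package main

import "syscall"

func main() {
    // Hypothetical values: set both the soft and hard open-file limits.
    // Raising the hard limit above its current value requires root
    // (CAP_SYS_RESOURCE); an unprivileged process can only raise the
    // soft limit up to the existing hard limit.
    rl := syscall.Rlimit{Cur: 40960, Max: 40960}
    if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        panic(err)
    }
}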
3. Summary
The troubleshooting steps for file descriptor errors are as follows:
First, verify that the configuration is correct, which means checking all three levels mentioned above. Pay special attention to the process level: if you only run ulimit -n and call it a day, you are likely to fall into the same pit I did.
If the configuration is correct, then look at how many file descriptors the system is actually using and where they are going. There are generally two causes: a large number of connections that are never closed, or a large number of file handles opened for reading that are never closed. The lsof command above can help you track down the specific cause.
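For reference, here is a minimal Go sketch of those two leak patterns and their fix (an illustrative example, not the actual bug in this service):

package main

import (
    "io"
    "net/http"
    "os"
)

func fetch(url string) error {
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    // Without this Close, every request leaks a connection and its descriptor.
    defer resp.Body.Close()
    _, err = io.Copy(io.Discard, resp.Body)
    return err
}

func readFile(path string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    // Likewise, every opened file must be closed when done.
    defer f.Close()
    _, err = io.Copy(io.Discard, f)
    return err
}

func main() {
    _ = fetch("https://example.com")
    _ = readFile("/etc/hosts")
}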