Solving the Linux "too many open files" error



Yesterday the project's Elasticsearch service hung. By "hung" I don't mean the process died (supervisor keeps it alive), but that the service was unusable. A while back, an improperly set ES_HEAP_SIZE had made the service unusable in the same way, so out of inertia I judged this to be an ES_HEAP_SIZE problem again. After logging into the server, however, I found a large number of "too many open files" error messages.


So what is the maximum number of open files Elasticsearch is actually allowed? You can confirm it via /proc:

Shell> cat /proc/<pid>/limits
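For illustration, the relevant line of the output looks something like this (the values are from my case; yours will differ):

Shell
Max open files            4096                 4096                 files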
The limit turned out to be "4096". We can take a closer look at which files Elasticsearch actually has open:

Shell> ls /proc/<pid>/fd
The problem looked very simple: just add the corresponding configuration item and everything should be fine. In Elasticsearch this setting is called max_open_files, but configuring it turned out to have no effect.

In my experience, most problems like this come from operating system limits, but those all checked out as normal:

Shell> cat /etc/security/limits.conf

* soft nofile 65535
* hard nofile 65535

The investigation hit a dead end, so I started looking for stopgaps to at least ease the symptoms first. I found an article by @Immortal, "Dynamically modifying the rlimit of a running process," which describes how to change the threshold on the fly. Although my tests all reported success, Elasticsearch unfortunately still refused to work properly:

Shell> echo -n 'Max open files=65535:65535' > /proc/<pid>/limits
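As a side note: on systems shipping util-linux 2.21 or later (a newer alternative I did not try at the time), the prlimit utility can change the limits of a running process without writing to /proc:

Shell
# raise both the soft and hard nofile limits of a running process (needs root)
Shell> prlimit --pid <pid> --nofile=65535:65535
# verify the change
Shell> prlimit --pid <pid> | grep NOFILE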

In addition, I also checked the kernel parameters fs.file-nr and fs.file-max; in short, I checked every file-related parameter, and even hard-coded "ulimit -n 65535" into the startup script, but all these efforts were in vain.
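For reference, the hard-coded attempt was a wrapper along these lines (a sketch; the Elasticsearch path is hypothetical):

Shell
#!/bin/sh
# raise the soft limit for this shell and everything it spawns
ulimit -n 65535
exec /usr/share/elasticsearch/bin/elasticsearch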

Just as I reached the end of the road, a colleague hit the nail on the head: shut down supervisor's process management and try starting the Elasticsearch process manually. As a result, everything returned to normal.

Why is that? When supervisor's process management is used, supervisor forks out the child process, i.e. the Elasticsearch process, acting as its parent. Given that parent-child relationship, the child cannot be started with a maximum-open-files limit exceeding the parent's, and supervisor's minfds option, which controls that limit, defaults to a value that is far too small (1024 in a stock configuration), causing the Elasticsearch process to fail.
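A minimal sketch of the corresponding fix on the supervisor side, assuming a stock /etc/supervisord.conf (minfds lives in the [supervisord] section):

Shell
# excerpt from /etc/supervisord.conf: raise minfds so child processes
# inherit a higher open-files ceiling; restart supervisord afterwards
[supervisord]
minfds=65535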

The cause of the failure was very simple, but I was trapped in experience-driven, fixed thinking, which is worth reflecting on.

Appendix: a permanent solution.

1. Modify /etc/security/limits.conf and add the following:

Shell
$user hard nofile 131072

Here $user is the user that starts the service (the WLS user, in Oracle's original recommendation). If the same problem is encountered again, the value may need to be increased further.
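For example, if the service runs as a dedicated account named elasticsearch (a hypothetical user name here), set both the soft and the hard limit:

Shell
elasticsearch soft nofile 131072
elasticsearch hard nofile 131072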

Using * instead applies the limit to all users:


Shell
* soft nofile 131072
* hard nofile 131072

Refer to the recommended settings for Oracle Enterprise Linux:


Shell
oracle hard nofile 131072
oracle soft nofile 131072
oracle hard nproc 131072
oracle soft nproc 131072
oracle soft core unlimited
oracle hard core unlimited
oracle soft memlock 3500000
oracle hard memlock 3500000
# recommended stack hard limit 32MB for Oracle installations
# oracle hard stack 32768
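Note that changes to limits.conf apply only to new login sessions. A quick way to verify, after logging in again as the target user (values assume the settings above):

Shell
ulimit -Sn   # soft limit
131072
ulimit -Hn   # hard limit
131072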

2. Another approach, from the official Debian GNU/Linux documentation and the Oracle Technology Network, is to modify the kernel parameter directly, without restarting the system.


Shell
sysctl -w fs.file-max=65536

# apply on the fly via proc
echo "65536" > /proc/sys/fs/file-max
# or
echo 65536 | sudo tee /proc/sys/fs/file-max

The effect is the same: the former changes the kernel parameter through the sysctl interface, while the latter writes directly to the corresponding file the kernel exposes in the virtual filesystem (procfs, a pseudo filesystem).
You can view the new limit with the following commands:


Shell
sysctl -a | grep fs.file-max

# or via proc
cat /proc/sys/fs/file-max

To make the change permanent across reboots, modify /etc/sysctl.conf:


Shell
echo "fs.file-max=65536" >> /etc/sysctl.conf
sysctl -p

To view the current file-handle usage:


Shell
sysctl -a | grep fs.file-nr

# or
cat /proc/sys/fs/file-nr
825 0 65536

Output format: The number of allocated file handles, the number of free file handles, and the maximum number of file handles.

Another command:


Shell
lsof | wc -l

What puzzled me a little is that the results of these two commands are always different ;-( The reason is as follows:
In short, file-nr counts file descriptors (the data structure, the handle a program uses to operate on an open file), while lsof lists open files, which include more than file descriptors: the current working directory, library files mapped into memory, executable text files, and so on. So lsof's output is usually larger than file-nr's count.
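To make the difference concrete, a quick side-by-side (the numbers are illustrative):

Shell
# allocated file handles system-wide (first field of file-nr)
awk '{print $1}' /proc/sys/fs/file-nr
825
# lsof's broader notion of "open files"
lsof | wc -l
5042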

A simple example: counting the number of files Firefox has open on the current system:


Shell
lsof -p <pid> | wc -l

# or
lsof | grep <pid> | wc -l

Compare that with the number of file descriptors the process with this PID is actually using:


Shell
ls /proc/<pid>/fd | wc -l

Comparing the two results makes the distinction clear. Note: the library files mapped into memory can be seen in detail in /proc/<pid>/maps.
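For instance, to list the shared libraries mapped into a process (a sketch):

Shell
grep '\.so' /proc/<pid>/maps | awk '{print $6}' | sort -u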

In addition, I spent a lot of time researching the difference between modifying the kernel parameter fs.file-max via sysctl and using ulimit. After consulting a classmate who is a Linux/FreeBSD/Solaris/OpenSolaris veteran, his hints finally made the concepts and the difference basically clear to me.

Priority order for open-file-descriptor limits:
soft limit < hard limit < kernel limit (nr_open => /proc/sys/fs/nr_open) < the constraint imposed by the data structures the kernel uses to track the maximum number of file descriptors
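For reference, the kernel-level per-process ceiling can be read directly (1048576 is the usual default; your kernel may differ):

Shell
cat /proc/sys/fs/nr_open
1048576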

The Linux kernel provides the getrlimit and setrlimit system calls to get and set resource limits per process. Each resource has an associated soft and hard limit. The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and may (irreversibly) lower its hard limit. A privileged process (one with the CAP_SYS_RESOURCE capability) may make arbitrary changes to either limit value.
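A quick demonstration of that asymmetry from an unprivileged shell (the exact error text varies by shell):

Shell
ulimit -Hn 4096     # lowering the hard limit succeeds (and is irreversible here)
ulimit -Hn 8192     # trying to raise it again fails
-bash: ulimit: open files: cannot modify limit: Operation not permitted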

As an aside: in a test environment, especially a VMware guest OS, installing tools such as an OpenSSH server, Webmin, and phpSysInfo can improve efficiency.

A quick fix for Oracle Enterprise Linux and Red Hat Enterprise Linux:
On OEL 5 and RHEL 5 you can simply install the oracle-validated package, which takes care of the package dependencies and system-configuration changes required to install Oracle databases and middleware. Recommended!


Shell
yum install oracle-validated

Or download the package and install it manually.
