Too many open files: solution, principle and troubleshooting workflow
The following is my understanding and summary of what I learned while dealing with the "Too many open files" exception. If any of it is incorrect, please point it out!
You can also try searching keywords such as "file descriptor leak", "too many open files" and "how to solve open files exception" on Stack Overflow.
Here is some of my summary, which may be helpful to you!
1. fd
fd is short for file descriptor.
In a Linux environment, everything exists as a file: through files you can access not only ordinary data, but also network connections and hardware.
An application identifies a file/device/service through its fd.
You may want to learn more about the definition and role of fds.
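As a quick illustration (a minimal sketch; the exact output differs per system), every process exposes its open fds under /proc/<pid>/fd, and even a bare shell holds fd 0 (stdin), 1 (stdout) and 2 (stderr):
    # List the file descriptors held by the current shell ($$ is the shell's pid)
    ls -l /proc/$$/fd
    # Typical entries: 0, 1, 2 -> /dev/pts/0 (the terminal), plus any open redirections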
2. lsof
A Linux command; the name stands for "list open files".
Section 1 states that all Linux resources exist in the form of files, so lsof is an effective tool for checking how system resources are being used, and it can be used to investigate resource exhaustion, exceptions and similar problems in Linux. Almost every Chinese-language reference describes it with wording like this:
Enter lsof in the terminal to display the files opened by the system. Because lsof needs to access kernel memory and various files, it must be run as the root user to show everything.
Part of the output of running lsof directly:
COMMAND   PID   USER    FD   TYPE   DEVICE   SIZE/OFF   NODE       NAME
init      1     root    cwd  DIR    8,1      4096       2          /
init      1     root    rtd  DIR    8,1      4096       2          /
init      1     root    txt  REG    8,1      150584     654127     /sbin/init
udevd     415   root    0u   CHR    1,3      0t0        6254       /dev/null
udevd     415   root    1u   CHR    1,3      0t0        6254       /dev/null
udevd     415   root    2u   CHR    1,3      0t0        6254       /dev/null
udevd     690   root    mem  REG    8,1      51736      302589     /lib/x86_64-linux-gnu/libnss_files-2.13.so
syslogd   1246  syslog  2w   REG    8,1      10187      245418     /var/log/auth.log
syslogd   1246  syslog  3w   REG    8,1      10118      245342     /var/log/syslog
dd        1271  root    0r   REG    0,3      0          4026532038 /proc/kmsg
dd        1271  root    1w   FIFO   0,17     0t0        409        /run/klogd/kmsg
dd        1271  root    2u   CHR    1,3      0t0        6254       /dev/null
Each line shows an opened file. If no conditions are specified, all files opened by all processes are displayed by default.
The meaning of each lsof output column is as follows:
COMMAND: process name
PID: process identifier
USER: process owner
FD: file descriptor; the application identifies the file through it (special values such as cwd, rtd, txt and mem also appear here)
TYPE: file type, such as DIR and REG
DEVICE: the device the file resides on
SIZE/OFF: file size or offset
NODE: inode number (the identifier of the file on the disk)
NAME: the exact name of the opened file
You may need to know the detailed usage of the lsof command.
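For reference, a few common lsof invocations (a sketch using standard lsof options; the path, user, pid and port are just examples):
    lsof /var/log/syslog      # which processes have this file open
    lsof -u syslog            # all files opened by a given user
    lsof -p 1246              # all files opened by a given pid
    lsof -i :3306             # network descriptors using port 3306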
3. Fixing the "too many open files" exception (POSIX)
A. The answers found on the Internet, especially via Baidu, are almost all the same: use "ulimit -n" to view the maximum number of fds that can be opened, usually 1024 (meaning at most 1024 can be open at once), then use "ulimit -n 4096" to raise the limit. Let me state the conclusion first: this solution is just trying your luck. For details, look at the breakdown below.
B. Why "too many open files" appears:
As mentioned in section 1, everything in Linux (POSIX) is expressed in file form, so the cause of this exception is (almost always) that you have opened too many 'files' and exceeded the limit.
The tuning method in A is obviously usable; the following excerpt gives the reasons why this operation is not recommended:
From http://oroboro.com/file-handle-leaks-server/
Wrong Answers, Myths and Bad Ideas
Raise the file handle limit
One common answer to this problem is to just raise the limit of open file handles and then restart the server every day or every few hours.
This will delay the problem but likely will not fix it. It is possible that your program is not leaking and has a legitimate need to hold a large number of file handles. But if your program is designed correctly there usually isn't a need to keep a large number of handles open, even if you have thousands of simultaneous connections. We'll discuss some methods of managing that later.
If this was a good idea the operating system would already come configured with a higher file descriptor limit. If this was necessary, Apache would require you to up this limit before running.
Reason (most of the time)
The problem is almost certainly that you are leaking file handles. That is, handles are being opened, and after you are done with them they are not closed.
Leaked file handles can come from many sources, not just open files. Some common sources are: Sockets, Pipes, Database Connections, Windows HANDLES, Files.
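A simple way to tell a leak apart from a legitimately busy program (a minimal sketch; <pid> is a placeholder for your process id) is to watch whether the fd count keeps growing under a steady workload:
    # Print the open fd count of the process every 5 seconds; a number that only
    # ever grows usually indicates a descriptor leak.
    while true; do
        ls /proc/<pid>/fd | wc -l
        sleep 5
    done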
C. How to troubleshoot and repair (important)
When the "ulimit -n 4096" solution does not satisfy you and you want to analyze the cause in depth, the analysis methods you will find are all similar, and most of them revolve around the lsof command. They are listed below:
=. To find the PID of the mysqld process, enter: pidof mysqld    # pidof finds a process ID by name; for example, pidof java finds the ID of the Java process
=. List the files opened by a PID: lsof -p ${pid}    # -p takes the pid; for example, lsof -p 10086 prints all open files of process 10086
   or ls /proc/${pid}/fd    # same as above: view the files opened by that process
=. List file descriptors in kernel memory:
   sysctl fs.file-nr    # example result: fs.file-nr = 2688 0 379264
   => the number of allocated file handles
   => the number of unused-but-allocated file handles
   => the system-wide maximum number of file handles
   sysctl fs.file-max    # the system-wide maximum number of files that can be opened
=. View the files opened by a user: lsof -u jboss
=. Combine with a counter to calculate the number of open files, e.g. lsof -p 10086 | wc -l, ls -alt /proc/10086/fd | wc -l, and so on.
Most of these analysis methods use lsof with parameters and pipes, or the /proc/${pid}/fd directory, to analyze the open files of your target.
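Building on these commands, one useful variation (a sketch; <pid> is a placeholder) is to group a process's open descriptors by type, which often makes a leak obvious:
    # Count a process's open descriptors per TYPE (column 5 of lsof output);
    # many 'sock'/'IPv4' rows suggest leaked sockets, many 'REG' rows suggest
    # regular files that are never closed.
    lsof -p <pid> | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn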
Confusion
=. lsof -u root | wc -l gives 2223, while ulimit -n gives 1024. Why is the number of files currently opened by the root user larger than the limit? If you cannot explain this, the analysis is meaningless, because you are trying to solve an open-file-limit problem while observing that root already has more files open at runtime than the limit.
=. lsof -p 54552 | wc -l gives 658 (54552 is the pid of my Java process), while ll /proc/54552/fd | wc -l gives 358. Why is there such a big difference in the open-file count for the same process? Which one prevails, and how do they relate to the ulimit -n value?
=. First, the explanation for the 2nd point: lsof also lists memory-mapped .so files, which technically is not the same as a file handle the application has control over; /proc/<pid>/fd is the real measuring point for open file descriptors. In other words, the lsof result includes memory-mapped .so files, which in principle are not fds controlled by an ordinary application, while the /proc/<pid>/fd directory reflects the actual open fds. Adjust the lsof command: lsof -p <pid> | grep -v mem | egrep -v '^COMMAND +PID' | wc -l, which is roughly equivalent to the count in /proc/<pid>/fd.
=. Now for the 1st point: run lsof -u root | grep -v mem | egrep -v '^COMMAND +PID' | wc -l; the result is 1221, still greater than the limit. So it is confirmed that the root user currently has 1221 files open while the limit is 1024. The reason this can exceed the limit lies in what the limit applies to: the limit is on a per-process basis, not per-user. It applies to a process (a process owned by this user), not to the user. That is, ulimit -n being 1024 means a single process run by the root user can open at most 1024 files, not that the root account can only open 1024 files in total. (A sketch after this list shows how to check the limit that applies to one specific process.)
=. sysctl fs.file-max gives 379264 on my machine. This number is the total number of open file handles the kernel supports system-wide; it is rarely the limit you are actually hitting and normally does not need to be changed.
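To check the limit that actually applies to one specific process (a sketch; <pid> is a placeholder), read its limits file instead of running ulimit in your shell:
    # The 'Max open files' row shows the soft and hard fd limits enforced for <pid>;
    # they can differ from what 'ulimit -n' reports in your own shell.
    grep 'Max open files' /proc/<pid>/limits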
Solution
=. This explains why simply enlarging the limit is a matter of luck: under normal circumstances it is rare for a single process to legitimately have more than 1024 files open at the same time.
=. If you need to analyze a leak or view the details of open files, start at the pid granularity instead of being confused by per-user counts:
   lsof -p <pid> | grep -v mem | egrep -v '^COMMAND +PID'
   You can also analyze the /proc/<pid>/fd directory.
=. Raising the limit
Even if you really do need to raise the limit, most answers just run "ulimit -n 4096" on the OS, which the operating system may not allow (a non-root user cannot exceed the hard limit) and which only lasts for the current session. The following two methods are provided for reference:
Raising the Global Limit: edit /etc/sysctl.conf and add the following line: fs.file-max = 65536
Apply the changes with: sudo sysctl -p /etc/sysctl.conf
Raising the per-User Limit:
& Edit as root the following system configuration file: % sudo vi /etc/security/limits.conf
& Modify the values for the nuxeo user (we assume here JBoss is launched with the system user "nuxeo"):
nuxeo soft nofile 4096
nuxeo hard nofile 8192
If you want to raise the limits for all users you can do this instead:
* soft nofile 4096
* hard nofile 8192
& Edit /etc/pam.d/su: sudo vi /etc/pam.d/su
& Uncomment the line:
session required pam_limits.so
& Once you save the file, you may need to log out and log in again.
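After logging in again you can verify that the new values are in effect (a quick sketch; <pid> stands for an already-running process you care about):
    ulimit -Sn                                   # soft limit of the new shell, should now be 4096
    ulimit -Hn                                   # hard limit, should now be 8192
    grep 'Max open files' /proc/<pid>/limits     # a process started before the change keeps its old limits until restarted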
D. Summary
Raising the limit on the number of open files can be effective, but before doing that you should also care about the cause of the error; it is worth analyzing first.
The points above were collected from various places on the Internet together with my own understanding. If there are any mistakes, please correct me.