Memory leak analysis method for an Erlang project

Source: Internet
Author: User

As our projects rely more and more on Erlang, we run into more Erlang-related problems. Some time ago, one of our production systems ran into very high memory consumption; this post records how we troubleshot and analyzed it. The production system runs Erlang R16B02.

Problem description

Several production systems showed memory soaring after running for a while. The system model is very simple: when a network connection arrives, a new process is taken from a pool to handle it. Observing with top showed that the memory was being consumed by the Erlang process, yet netstat showed only a few thousand network connections. So the problem was most likely a memory leak inside Erlang.

Analysis method

One advantage of Erlang is that you can connect directly to a running production system and analyze problems on site. Our system is managed with rebar, and there are several ways to get into the running node.

Logging in on the local machine

You can log in to the production machine directly and then attach to the running Erlang node with the following commands:


$ cd /path/to/project
$ rel/xxx/bin/xxx attach
(node@host)1>

Through a remote shell

First, get the cookie of the Erlang node:


$ ps -ef | grep beam    # look for the -setcookie argument

Open a new shell using the same cookie but a different node name:

$ erl -setcookie cookiename -name test1@127.0.0.1

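Check that the new node can reach the target node, then press Ctrl-G to enter user switch command mode and start a remote shell on the target node: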

Erlang R16B02 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.3  (abort with ^G)
(test1@127.0.0.1)1> net_adm:ping('node@127.0.0.1').
pong
(test1@127.0.0.1)2> nodes().
['node@127.0.0.1']
(test1@127.0.0.1)3>
User switch command
 --> h
  c [nn]            - connect to job
  i [nn]            - interrupt job
  k [nn]            - kill job
  j                 - list all jobs
  s [shell]         - start local shell
  r [node [shell]]  - start remote shell
  q                 - quit erlang
  ? | h             - this message
 --> r 'node@127.0.0.1'
 --> j
   1  {shell,start,[init]}
   2* {'node@127.0.0.1',shell,start,[]}
 --> c 2
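Once the nodes are connected, you can also query the production node from the test node without switching shells. The sketch below wraps rpc:call/5 with a timeout; the module name remote_probe and the 5-second timeout are illustrative choices, not part of the original setup.

```erlang
-module(remote_probe).
-export([memory/1]).

%% Hypothetical helper: fetch erlang:memory() from another node.
%% rpc:call/5 returns {badrpc, Reason} if the node is unreachable
%% or the call does not finish within the timeout (5000 ms here).
memory(Node) ->
    case rpc:call(Node, erlang, memory, [], 5000) of
        {badrpc, Reason} -> {error, Reason};
        Mem when is_list(Mem) -> {ok, Mem}
    end.
```

For example, remote_probe:memory('node@127.0.0.1') from the test node. Because erlang:memory/0 is a built-in function that already exists on the target node, nothing needs to be loaded there; only the calling node needs this module.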

Analysis process

Erlang ships with many tools for analyzing a running system, such as appmon and webtool. However, the system was so short of memory that these tools could not be started; fortunately, the Erlang shell was still available.

The Erlang shell comes with many useful commands, which you can list with help():


> help().

Erlang system memory consumption

Since top pointed to a memory problem, the first step is to look at memory consumption inside the Erlang VM:


> erlang:memory().

erlang:memory() shows the memory allocated by the Erlang emulator, broken down into total memory, memory used by atoms, memory used by processes, and so on.
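The raw output of erlang:memory() is in bytes and unordered, which is hard to read on a busy node. A small illustrative helper (the module name mem_report is hypothetical) converts it to megabytes and sorts it so the dominant consumer stands out:

```erlang
-module(mem_report).
-export([by_type/0]).

%% Illustrative helper: list erlang:memory() categories in megabytes,
%% largest first. 'total' always comes first because it is the sum of
%% 'processes' and 'system'.
by_type() ->
    ToMb = fun(Bytes) -> Bytes / (1024 * 1024) end,
    Sized = [{Type, ToMb(Bytes)} || {Type, Bytes} <- erlang:memory()],
    lists:reverse(lists:keysort(2, Sized)).
```

If processes is close to total, the memory is held by Erlang processes rather than by binaries, ETS tables, or code, which is the situation described next.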

Number of Erlang processes

On the production system, most of the memory turned out to be used by processes, so the next question was whether individual processes were leaking memory or too many processes were being created.


> erlang:system_info(process_limit).  %% maximum number of processes the system can create
> erlang:system_info(process_count).  %% number of processes currently in the system

erlang:system_info/1 returns information about the current system, such as the number of processes and ports. Running the commands above gave a surprise: with only 2,000 to 3,000 network connections, there were already more than 100,000 Erlang processes. Processes were being created but, because of a code problem or some other reason, they piled up instead of being released.
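Since each connection gets its own process, the process count should roughly track the number of connections. A hypothetical helper (module name proc_stats) puts the relevant numbers side by side; it uses a property list rather than a map so that it also works on R16B02, which predates maps.

```erlang
-module(proc_stats).
-export([summary/0]).

%% Illustrative helper: compare the live process count with the limit
%% and with the number of open ports (gen_tcp sockets are ports).
summary() ->
    Count = erlang:system_info(process_count),
    Limit = erlang:system_info(process_limit),
    [{process_count, Count},
     {process_limit, Limit},
     {usage_percent, 100 * Count / Limit},
     {port_count, length(erlang:ports())}].
```

A process count many times larger than the port count, as in our case, suggests that processes outlive the connections they were created for.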

View information for a single process

Since processes were piling up for some reason, the cause had to be found inside the processes themselves.

First, get the PID of one of the piled-up processes:


> i().           %% print information about all processes
> i(0,61,886).   %% information about the process <0.61.886>

Many processes were hanging there. Looking at the details of a specific PID showed several unprocessed messages in its message queue. The powerful erlang:process_info/2 function can retrieve a great deal of information about a process:
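Picking PIDs out of i() by eye is tedious with 100,000 processes. An illustrative way to find candidates (the module name queue_report is hypothetical) is to rank all processes by message queue length:

```erlang
-module(queue_report).
-export([top/1]).

%% Illustrative helper: the N processes with the longest message queues.
%% process_info/2 returns undefined for a process that exits between
%% processes() and the lookup; the generator pattern skips those.
top(N) ->
    Lens = [{Pid, Len} || Pid <- erlang:processes(),
                          {message_queue_len, Len} <-
                              [erlang:process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:keysort(2, Lens)), N).
```

This walks every process once, so on a node that is already short of memory it is worth running it with a small N and not in a tight loop.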

> erlang:process_info(pid(0,61,886), current_stacktrace).
> rp(erlang:process_info(pid(0,61,886), backtrace)).
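process_info/2 also accepts a list of items, which gives a compact summary of a suspect process in one call. The helper below is a sketch with a hypothetical module name; the chosen items are an illustrative selection.

```erlang
-module(proc_inspect).
-export([summary/1]).

%% Illustrative helper: fetch several process_info/2 items at once.
%% Returns undefined if the process is no longer alive.
summary(Pid) ->
    erlang:process_info(Pid, [registered_name, current_function,
                              message_queue_len, memory]).
```

In the shell this would be used as proc_inspect:summary(pid(0,61,886)).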

Viewing the process's backtrace revealed the following:

0x00007fbd6f18dbf8 Return addr 0x00007fbff201aa00 (gen_event:rpc/2 + 96)
y(0)     #Ref<0.0.2014.142287>
y(1)     infinity
y(2)     {sync_notify,{log,{lager_msg,[], ...}}
y(3)     <0.61.886>
y(4)     <0.89.0>
y(5)     []

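The process was stuck while logging through lager, a third-party Erlang logging library: it was blocked in gen_event:rpc/2 waiting, with an infinity timeout, for the reply to a sync_notify of a lager log message.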
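One backtrace shows one stuck process. To check whether the other piled-up processes are stuck in the same place, you can count processes by the function they are currently executing. This is an illustrative helper with a hypothetical module name; dict is used so it also runs on R16B02.

```erlang
-module(stuck_report).
-export([by_current_function/1]).

%% Illustrative helper: count processes by current function and return
%% the N most common entries. If most processes share one entry, they
%% are probably blocked in the same place.
by_current_function(N) ->
    Funs = [F || Pid <- erlang:processes(),
                 {current_function, F} <-
                     [erlang:process_info(Pid, current_function)]],
    Counts = lists:foldl(fun(F, D) -> dict:update_counter(F, 1, D) end,
                         dict:new(), Funs),
    lists:sublist(lists:reverse(lists:keysort(2, dict:to_list(Counts))), N).
```

current_function reports the innermost function, so a process waiting inside a gen_event call may show up under a generic call function rather than under lager itself; current_stacktrace on a sample PID shows the full chain.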

Root cause

The lager documentation contains the following:


Prior to lager 2.0, the gen_event at the core of lager operated purely in synchronous mode. Asynchronous mode is faster, but has no protection against message queue overload. In lager 2.0, the gen_event takes a hybrid approach: it polls its own mailbox size and toggles the messaging between synchronous and asynchronous depending on mailbox size.

{async_threshold, 20},
{async_threshold_window, 5}

This will use async messaging until the mailbox exceeds 20 messages, at which point synchronous messaging will be used, and switch back to asynchronous when size reduces to 20 - 5 = 15.

If you wish to disable this behaviour, simply set it to 'undefined'. It defaults to a low number to prevent the mailbox growing rapidly beyond the limit and causing problems. In general, lager should process messages as fast as they come in, so getting 20 behind should be relatively exceptional anyway.


So lager has a configuration option for the number of unprocessed messages: once the message backlog exceeds it, logging switches to synchronous mode!
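The switching rule quoted above is a form of hysteresis. The following is a hypothetical pure-function model of it, written only to make the thresholds concrete; it is not lager's implementation, and the module and function names are invented.

```erlang
-module(threshold_model).
-export([next_mode/4]).

%% Hypothetical model of the hysteresis described in the lager docs
%% (not lager's code). Mode is async or sync; Len is the current
%% mailbox size of lager's event manager.
next_mode(async, Len, Threshold, _Window) when Len > Threshold ->
    sync;
next_mode(sync, Len, Threshold, Window) when Len =< Threshold - Window ->
    async;
next_mode(Mode, _Len, _Threshold, _Window) ->
    Mode.
```

With Threshold = 20 and Window = 5, the model switches to sync at 21 queued messages and back to async at 15. While in sync mode, every process that logs waits for the event manager to handle its message, which matches the sync_notify frame in the backtrace.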

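Our system had debug logging turned on, and the flood of log messages overwhelmed it. Once lager's mailbox passed the threshold, every process that logged had to wait in a synchronous call, so processes accumulated faster than they could finish and memory kept growing.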
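One way to act on this, sketched as an assumption rather than the configuration we actually deployed: keep debug logging off in production and set lager's threshold explicitly in sys.config. The handler list below is a minimal example for lager 2.x using only the console backend; adapt it to the backends your release actually uses.

```erlang
%% sys.config fragment (illustrative, lager 2.x syntax)
[
 {lager, [
   %% log at info instead of debug in production
   {handlers, [{lager_console_backend, info}]},
   %% async until more than 20 queued messages, back to async at 15
   {async_threshold, 20},
   {async_threshold_window, 5}
 ]}
].
```

On a running node, lager:set_loglevel/2 changes a handler's level without a restart, for example lager:set_loglevel(lager_console_backend, info).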

Developers abroad have run into similar problems, and a discussion thread on the topic helped our analysis a great deal; our thanks to its participants.

Summary

Erlang provides a wealth of tools for connecting to a live system and analyzing problems on site, which makes locating problems fast and efficient. At the same time, Erlang/OTP makes the system more robust. We will keep exploring Erlang and look forward to sharing more hands-on experience.

