Why pt-heartbeat's memory slowly grows while it keeps retrying after the master goes down, and how to fix it

Recently, my colleagues reported a problem with using pt-heartbeat to monitor master-slave replication latency: if the master goes down, pt-heartbeat's connection fails, but the tool keeps retrying.

Retrying is reasonable. From the user's point of view, we want pt-heartbeat to keep retrying until it can reconnect to the database. However, they found that the continuous retries cause the process's memory to grow slowly.

Reproduction

Environment:

pt-heartbeat v2.2.19, MySQL Community Edition v5.6.31, Perl v5.10.1, RHEL 6.7, 500 MB of memory

To keep the database start and stop from affecting pt-heartbeat's memory usage, MySQL and pt-heartbeat run on separate hosts.

Run pt-heartbeat

# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table

Monitor pt-heartbeat memory usage

Get pid

# ps -ef | grep pt-heartbeat
root 1505 1471 0 19:13 pts/0 00:00:08 perl /usr/local/bin/pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table
root 1563 1545 2 19:50 pts/3 00:00:00 grep pt-heartbeat

View the memory usage of the process

# top -p 1505

While the database is up, %MEM stays stable at 3.3% as the process accumulates CPU time (the TIME+ column).
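Slow growth is easy to miss when watching top interactively. As a rough alternative, the resident set size can be sampled at fixed intervals and logged. The sketch below is not part of pt-heartbeat: the function name sample_rss is made up for illustration, and it assumes a Linux host where /proc/PID/status contains a VmRSS line.

```shell
#!/bin/sh
# sample_rss PID COUNT INTERVAL
# Hypothetical helper (not part of Percona Toolkit): prints "epoch_seconds rss_kb"
# COUNT times, INTERVAL seconds apart, reading VmRSS from /proc (Linux only).
sample_rss() {
    pid=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ ! -r "/proc/$pid/status" ]; then
            echo "process $pid not found" >&2
            return 1
        fi
        # VmRSS is reported in kB, e.g. "VmRSS:     16384 kB"
        rss_kb=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")
        echo "$(date +%s) $rss_kb"
        i=$((i + 1))
        # Sleep between samples, but not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For the process above, something like "sample_rss 1505 60 10" would log one sample every 10 seconds for about ten minutes; comparing the first and last values gives the growth over that window.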

Now stop the database

# service mysqld stop

From this point on, pt-heartbeat prints messages continuously as it keeps retrying the connection.

After the same amount of CPU time, %MEM had risen from 3.3% to 4.4%, an increase of about 1 percentage point. On a host with 500 MB of memory, that means the process gained roughly 5 MB. That is not much, but since the growth shows no sign of stopping, it still deserves attention.
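The arithmetic behind that estimate is easy to check. Taking the two top readings exactly, the increase is 1.1 percentage points, which on a 500 MB host works out to about 5.5 MB. The helper name pct_to_mb below is hypothetical, and the calculation assumes that top's %MEM is the process's resident memory as a share of total physical memory.

```shell
# pct_to_mb TOTAL_MB PCT_BEFORE PCT_AFTER
# Hypothetical helper: converts a change in top's %MEM into megabytes,
# assuming %MEM is resident memory as a percentage of TOTAL_MB.
pct_to_mb() {
    LC_ALL=C awk -v total="$1" -v before="$2" -v after="$3" \
        'BEGIN { printf "%.1f\n", total * (after - before) / 100 }'
}

pct_to_mb 500 3.3 4.4   # prints 5.5
```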

At the same time, the pmap command showed that the RSS and Dirty values of the mapping at address 0000000001331000 were also growing, at a rate of about 4 KB/s.
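To follow a single mapping over time instead of comparing successive pmap runs by eye, the relevant row can be filtered out of "pmap -x" output. The sketch below assumes the procps column layout (Address, Kbytes, RSS, Dirty, Mode, Mapping). The function name mapping_usage is made up for illustration, and it reads pmap output on standard input so it also works on saved output.

```shell
# mapping_usage ADDRESS
# Hypothetical helper: reads `pmap -x PID` output on stdin and prints
# "rss_kb dirty_kb" for the mapping that starts at ADDRESS.
# Assumes the procps column order: Address Kbytes RSS Dirty Mode Mapping.
mapping_usage() {
    # Appending "" forces a string comparison, so addresses made only of
    # digits are not compared as numbers.
    awk -v addr="$1" '($1 "") == (addr "") { print $3, $4; found = 1 }
                      END { exit !found }'
}
```

Running "pmap -x 1505 | mapping_usage 0000000001331000" a minute apart should show both values rising. If the 4 KB/s rate held, that would be about 240 KB per minute, or roughly 14 MB per hour.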

I then read the pt-heartbeat source code and found what looks like a small bug.

my $tries = 2;
while ( !$dbh && $tries-- ) {
   PTDEBUG && _d($cxn_string, ' ', $user, ' ', $pass,
      join(', ', map { "$_=>$defaults->{$_}" } keys %$defaults ));
   $dbh = eval { DBI->connect($cxn_string, $user, $pass, $defaults) };
   if ( !$dbh && $EVAL_ERROR ) {
      if ( $EVAL_ERROR =~ m/locate DBD\/mysql/i ) {
         die "Cannot connect to MySQL because the Perl DBD::mysql module is "
            . "not installed or not found. Run 'perl -MDBD::mysql' to see "
            . "the directories that Perl searches for DBD::mysql. If "
            . "DBD::mysql is not installed, try:\n"
            . " Debian/Ubuntu apt-get install libdbd-mysql-perl\n"
            . " RHEL/CentOS yum install perl-DBD-MySQL\n"
            . " OpenSolaris pgk install pkg:/SUNWapu13dbd-mysql\n";
      }
      elsif ( $EVAL_ERROR =~ m/not a compiled character set|character set utf8/ ) {
         PTDEBUG && _d('Going to try again without utf8 support');
         delete $defaults->{mysql_enable_utf8};
      }
      if ( !$tries ) {
         die $EVAL_ERROR;
      }
   }
}

The code above is taken from the get_dbh function, which obtains the database connection. If the first attempt fails, it retries once; if that attempt also fails, it is supposed to abort the script by calling die.
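The counting in that loop is compact, so here is the same logic transliterated into shell, purely as an illustration. This is not pt-heartbeat code: connect_with_retry and the command passed to it are hypothetical names, and the shell "return 1" stands in for Perl's die.

```shell
# connect_with_retry CONNECT_CMD
# Illustration only: mirrors "my $tries = 2; while ( !$dbh && $tries-- )".
# CONNECT_CMD is any command or function that returns 0 on success.
connect_with_retry() {
    connect_cmd=$1
    tries=2
    while [ "$tries" -gt 0 ]; do
        tries=$((tries - 1))          # the decrement happens before the attempt
        if "$connect_cmd"; then
            return 0                  # connected, stop retrying
        fi
        if [ "$tries" -eq 0 ]; then   # same check as "if ( !$tries )"
            echo "giving up after 2 attempts" >&2
            return 1                  # stands in for "die $EVAL_ERROR"
        fi
    done
}
```

With a command that always fails, the function makes exactly two attempts and then returns non-zero, which is how get_dbh is meant to behave when the master stays down.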

However, after adding the debug statements below, we can see that when $tries is 0, the PTDEBUG && _d("$EVAL_ERROR") statement inside the if block does execute, yet the die that follows it does not terminate the script.

PTDEBUG && _d($tries);
if ( !$tries ) {
   PTDEBUG && _d("$EVAL_ERROR");
   die $EVAL_ERROR;
}

Next, I changed the last if block of the code above as follows:

if ( !$tries ) {
   die "test:$EVAL_ERROR";
}

Test again

Start the database

# service mysqld start

Run the pt-heartbeat command

# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table

Stop the database

# service mysqld stop

This time, the pt-heartbeat command that was just started exits with an error.

The "test:" prefix in its error message is the string added for this test.

Conclusion

Strangely, the plain die $EVAL_ERROR does not terminate the script, while the modified die "test:$EVAL_ERROR" does.

This is clearly a bug, though I don't know whether it is related to the Perl version.

What remains puzzling is how a failed connection leads to memory growth.

Finally, I filed a bug report with Percona:

https://bugs.launchpad.net/percona-toolkit/+bug/1629164
