Why pt-heartbeat's continuous retries cause slow memory growth when the master is down, and how to fix it
Recently, my colleagues reported that when pt-heartbeat is used to monitor master-slave replication latency and the master goes down, the pt-heartbeat connection fails but the tool keeps retrying.
Retrying is reasonable. From the user's perspective, we want pt-heartbeat to keep retrying until it can reconnect to the database. However, they found that the continuous retries lead to slow memory growth.
Reproduction
Environment:
pt-heartbeat v2.2.19, MySQL Community Edition v5.6.31, Perl v5.10.1, RHEL 6.7, 500 MB of memory
To keep database starts and stops from affecting pt-heartbeat's memory usage, MySQL and pt-heartbeat run on different hosts.
Run pt-heartbeat
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table
Monitor pt-heartbeat memory usage
Get pid
# ps -ef |grep pt-heartbeat
root 1505 1471 0 19:13 pts/0 00:00:08 perl /usr/local/bin/pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table
root 1563 1545 2 19:50 pts/3 00:00:00 grep pt-heartbeat
View the memory usage of the process
# top -p 1505
While the process runs (its accumulated CPU time is shown in the TIME+ column), %MEM remains stable at 3.3%.
Now stop the database
# service mysqld stop
The pt-heartbeat command then keeps printing output continuously while it retries the connection.
After the same amount of CPU time, %MEM had risen to 4.4%, an increase of about 1 percentage point. With 500 MB of memory on the host, that is roughly 5 MB of growth for the process. That is not much, but because the growth shows no sign of stopping, it still deserves attention.
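The arithmetic behind the roughly 5 MB figure can be checked with a short shell sketch. The function name mem_delta_mb is hypothetical, and the sketch assumes that the %MEM column in top expresses the process's resident memory as a percentage of the host's physical memory (500 MB here).

```shell
# Hypothetical helper: convert a change in top's %MEM into megabytes.
# Usage: mem_delta_mb TOTAL_MB PCT_BEFORE PCT_AFTER
mem_delta_mb() {
    awk -v total="$1" -v before="$2" -v after="$3" \
        'BEGIN { printf "%.1f\n", total * (after - before) / 100 }'
}

# Figures from this test: 500 MB host, %MEM went from 3.3 to 4.4.
mem_delta_mb 500 3.3 4.4
```

With these inputs it prints 5.5, which is in line with the rough 5 MB estimate.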
At the same time, the pmap command showed that the RSS and Dirty values of the mapping at address 0000000001331000 were also growing, at a rate of about 4 KB/s.
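One way to follow a single mapping over time is to filter the output of pmap -x by address. The sketch below assumes the procps column layout (Address, Kbytes, RSS, Dirty, Mode, Mapping). Both function names are hypothetical, and the sample figures in the comments are made up for illustration rather than taken from the test above.

```shell
# Hypothetical helper: read `pmap -x PID` output on stdin and print
# "RSS Dirty" (in KB) for the mapping that starts at address $1.
# Assumes columns: Address Kbytes RSS Dirty Mode Mapping.
# The "" concatenation forces a string comparison of the addresses.
mapping_rss_dirty() {
    awk -v addr="$1" '($1 "") == (addr "") { print $3, $4 }'
}

# Hypothetical helper: average growth in KB/s between two samples
# taken $3 seconds apart (integer arithmetic).
# Usage: growth_kb_per_s FIRST_KB SECOND_KB SECONDS
growth_kb_per_s() {
    echo $(( ($2 - $1) / $3 ))
}

# Example with the PID and address from this article:
#   pmap -x 1505 | mapping_rss_dirty 0000000001331000
# Example with made-up samples taken 60 seconds apart:
#   growth_kb_per_s 1000 1240 60
```

Taking two samples a known interval apart and passing the RSS values to growth_kb_per_s gives a growth rate that can be compared between runs with the master up and the master down.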
I then studied the pt-heartbeat source code and found a small bug in it.
my $tries = 2;
while ( !$dbh && $tries-- ) {
   PTDEBUG && _d($cxn_string, ' ', $user, ' ', $pass,
      join(', ', map { "$_=>$defaults->{$_}" } keys %$defaults ));

   $dbh = eval { DBI->connect($cxn_string, $user, $pass, $defaults) };

   if ( !$dbh && $EVAL_ERROR ) {
      if ( $EVAL_ERROR =~ m/locate DBD\/mysql/i ) {
         die "Cannot connect to MySQL because the Perl DBD::mysql module is "
            . "not installed or not found. Run 'perl -MDBD::mysql' to see "
            . "the directories that Perl searches for DBD::mysql. If "
            . "DBD::mysql is not installed, try:\n"
            . " Debian/Ubuntu apt-get install libdbd-mysql-perl\n"
            . " RHEL/CentOS yum install perl-DBD-MySQL\n"
            . " OpenSolaris pgk install pkg:/SUNWapu13dbd-mysql\n";
      }
      elsif ( $EVAL_ERROR =~ m/not a compiled character set|character set utf8/ ) {
         PTDEBUG && _d('Going to try again without utf8 support');
         delete $defaults->{mysql_enable_utf8};
      }
      if ( !$tries ) {
         die $EVAL_ERROR;
      }
   }
}
The code above is taken from the get_dbh function, which obtains the database connection. If the connection fails, it retries once; if that also fails, it is supposed to terminate the script with an error via die.
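The control flow of that loop, two attempts in total followed by a fatal error, can be paraphrased in shell. This is only an analogy that makes the counting explicit, not pt-heartbeat's code: try_twice and the ATTEMPTS variable are hypothetical names, and the command passed in stands in for DBI->connect.

```shell
# Hypothetical analogy of the Perl loop: run the given command up to
# two times and fail once both attempts are used up.
ATTEMPTS=0
try_twice() {
    tries=2
    ATTEMPTS=0
    while [ "$tries" -gt 0 ]; do
        tries=$((tries - 1))           # mirrors the post-decrement in $tries--
        ATTEMPTS=$((ATTEMPTS + 1))
        if "$@"; then
            return 0                   # connected: the loop ends
        fi
        if [ "$tries" -eq 0 ]; then
            echo "giving up after $ATTEMPTS attempts" >&2
            return 1                   # stands in for die $EVAL_ERROR
        fi
    done
}
```

With a command that always fails, such as false, it makes two attempts and returns a non-zero status. That is the behavior one would expect from get_dbh when the master is down.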
However, after adding the debug statements below, we can see that when $tries is 0, the PTDEBUG && _d("$EVAL_ERROR") statement inside the if block is executed, yet die does not throw an exception and the script does not exit.
PTDEBUG && _d($tries);
if ( !$tries ) {
   PTDEBUG && _d("$EVAL_ERROR");
   die $EVAL_ERROR;
}
Next, I changed the last if block of the code above as follows:
if ( !$tries ) {
   die "test:$EVAL_ERROR";
}
Test again
Start the database
# service mysqld start
Run the pt-heartbeat command
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test --create-table
Stop the database
# service mysqld stop
The pt-heartbeat command started above now exits with an error.
The "test:" prefix in its error message is the test string that was added.
Conclusion
It is strange that a plain die $EVAL_ERROR does not throw an exception and exit the script, whereas the modified die "test:$EVAL_ERROR" does.
This is clearly a bug, although I don't know whether it is related to the Perl version.
I am also curious how a failed connection ends up causing memory growth.
Finally, I filed a bug with Percona:
https://bugs.launchpad.net/percona-toolkit/+bug/1629164