Copyright notice: This is an original article by Wang Liang. If you reprint it, please indicate the source:
Original article link: https://www.qcloud.com/community/article/214
Source: Tengyun https://www.qcloud.com/community
Symptom
After a deployed Flume cluster had been running for a long time, a disk was found to be full, and the Flume log directory turned out to be the cause.
Specific problem
Specifically, Flume's large log files showed that a MySQL-related sink was throwing an exception continuously and printing a large volume of log output.
Analysis process
The exception reported in the log is:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after statement closed
Taken literally, it means that a transaction commit was attempted after the MySQL statement (connection) had already been closed, so an exception was thrown. Why the exception keeps being thrown, however, requires deeper analysis.
Configuration analysis
Since the exception is thrown by Flume and is related to MySQL, the scope of the problem narrows to finding which part of Flume writes to MySQL. (The Flume configuration is typically located in /etc/flume/conf/agent/flume.conf.)
According to the configuration, there is only one piece of MySQL-related logic: read the HiveServer log, filter out the SQL statements (in the metadata collec* filter), and store the results in the MySQL table hive_run_sqlinfo through the configured sink.
Flume agent logic analysis
The sink above calls the class com.tencent.tbds.flume.sink.MysqlSinkForMetadata, which is a custom class. We located the jar containing the class on the reference path and decompiled it. The basic logic and comments are as follows:
Sink initialization phase
Sink loop execution phase
Sink shutdown phase
The close phase simply checks to see if the connection exists.
Possible causes
From the sink logic, the sink status becomes BACKOFF only when the connection is null; in every other case the status is READY. The connection state is not checked either before or after the transaction is committed to MySQL, and the sink status is not changed even when the SQL throws an exception. As a result, once a commit throws an exception, the sink loop runs again, commits again, and throws again. This is the root cause of the continuously thrown exception. So when was the connection actually closed? There are two possible causes: (1) the sink had no interaction with MySQL for a long time, exceeding the connection's automatic close timeout; (2) MySQL shut down abnormally.
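The loop behavior described above can be illustrated with a small, self-contained model. This is only a sketch of the logic as the text describes it, not the decompiled TBDS code: the class FlawedSinkModel, the Committer interface, and the Status enum are hypothetical stand-ins, and no Flume or MySQL library is used.

```java
import java.sql.SQLException;

/** Simplified model of the sink loop described in the text (hypothetical names). */
final class FlawedSinkModel {
    enum Status { READY, BACKOFF }

    /** Hypothetical stand-in for the JDBC connection held by the sink. */
    interface Committer {
        void commit() throws SQLException;
    }

    private final Committer connection; // may be null

    FlawedSinkModel(Committer connection) {
        this.connection = connection;
    }

    /** One iteration of the sink loop. */
    Status process() {
        if (connection == null) {
            return Status.BACKOFF; // the only path that backs off
        }
        try {
            connection.commit();
        } catch (SQLException e) {
            // The failure is logged, but the sink status is left unchanged.
            System.err.println("commit failed: " + e.getMessage());
        }
        return Status.READY; // the caller invokes process() again right away
    }

    public static void main(String[] args) {
        // A connection whose commit always fails, as after a MySQL restart.
        Committer closed = () -> {
            throw new SQLException("No operations allowed after statement closed");
        };
        FlawedSinkModel sink = new FlawedSinkModel(closed);
        for (int i = 0; i < 3; i++) {
            System.out.println(sink.process()); // READY every time
        }
    }
}
```

Because every failed iteration still reports READY, nothing slows the loop down, and each pass writes another exception to the log.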
Issue Confirmation
Did the sink go a long time without interacting with MySQL?
The MySQL timeout configuration was queried, with the following result:
The setting is at its default value of 28,800 seconds, or 8 hours.
Checking the HiveServer logs and counting the number of SQL executions per hour gives the following:
As can be seen, the disconnection between the sink and MySQL was not caused by a long period without interaction.
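The hourly count used in this check could be produced along the following lines. This is a rough sketch under stated assumptions: it assumes each log line starts with a timestamp of the form "yyyy-MM-dd HH:mm:ss" and that SQL executions can be recognized by a marker substring. Both the format and the marker are assumptions, not taken from the actual HiveServer log, and the class HourlySqlCount is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Counts matching log lines per hour (hypothetical helper, assumed log format). */
final class HourlySqlCount {
    /**
     * Returns a map from "yyyy-MM-dd HH" to the number of lines in that hour
     * that contain the given marker. Lines too short to hold a timestamp
     * prefix are skipped.
     */
    static Map<String, Integer> countPerHour(List<String> lines, String marker) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by hour
        for (String line : lines) {
            if (line.length() < 13 || !line.contains(marker)) {
                continue;
            }
            String hour = line.substring(0, 13); // assumed "yyyy-MM-dd HH" prefix
            counts.merge(hour, 1, Integer::sum);
        }
        return counts;
    }
}
```

If every hour in the period before the failure has a non-zero count, the idle timeout cannot explain the closed connection.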
Was the service disconnected manually?
The MySQL start time was queried, with the following result:
The time of the Flume exception is as follows (taken from the timestamp in the content of the transaction whose commit failed):
The times match.
Conclusion
An abnormal MySQL service interrupted the connection while Flume was committing transactions. Flume did not handle the exception, so it kept trying to commit the transaction in an endless loop, and in this abnormal state Flume could not work properly.
Reproducing the problem
Based on the above inference, the problem can be reproduced and verified as follows:
Generate HiveServer logs
Execute several Hive SQL statements in Hue.
Manually force MySQL to shut down
Manually restart the MySQL instance that Flume writes to.
Observe Flume's behavior
Flume enters an endless loop of throwing the exception, so the verification succeeds.
Summary
The main cause here is a chain reaction triggered by the MySQL service exception. As an expedient, when committing a transaction throws an exception in the sink code, the sink status for the next iteration can be set to BACKOFF, which prevents continuous log printing from filling the machine's disk and affecting other services (to be verified).
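A minimal sketch of that expedient follows, again as a self-contained model rather than the real sink. The names BackoffSinkSketch, Committer, and ConnectionFactory are hypothetical. Dropping the connection and reopening it on the next call goes beyond what the article proposes and is included only as an assumption about how the sink might recover. Like the expedient itself, it is unverified against the actual TBDS code.

```java
import java.sql.SQLException;

/** Sketch of a sink loop that backs off when a commit fails (hypothetical names). */
final class BackoffSinkSketch {
    enum Status { READY, BACKOFF }

    /** Hypothetical stand-in for the JDBC connection held by the sink. */
    interface Committer {
        void commit() throws SQLException;
    }

    /** Hypothetical factory that opens a new connection. */
    interface ConnectionFactory {
        Committer open() throws SQLException;
    }

    private final ConnectionFactory factory;
    private Committer connection;

    BackoffSinkSketch(ConnectionFactory factory) {
        this.factory = factory;
    }

    /** One iteration of the sink loop. */
    Status process() {
        try {
            if (connection == null) {
                connection = factory.open(); // assumption: reconnect lazily
            }
            connection.commit();
            return Status.READY;
        } catch (SQLException e) {
            System.err.println("commit failed, backing off: " + e.getMessage());
            connection = null;     // discard the broken connection
            return Status.BACKOFF; // tell the caller to wait before retrying
        }
    }
}
```

Returning BACKOFF instead of READY lets the caller pause between attempts, so a MySQL outage produces occasional log entries rather than a continuous stream.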
Troubleshooting process of a Flume anomaly based on TBDS