MyCAT Log Analysis

Last Update:2016-01-29 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MyCAT Log Analysis

MyCAT logs cannot be obtained for running information about MyCAT. For example, if MyCAT adopts read/write splitting, how does MyCAT execute a query statement, and which node each shard is distributed.

The default value is the info level. You can set the debug level through log4j. xml to obtain more internal information about MyCAT running.

Next, we will analyze the MyCAT running process by starting MyCAT and executing a query statement.

Here We Go!

Mycat startup log input is as follows:

Processors refers to the number of CPU cores, and aio thread pool size refers to the AIO thread pool, which is twice the number of CPU Cores

-Sysconfig params: it refers to the startup parameter of mycat, which can be modified in server. xml.

IdleTimeout = 1800000 indicates that the maximum idle connection time between an external application and mycat is 30 min. If the idle connection time exceeds 30 min, the connection from the external application to mycat is disconnected.

DataNodeHeartbeatPeriod = the heartbeat detection frequency of 10000 nodes is 10 seconds.

9066 is the Management port of MyCAT

8066 is the default port of MyCAT, similar to MySQL port 3306.

Next, initialize the MyCAT connection pool. The default value is at least 10 connections.

It can be defined at dataHost in schema. xml.

-Init backend myqsl source, create connections total 10 for hostM1 index: 0: Initialize the backend MySQL connection. The master node is hostM1, which can be specified according to the dnindex. properties file in the conf directory.

The following output information is used to create a connection and release the channel.

After the connection is created, the following information is output, indicating that the connection is created.

Perform heartbeat detection, so you need to establish a new connection. Because I configured a master node and two slaves, you need to determine the heartbeat of localhost, 192.168.244.144, and 192.168.244.146. localhost has started 10 connections, therefore, you do not need to create a new connection, but it is required for hostS1 (192.168.244.146) and hostS2 (192.168.244.144.

ThreadID corresponds to the process ID in MySQL

Next, the output information of the front-end connection to MyCAT. It can be seen that after the connection to MyCAT, the show database, show tables, select @ version_comment limit 1 command will be executed. In my opinion, this is used to analyze the cached database and table information for subsequent routes.

Next, run select * from travelrecord to view the internal implementation logic of MyCAT.

First, the route information of the SQL statement is determined. (according to the following output, MyCAT uses a map set to store the route information of the SQL statement, the route information is generated first.

The main function of routing is to determine the execution path of the SQL statement, including the shards of the queried data.

-SQLRouteCache miss cache: No route information

-SQLRouteCache add cache: Generate route information

-Execute mutinode query select * from travelrecord: confirm that the query will span Multiple shards.

-Has data merge logic: data merging

Next, you can confirm which node to read.

Because I configured a master node and two slaves, and balance = "1", all readHost and stand by writeHost are involved in the load balancing of the select statement, so hostS1 (192.168.244.146) and hostS2 (192.168.244.144) are involved in the load balancing of the select statement.

Before executing the select statement, you need to synchronize connections. Why do you need to synchronize connections?

To efficiently use backend MySQL connections, MyCAT adopts the Connection Pool Mode Based on MySQL instances.

In the Mycat connection pool, the currently available MySQL connections are put in a HashMap data structure, the Key is the Database corresponding to the current connection, and there is also a level-2 classification, that is to say, the connection is automatically submitted or manually submitted. This is designed to efficiently query available connections. The specific logic is as follows:

When a user session needs an automatically submitted SQL connection to the shard dn1 (corresponding to db1), the connection pool first checks whether there are available connections on db1. If yes, check whether there is a connection in the automatic submission mode. If yes, return. Otherwise, return the connection in the manual submission mode on db1. if no connection is available for db1, then, the system randomly returns the available connection corresponding to another database. If no available connection is available and the connection pool has not reached the upper limit, a new connection is created and a result is returned. In this logical process, we will find that, the connection obtained by the user session may not be what he originally wanted. For example, the Database does not match or the transaction mode does not match. Therefore, before executing a specific SQL statement, there is a process of automatically synchronizing the Database connection, it includes four indicators: transaction isolation level, transaction mode, Character Set, and Database. After synchronization is completed, specific SQL commands are executed.

When executing the select * from travelrecord statement, because three shards are involved, MyCAT selects three read sources for query. Two of them are on hostS2 and the other is on hostS1. For the two connections of hostS2, one of them will reuse the connection before MyCAT to perform heartbeat detection on hostS2, And the other needs to be created.

For hostS2, reuse the previous connection. Because db1 is used for heartbeat detection, the schema change is set to false and does not need to be modified.

For hostS1, the previous connection is reused because db1 is used and db3 is used. Therefore, the schema change is true and needs to be modified.

For each synchronous connection, the query starts only after the connection is synchronized.

There are a total of five sorted ed OK response, two of which correspond to total syn cmd 2 commands, and the other three correspond to total syn cmd 3 commands, which can be easily matched according to the host.

Run the following program,

The first red box is for the dn3 partition in 192.168.244.146, because I only have two rows in the entire select * from travelrecord output,

mysql> select * from travelrecord;+---------+---------------+------------+------+------+| id      | user_id       | traveldate | fee  | days |+---------+---------------+------------+------+------+|       1 | mysql-server2 | 2016-01-01 |  100 |   10 || 5000001 | mysql-server3 | 2016-01-02 |  100 |   10 |+---------+---------------+------------+------+------+2 rows in set (0.04 sec)

There are only two shards, dn1 and dn2. Therefore, the dn3 query has no result output and the connection and channel are directly released.

The blue box and the second red box are related to dn1 in 192.168.244.144, with result output.

-Field metadata inf: DataMergeService first merges the information of the output column.

-On row end reseponse: indicates the output of dn1 results.

The following figure shows the dn2 partition output. Because the dn2 partition is a newly created connection, synchronization is also required. After receiving the received ed OK response twice, the system returns the result and releases the connection and channel.

Finally, DataMergeService merges all returned results.

The last packet id is expected to be returned to the previous program by MyCAT.

So far, the log analysis of a query statement is complete ~

Summary:

1. In fact, synchronous connections do not necessarily generate the sorted ed OK response information. For example, when I run the SELECT * FROM travelrecord WHERE id = 1 Statement,

After the synchronous connection is executed, the connection and channel are released directly, and no received ed OK response information or on row end reseponse is generated.

It seems that only the source code can be used to identify the cause.

2. during Normal Analysis of MyCAT logs, the tailf command is generally used, but frequent heartbeat detection information is indeed an interference. You can use tailf mycat. log | grep-Ev "Timer | release channel" filters out some information.

3. Synchronized ed OK response is not only for connection synchronization, but also generated when the drop table operation is executed.

Therefore, the received OK response should be a feedback of the execution results.

This article permanently updates the link address:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MyCAT Log Analysis

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

MyCAT Log Analysis

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support