Mixer: mysql Protocol Analysis

Source: Internet
Author: User

Summary

To implement a mysql proxy, you must first understand and implement the mysql communication protocol. In this way, the proxy can be used to build a bridge between the client and the server.

Mixer mysql protocol implementation mainly refer to mysql's official internal manual and use Wireshark for verification at the same time. In the process of implementation, of course, there are a lot of pitfalls. Here we will record it as a summary of the protocol analysis.

It should be noted that mixer does not support all mysql protocols, such as backup and stored procedures, mainly because of limited effort and simplicity.

Data Type

Mysql only has two basic data types: integer and string.

Integer

Integer includes fixed length integer and length encoded integer. For length encoded integer, there are many places to use.

For an integer, we convert it to length encoded integer as follows:

  • If the value is <251, 1 byte is used.
  • If the value is greater than or equal to 251 and the value is less than 2 ** 16, use fc + 2 byte
  • If value> = 2 ** 16 at the same time value <2 ** 24, use fd + 3 byte
  • If value> = 2 ** 24 simultaneous value <2 ** 64, use fe + 8 byte

Correspondingly, for a length encoded integer, we can convert the value of the first byte to the corresponding integer.

String

String includes:

  • Fixed length string, fixed length string
  • Null terminated string, string ending with null
  • Variable length string, which is determined by another value.
  • Length encoded string, which is determined by the START length encoded integer
  • Rest of packet string, string from the current position to the end of the package
Packet

In mysql, if the client or server needs to send data, it needs to split the data into packet according to (2 ** 24-1), add the header to each packet, and then send it as a result.

For a packet, the format is as follows:

3              payload length1              sequence idstring[len]    payload

The preceding three bytes indicate the length of the packet. Each packet cannot exceed 16 MB. The first byte indicates the serial number of the packet. It starts from 0 and increases progressively for multiple packages. It is reset to 0 only when the next command sends data. The first four bytes constitute a packet header, followed by the actual data of the packet.

Because a packet can send up to 16 MB data, if you want to send data larger than 16 MB, You need to split it into multiple packages for sending.

Generally, the server returns three types of packages to the client.

  • OK Packet. Operation successful
  • Err Packet, operation failed
  • EOF Packet, end of file
Login Interaction

To implement proxy, you must first solve the login problem, including the proxy simulating the server to process client Login, and the proxy simulating client login to the server.

For simplicity, mixer only supports username + password login, which should be the most common login method. Ssl and compression are not supported at the same time.

A complete login process is as follows:

  • The client first connects to the server.
  • The server sends initial handshake packet, including the supported capability and a random salt for encryption.
  • The client returns the handshake result, including the capability supported by the client and the password encrypted with the salt.
  • Server verification. If successful, OK packet is returned; otherwise, err packet is returned and connection is closed.

Here, I have to say that the login protocol is a big pitfall, because I am using the HandshakeV10 Protocol. In this document, the Protocol has such a provision:

if capabilities & CLIENT_SECURE_CONNECTION {    string[$len]   auth-plugin-data-part-2 ($len=MAX(13, length of auth-plugin-data - 8))}

According to the instructions in the document, the length of the auth-plugin-data-part-2 is calculated as 13, because the length of auth-plugin-data is 20. However, the actual situation is that the length of the auth-plugin-data-part-2 should be 12, 13th BITs have been 0. Only in this way can we calculate the correct encryption password Based on salt. This is the case in Wireshark Analysis in mysql-proxy official documents and multiple msyql client drivers. in go-SQL-driver, the author has directly written the following comments:

// second part of the password cipher [12? bytes]// The documentation is ambiguous about the length.// The official Python library uses the fixed length 12// which is not documented but seems to work.

It is conceivable that there are many pitfalls in this trap. At least I first planted it. Encryption is always incorrect.

Command

After logging in, the rest is mysql command support, while mixer only implements basic commands. It is mainly concentrated in text protocol and prepared statment.

COM_PING

The most basic ping implementation is used to check whether mysql is alive.

COM_INIT_DB

Although it is called init db, it is actually the same as using db to switch to using db.

COM_QUERY

This is the most important command. Most mysql statements we use in the command line are sent through this command.

In COM_QUERY, mixer mainly supports select, update, insert, delete, replace, and other basic operation statements. It also supports begin, commit, and rollback operations, set names and set autocommit are also supported.

4 of COM_QUERY returns packet

  • OK Packet
  • Err Packet
  • Local In File (not supported)
  • Text Resultset

Here we will focus on text resultset, because it contains the most commonly used select result set.

A text resultset contains the following packages:

  • A column-count packet encoded with length encoded integer
  • Column-count columns define packet
  • Eof packet
  • One or more row packages. Each row packet has column-count data.
  • Eof packet or err packet

We can obtain the data in a row packet in the following way:

  • If the value is NULL, It is 0xfb.
  • Otherwise, all values are represented by length encoded string.
COM_STMT _*

The COM_STMT _ family protocol is the common prepared statement. When I say prepared statement is supported in the atlas Group, many people think that I support the prepare used in COM_QUERY, execute and deallocate prepare statements. In fact, these two are quite different.

Why do I not want to support the prepare of COM_QUERY now? The main reason is that this prepare requires variable settings. mixer maintains a connection pool with the server on the backend, so for the variables set by the client, proxy is very troublesome to maintain, and all the variables need to be reset every time a new connection is used with the server, which increases the complexity. Therefore, I do not support variable settings. The same is true for cobar. Since variables are not supported, I will not support the prepare of COM_QUERY.

COM_STMT _ * this group of commands is mainly used in client driver in various languages. Therefore, I think only this prepare is enough.

For the returned results of COM_STMT_EXECUTE, because the prepare statement may be select, binary resultset and binary resultset are returned, which are similar to the preceding text resultset, the only thing to note is that row packet uses binary row packet.

For each binary row packet, the first byte is 0, followed by a null bitmap, and then the actual data.

In binary row packet, null bitmap is used to indicate that data in a column of the row is NULL. The length of null bitmap is calculated by column-count + 7 + 2)/8. If the value of NULL for each column is null, its position in the null bitmap is calculated as follows:

NULL-bitmap-byte = ((field-pos + offset) / 8)NULL-bitmap-bit  = ((field-pos + offset) % 8)

Offset is 2 in binary resultset, and field-pos is the position of this column.

The actual non-NULL data is obtained based on the data type defined in each column. For example, if the type is MYSQL_TYPE_LONGLONG, the data value is 8 bytes in length. If the type is MYSQL_TYPE_STRING, the data value is a length encoded string.

Postscript

I have analyzed some mysql protocols through Wireshark, mainly here, and I have to strongly recommend wireshark here, which enables me to get twice the result with half the effort in learning mysql protocols.

Here is the mixer code: https://github.com/siddontang/mixer. welcome to the feedback.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.