[Repost] MySQL Binlog in Realtime

Source: Internet
Author: User
Tags: benchmark, message queue, AMQ

Original: http://guweigang.com/blog/2013/11/18/mysql-binlog-in-realtime/

As everyone knows, MySQL is the most popular open-source database on the Internet (bar none). In its early days, many companies benefited from its ease of use and low cost. As these companies grew, more and more of them got involved in MySQL development, so MySQL's features became richer and richer: storage engines with different characteristics, the Binlog-based master-slave replication scheme, and so on.

Today we are going to talk about how to parse the MySQL Binlog in real time and the huge business value this brings. Many applications come to mind in a flash: master-slave replication, cache systems, retrieval engines, and so on. As shown in the figure, MySQL's binlog can be used not only for master-slave replication between MySQL servers, but also to drive non-MySQL services:

In a mature business system there is one and only one authoritative full copy of the data (here, the data in MySQL). Faced with today's feature-rich applications and users' ever-higher expectations, MySQL alone cannot always serve us well. For example, we may need a caching system to speed up the service, a complete retrieval system to provide search, or events that trigger notifications in real time. The problem before us is that there is only one copy of the data, and we work directly with MySQL (in this article MySQL stands in for any database system). How can systems other than MySQL get the same data? And we always want the difference between the data in MySQL and the data in those other systems to be as small, and as fresh, as possible. There is a series of possible solutions to this problem, of which two are typical:

    • Double write: when you update MySQL, you also update the cache system, update the search engine, and trigger events to send notifications. This does solve the problem, but because the systems are heterogeneous, there is no mechanism to guarantee that every write will succeed. This means that as time goes on, the data divergence will grow. Moreover, if there are many external systems, doing all of this inline also hurts the response time of the primary business logic.
    • Timed tasks: coupling the code that updates other systems into the main business logic is not acceptable. It may be tolerable with only a few external systems, but what if you have to update 100 of them? Doing the work asynchronously with timed tasks looks like a better solution. It is, but how often should the task run? Every minute? Every 10 minutes? Of course, we can set the frequency according to business needs, and at first glance there is no problem. But think about it: what exactly is the timed task supposed to do?
      • Timed full export: when the data volume is relatively small, this is the simplest method. Of course, every downstream system must clear its own data and re-import the full set, so once the data grows even slightly the cost becomes enormous.
      • The business system maintains a queue (a new table in the business database used as a queue): whenever the business system manipulates data, it also writes a log of the operation to the queue, and downstream systems consume the queue.

        With this approach the business system maintains the operation records itself. The benefit is that the business integrity of the data is guaranteed; the drawback is that the business side is coupled with non-business logic: every data change requires opening a transaction to ensure that the business operation and its operation log are written together correctly. This becomes dangerous if, for some reason, someone later needs to operate on the DB directly, because complex business logic and relationships cannot be reproduced by hand when touching the DB.

      • Timed table sweep to find changed records: every row has an updatetime (ON UPDATE CURRENT_TIMESTAMP) field; when a record is modified, MySQL implicitly sets this field to the current time. This approach keeps the business side entirely out of the loop and processes increments in a more decoupled way, which is ideal for scenarios such as stateless cache updates. Of course, if the table has too many records the sweep becomes a slow query, and if the data goes through state transitions, this approach loses the direction of the flow: you only see the latest state, not how it got there.
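The sweep approach above can be sketched in a few lines. This is a minimal illustration using SQLite as a stand-in for MySQL (in MySQL the column would be declared with ON UPDATE CURRENT_TIMESTAMP so it updates implicitly); the table and column names here are illustrative:

```python
import sqlite3

# SQLite as a stand-in for MySQL; in MySQL, `updatetime` would be
# `TIMESTAMP ON UPDATE CURRENT_TIMESTAMP` and maintained implicitly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updatetime TEXT)")
conn.executemany(
    "INSERT INTO products (id, name, updatetime) VALUES (?, ?, ?)",
    [(1, "old",         "2013-11-12 11:00:00"),
     (2, "changed",     "2013-11-12 12:30:00"),
     (3, "changed-too", "2013-11-12 13:00:00")],
)

def sweep(conn, last_run):
    """Return rows modified since the previous sweep."""
    cur = conn.execute(
        "SELECT id, name FROM products WHERE updatetime > ?", (last_run,))
    return cur.fetchall()

# Each run of the timed task picks up only rows changed since last time.
changed = sweep(conn, "2013-11-12 12:00:00")
print(changed)  # rows 2 and 3 only
```

Note the stated drawback: the query only sees the final state of each row, so intermediate state transitions between two sweeps are lost.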

If you use the queue approach, you will probably write code like the following:

    $db = getDI()->get("test_db");
    $db->begin();
    $db->update(/* business logic operation */);
    $db->insert(/* operation log */);
    $db->commit();
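The same idea, sketched in Python with SQLite so it can run standalone (the original is PHP; the table names here are illustrative): the business write and its operation-log entry succeed or fail together in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE op_log ("
    "event_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "object_id INTEGER, event_type TEXT)")

# `with conn:` opens a transaction; it commits on success and rolls
# back if either statement raises, so the log never drifts from the data.
with conn:
    conn.execute("INSERT INTO products (id, name) VALUES (?, ?)", (7, "widget"))
    conn.execute("INSERT INTO op_log (object_id, event_type) VALUES (?, ?)",
                 (7, "insert"))

rows = conn.execute("SELECT object_id, event_type FROM op_log").fetchall()
```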

And the operation log table you've designed probably looks like this:

event_id  user_id  object_id   level  event_type  mcid  addtime
152       48       3007209739  200    23          0     2013-07-17 04:06:05
153       48       3007209739  202    24          0     0000-00-00 00:00:00
154       48       3007209739  201    24          0     0000-00-00 00:00:00

If you sweep the table, you may write SQL like the following:

    SELECT * FROM products WHERE updatetime > '2013-11-12 12:00:00';

Let me describe our application scenario in detail, and how we use the MySQL Binlog to solve this problem:

This is the architecture of our entire system. The left side can be viewed as the user-facing side and the right as the business side. After the browser sends a request to the front-end search service cluster (the red part of the image), the front-end cluster fans the user's request out to many backend services. The different backends are not homogeneous, so there is usually a "long connection / protocol conversion agent layer" (also red in the figure). The backends then decide whether they can satisfy the request, and finally the front-end cluster splices together the results from the different backend services and presents them to the user.

What looks like one massively-accessed service is in fact composed of countless small services. The figure shows one small service built by our group. In this scenario, the blue component converts the front-end cluster's protocol into the FastCGI protocol and forwards requests to the backend service ui/php-fpm. Although it is named UI, it does not do UI work in the usual sense; it is the first stage of the subsequent retrieval and data flow, hence the name. Its job is to parse the data sent from the agent layer and construct structured query conditions from the request data and parameters (the construction process may still require calls to many external services), then issue a request to the retrieval system (BS). BS returns a collection of entity IDs (you can think of them as commodity IDs), and the UI then queries the database for entity details by ID (the cache layer is omitted here). That is the whole flow.

Our architecture contains the heterogeneous systems described above. The BS (Basic Search) module is a secondary development of Lucene; it needs the data in the DB in order to build its index, and customers occasionally write new data into the DB. To let the UI retrieve new data from BS in real time, every change to the DB must be reflected in BS in real time, which cannot be done with double writes or timed tasks. So we developed an asynchronous event framework (adpipe) based on the MySQL Binlog.

The adpipe consists of the following parts:

    • Biz framework: everyone knows PHP development is the fastest, and the event-handling logic changes often, so it is written in PHP.
    • PHP ext: a PHP extension is inevitably required in order to write Binlog event logic in PHP user space.
    • Binlog listener: connects to the MySQL server and implements the Binlog protocol negotiation.

Take a look at the code for the Biz framework:

The first is the startup script

The main responsibility of the code above is to connect to the MySQL server and set the Binlog position; after the connection is established, it calls the corresponding BinlogEvent method for each event received. All BinlogEvent methods return only status codes (IGNORE, SUCCESS, L_EXIT, SIGNAL_QUIT), which control how the message is handled.

Next is the event-handling class, BinlogEvent:

Look at line 171: the getHandleClass method obtains a business object to handle the event; if no class is found to handle a given DB event, DefaultEvent is called by default.

Please look at the DefaultEvent class:

Because a Binlog event is closely tied to the schema of the DB, DefaultEvent also calls the model class to get the table's fields. Each event-handling class has three methods: onWrite, onUpdate, and onDelete. In these methods you receive the changed DB data, apply whatever transformations the business requires, and then package the message.
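The dispatch pattern just described (getHandleClass falling back to DefaultEvent, with per-event methods returning status codes) can be sketched in Python. The original framework is PHP, and the `products` table and its handler here are hypothetical:

```python
# Status codes mirror the ones named in the article.
IGNORE, SUCCESS = 0, 1

class DefaultEvent:
    """Fallback handler used when no business class matches the event."""
    def on_write(self, table, row):
        return SUCCESS
    def on_update(self, table, before, after):
        return SUCCESS
    def on_delete(self, table, row):
        return SUCCESS

class ProductEvent(DefaultEvent):
    """Hypothetical business handler for a `products` table."""
    def on_update(self, table, before, after):
        # Transform the changed row into a logical message for BS.
        self.message = {"id": after["id"], "name": after["name"]}
        return SUCCESS

HANDLERS = {"products": ProductEvent}

def get_handle_class(table):
    # Like getHandleClass: a business handler if one exists, else DefaultEvent.
    return HANDLERS.get(table, DefaultEvent)()

handler = get_handle_class("products")
status = handler.on_update("products",
                           {"id": 7, "name": "a"},
                           {"id": 7, "name": "b"})
```

The returned status code is what lets the framework decide whether to acknowledge, skip, or stop on a given event.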

Next comes the message class. A message here is a logical message: one that is ready to be sent to BS after transformation by the business code.

The above is the Biz framework code of the adpipe system. Of course, for the code to run you also need to install the Binlog listener and the php-binlog extension; see the links below:

    • PHP Binlog: https://github.com/bullsoft/php-binlog
    • MySQL Replication Listener: https://github.com/bullsoft/mysql-replication-listener

For the specific installation process, see each project's README.

If the biz framework hangs, does it affect other systems?

This is a good question. If the Biz framework hangs, we don't have to worry, because in our architecture both the upstream and the downstream of adpipe are queues. Upstream is the Binlog queue, whose files live on the MySQL server:

FileName          Position
mysql-bin.000001  422655739
mysql-bin.000002  124544114
...               ...

Our downstream is a message queue such as AMQ, and BS gets its messages through AMQ. Since upstream and downstream are decoupled through queues, even if the Biz framework unfortunately hangs, other systems are not affected.

If the biz framework hangs up, can you quickly resume running?

Of course. The Biz framework saves the filename and position after processing each event, so if the service hangs, you can set them again after reconnecting to the MySQL server, for example: binlog_set_position($link, $filename, $position).
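The checkpointing described above can be sketched as follows. The JSON file format and the atomic-rename trick are assumptions for illustration, not the framework's actual mechanism:

```python
import json
import os
import tempfile

def save_position(path, filename, position):
    """Persist the binlog (filename, position) after each processed event."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"filename": filename, "position": position}, f)
    os.replace(tmp, path)  # atomic swap: a crash never leaves a torn file

def load_position(path):
    """On restart, read the checkpoint to resume via binlog_set_position()."""
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["filename"], ckpt["position"]

ckpt_path = os.path.join(tempfile.mkdtemp(), "binlog.ckpt")
save_position(ckpt_path, "mysql-bin.000002", 124544114)
fname, pos = load_position(ckpt_path)
```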

If the downstream (BS) hangs, can it recover quickly?

This is a very critical issue: if BS hangs, it needs to rebuild its index. As the system runs, the incremental data keeps growing. If you give BS only the original base (benchmark) file, it then has to consume every incremental message the system has produced since, which could take months or even a year. If instead we have an up-to-date full base file, the situation is different. So another feature of adpipe is to periodically generate the latest full base files for downstream data recovery, so that a downstream system needs only a limited number of incremental messages to catch up with the DB.
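The recovery scheme can be sketched as follows: load the latest base snapshot, then replay only the increments produced after it. The sequence numbers and data shapes are illustrative assumptions:

```python
def recover(base_snapshot, increments, snapshot_seq):
    """Rebuild downstream state from a base file plus post-snapshot increments."""
    state = dict(base_snapshot)
    for seq, key, value in increments:
        if seq <= snapshot_seq:
            continue               # already folded into the base file
        if value is None:
            state.pop(key, None)   # delete event
        else:
            state[key] = value     # write / update event
    return state

# Base file generated periodically by adpipe; it already reflects seq <= 5.
base = {"p1": "v1", "p2": "v2'"}
incs = [(5, "p2", "v2'"),   # skipped: folded into the snapshot
        (6, "p3", "v3"),    # replayed: new entity
        (7, "p1", None)]    # replayed: deletion
state = recover(base, incs, snapshot_seq=5)
```

The fresher the base file, the shorter the replay, which is exactly why adpipe regenerates it periodically.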

This leads to a further thought: can we design a generic event-triggering system? Here we consider only files (inotify), MySQL (binlog), and MongoDB (oplog); other systems are similar, for example Redis with its keyspace notifications feature.
