Atlas distributed blockbuster hits

Source: Internet
Author: User
Tags commit extend hash mysql version range split

Atlas is a data middle-tier project based on the MySQL protocol developed by the Qihoo 360 company web Platform Department infrastructure team. It is based on the official MySQL version of Mysql-proxy 0.8.2, modified a number of bugs, adding a lot of functionality features. At present, the project within 360 companies have been widely used, many MySQL business has been connected to the Atlas platform, the daily load of read and write requests up to billions of. At the same time, more than 50 companies have deployed Atlas in their production environment, and more than 800 people have joined our developer Communications Group, and these numbers are growing.

Main function:

1. Read and write separation

2. Load balancing from the library

3.IP Filtration

4. Auto-table

5.DBA smooth up and down line db

6. Automatic removal of the db of downtime

Introduction to Atlas Sharding

Atlas Sharding is one of the most recently developed features of Atlas, which increases the horizontal scalability and capacity of MySQL to meet the needs of most businesses. The Sharding branch has been published on GitHub.

The basic idea of Sharding is to cut the data in a data table into several parts, and store it on different hosts (there are many strategies for segmentation), thus alleviating the performance and capacity of a single machine. Sharding is a horizontal segmentation that is suitable for situations where single table data is large. Currently, Atlas supports the static sharding scheme and does not support the automatic migration of data at the moment.

Atlas is sharding in a table, and the same database can have both sharding and sharding tables, and sharding table data is not in a sharding database group.

Currently, the Atlas Sharding supports inserts, deletes, select, UPDATE statements, and supports transactions that do not span shard.

Of course, because of the limitations of MySQL's distribution, the Atlas Sharding has limited support for SQL, but it is sufficient to meet the day-to-day requirements.

The difference from MySQL replication

MySQL master-slave replication is the real-time copy of the data from a MySQL instance (Master) to another MySQL instance (slave), which is an asynchronous replication process.

Data replication has some of the following features:

Data distribution

Load balancing (via Atlas or other proxy middleware)

Backup

High-availability (high availability) and fault tolerance

The limitation of replication is obvious, when the database writes frequently, but the reading operation is few in the scene, the duplication is not suitable, when writes too frequently, is difficult to be supported by a host, we still will face to the expansion bottleneck. In other words, replication can only extend read performance, but it is powerless to extend write performance.

Data segmentation (sharding): Through a certain set of conditions, we are stored in the same database in the data distributed to a number of databases (host), to achieve the spread of a single device load effect. This way, when written, Io is shared by the various shard. At the same time, in each of the Shard can also have replication exists, with the aid of Atlas can be read on the Shard separation, so replication and sharding is completely complementary to each other, not exclusive.

Sharding Architecture

Atlas is stateless, and for multiple groups on the back end, you can configure any number of Atlas instances, similar to the MongoDB MONGOs.

Sharding Database Group

In Atlas, a group is considered a unit of data storage, consisting of a single master, 0 or more slave (MySQL master-slave synchronization needs to be configured by the user). The data between each group is independent and has no relation, and parts of the table's data are stored in each group.

Separation of Read and write in group

The Atlas Sharding also supports read-write separation within the group, meaning that Atlas performs a read-write separation (read to Slave, write to master) of master and slave within the group after it hits a group.

Sharding Data Segmentation Strategy

Shard Key

Each shard table has a Shard key, which can be either a primary key or a primary key, but this column must be an integer. Atlas uses this Shard key to determine which database group the record should be stored in.

The Atlas Shardingh now supports two types of data segmentation: range and hash.

Range method

As shown above, the Shard key range of 0-1000 data is stored in the DBGROUP0, the range of 1000-2000 of the data is stored in the DbGroup1, 2000-maxint data is stored in the DbGroup2. The size of these ranges does not need to be the same. For example, if ID is shard key, sql: "SELECT * FROM Test where id = 1500;", Atlas will send this statement to DbGroup1. The range for the temporary atlas is static and does not support dynamic increase scopes.

Advantages:

For a range of SQL queries such as (where ID > or ID < 1000), the range sharding can accurately hit the backend data group without requiring SQL to be sent to each MySQL to request data, saving network transmission consumption.

Disadvantages

If the Shard key is incremented, it is possible that all SQL will hit the same data group for a period of time, without showing the sharding advantage, range does not apply to this scenario.

Applicable scenarios

Range applies to scenarios where there is a large demand for range queries, and Shard key is relatively discrete inserts

Hash mode

Currently, Atlas uses modulo to implement hash, that is, the hash (id) = id% Dbgroup_count, such as ID = ten, id% 3 = 1, so hit the DbGroup1.

Disadvantages

The hash and range method is just the opposite, the hash can be used to increase the situation of data, even in the case of incremental, sharding data is evenly distributed in the various data sets, but the disadvantage is that for the scope of the query usually need to query all the Dbgroup, Network consumption is relatively large.

Applicable scenarios

Hash applies to Shard key order growth, and the need for range queries is relatively small scenarios

About supported statements

The Atlas sharding only provides limited support for SQL statements, currently supports basic Select, Insert/replace, delete, update statements, and supports all where syntax, but for the following statements, If the statement hits multiple dbgroup, Atlas does not support it (if the statement hits only one dbgroup, such as SELECT COUNT (*) from test where ID < 1000, where the Dbgroup0 range is 0-1000, then these Attributes are supported)

Limit Offset (Support Limit)

ORDER BY

Group by

Join

On

Count, Max, min, etc functions

These statement atlas returns the error 1105 (HY000): Proxy warning-sharing Hit Multi dbgroup not Support SQL. Do not use these features on sharding tables, please do not let this table sharding if there is a demand for this feature.

Attention:

Subqueries may return incorrect results in sharding, and do not use subqueries. Please split the statement into multiple sentences execution

For write operations, if the write operation hits more than one database group, a write operation that hits multiple data sets is temporarily unsupported because of a partial success (failure of a group) that needs to be rolled back. Please split into multiple SQL statement execution.

Atlas may support Some of these features in the next release.

About Transaction support

Transactions are fully supported on sharding tables in Atlas, but for sharding tables, Atlas can provide only partial support (does not support transactions across Dbgroup). Atlas only supports statements involving a single dbgroup in a transaction, such as two dbgroup0, Dbgroup1, which is split by range, and the rule is dbgroup0:0-999, dbgroup1:1000-2000,

Mysql> begin; Query OK, 0 rows Affected (0.00 sec) mysql> INSERT into sharding_test (ID, name, age) VALUES (1, ' Test ', 0); Query OK, 1 row Affected (0.00 sec) mysql> INSERT into sharding_test (ID, name, age) VALUES (1500, ' test ', 0); ERROR 1179 (sqlst): Proxy warning-sharding Dbgroup is in trans, transaction won't not work across multi Dbgroup /*master*/select * from Sharding_test where ID < 1000; +----+------+------+----------+----------+ | ID | name | Age | Birthday | Nickname | +----+------+------+----------+----------+ | 1 | Test | 0 | NULL | NULL | +----+------+------+----------+----------+ 1 row in Set (0.00 sec) mysql>/*master*/select * from Sharding_test; ERROR 1179 (sqlst): Proxy warning-sharding Dbgroup is in trans, transaction won't not work across multi Dbgroup Commit Query OK, 0 rows Affected (0.00 sec) mysql>/*master*/select * from Sharding_test; +----+------+------+----------+----------+ | ID | name | Age | Birthday | Nickname| +----+------+------+----------+----------+ | 1 | Test | 0 | NULL | NULL | +----+------+------+----------+----------+ 1 row in Set (0.00 sec)

Note that the second statement, since the insert was previously bound to Dbgroup0, will then only accept statements involving DBGROUP0 in this transaction, and other statements would fail. "/*master*/select * from Sharding_test;" failed because the statement hit all the Dbgroup, and the same is not supported. After a commit, the sharding Dbgroup is no longer in the transaction state and can perform operations across Shard.

In other words, if the hash is sharding the table, basically the transaction is not supported, because the hash table, most of the operations will involve multiple dbgroup.

Add nodes

Note: Only a range-mode node extension is supported for the time being, and the hash method is not supported because of the need for data migration.

Extension node in the case of the original node to ensure that the range does not change, such as the existing dbgroup0 as a range of 0-999, Dbgroup1 for range 1000-1999, this time can increase the range of >2000 nodes. If you add one node to 2000-2999, modify the configuration file and restart Atlas.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.