Atlas is a data middle-tier project based on the MySQL protocol developed by the Qihoo 360 company web Platform Department infrastructure team. It is based on the official MySQL version of Mysql-proxy 0.8.2, modified a number of bugs, adding a lot of functionality features. At present, the project within 360 companies have been widely used, many MySQL business has been connected to the Atlas platform, the daily load of read and write requests up to billions of. At the same time, more than 50 companies have deployed Atlas in their production environment, and more than 800 people have joined our developer Communications Group, and these numbers are growing.
Main function:
1. Read and write separation
2. Load balancing from the library
3.IP Filtration
4. Auto-table
5.DBA smooth up and down line db
6. Automatic removal of the db of downtime
Introduction to Atlas Sharding
Atlas Sharding is one of the most recently developed features of Atlas, which increases the horizontal scalability and capacity of MySQL to meet the needs of most businesses. The Sharding branch has been published on GitHub.
The basic idea of Sharding is to cut the data in a data table into several parts, and store it on different hosts (there are many strategies for segmentation), thus alleviating the performance and capacity of a single machine. Sharding is a horizontal segmentation that is suitable for situations where single table data is large. Currently, Atlas supports the static sharding scheme and does not support the automatic migration of data at the moment.
Atlas is sharding in a table, and the same database can have both sharding and sharding tables, and sharding table data is not in a sharding database group.
Currently, the Atlas Sharding supports inserts, deletes, select, UPDATE statements, and supports transactions that do not span shard.
Of course, because of the limitations of MySQL's distribution, the Atlas Sharding has limited support for SQL, but it is sufficient to meet the day-to-day requirements.
The difference from MySQL replication
MySQL master-slave replication is the real-time copy of the data from a MySQL instance (Master) to another MySQL instance (slave), which is an asynchronous replication process.
Data replication has some of the following features:
Data distribution
Load balancing (via Atlas or other proxy middleware)
Backup
High-availability (high availability) and fault tolerance
The limitation of replication is obvious, when the database writes frequently, but the reading operation is few in the scene, the duplication is not suitable, when writes too frequently, is difficult to be supported by a host, we still will face to the expansion bottleneck. In other words, replication can only extend read performance, but it is powerless to extend write performance.
Data segmentation (sharding): Through a certain set of conditions, we are stored in the same database in the data distributed to a number of databases (host), to achieve the spread of a single device load effect. This way, when written, Io is shared by the various shard. At the same time, in each of the Shard can also have replication exists, with the aid of Atlas can be read on the Shard separation, so replication and sharding is completely complementary to each other, not exclusive.
Sharding Architecture
Atlas is stateless, and for multiple groups on the back end, you can configure any number of Atlas instances, similar to the MongoDB MONGOs.
Sharding Database Group
In Atlas, a group is considered a unit of data storage, consisting of a single master, 0 or more slave (MySQL master-slave synchronization needs to be configured by the user). The data between each group is independent and has no relation, and parts of the table's data are stored in each group.
Separation of Read and write in group
The Atlas Sharding also supports read-write separation within the group, meaning that Atlas performs a read-write separation (read to Slave, write to master) of master and slave within the group after it hits a group.
Sharding Data Segmentation Strategy
Shard Key
Each shard table has a Shard key, which can be either a primary key or a primary key, but this column must be an integer. Atlas uses this Shard key to determine which database group the record should be stored in.
The Atlas Shardingh now supports two types of data segmentation: range and hash.
Range method
As shown above, the Shard key range of 0-1000 data is stored in the DBGROUP0, the range of 1000-2000 of the data is stored in the DbGroup1, 2000-maxint data is stored in the DbGroup2. The size of these ranges does not need to be the same. For example, if ID is shard key, sql: "SELECT * FROM Test where id = 1500;", Atlas will send this statement to DbGroup1. The range for the temporary atlas is static and does not support dynamic increase scopes.
Advantages:
For a range of SQL queries such as (where ID > or ID < 1000), the range sharding can accurately hit the backend data group without requiring SQL to be sent to each MySQL to request data, saving network transmission consumption.
Disadvantages
If the Shard key is incremented, it is possible that all SQL will hit the same data group for a period of time, without showing the sharding advantage, range does not apply to this scenario.
Applicable scenarios
Range applies to scenarios where there is a large demand for range queries, and Shard key is relatively discrete inserts
Hash mode
Currently, Atlas uses modulo to implement hash, that is, the hash (id) = id% Dbgroup_count, such as ID = ten, id% 3 = 1, so hit the DbGroup1.
Disadvantages
The hash and range method is just the opposite, the hash can be used to increase the situation of data, even in the case of incremental, sharding data is evenly distributed in the various data sets, but the disadvantage is that for the scope of the query usually need to query all the Dbgroup, Network consumption is relatively large.
Applicable scenarios
Hash applies to Shard key order growth, and the need for range queries is relatively small scenarios
About supported statements
The Atlas sharding only provides limited support for SQL statements, currently supports basic Select, Insert/replace, delete, update statements, and supports all where syntax, but for the following statements, If the statement hits multiple dbgroup, Atlas does not support it (if the statement hits only one dbgroup, such as SELECT COUNT (*) from test where ID < 1000, where the Dbgroup0 range is 0-1000, then these Attributes are supported)
Limit Offset (Support Limit)
ORDER BY
Group by
Join
On
Count, Max, min, etc functions
These statement atlas returns the error 1105 (HY000): Proxy warning-sharing Hit Multi dbgroup not Support SQL. Do not use these features on sharding tables, please do not let this table sharding if there is a demand for this feature.
Attention:
Subqueries may return incorrect results in sharding, and do not use subqueries. Please split the statement into multiple sentences execution
For write operations, if the write operation hits more than one database group, a write operation that hits multiple data sets is temporarily unsupported because of a partial success (failure of a group) that needs to be rolled back. Please split into multiple SQL statement execution.
Atlas may support Some of these features in the next release.
About Transaction support
Transactions are fully supported on sharding tables in Atlas, but for sharding tables, Atlas can provide only partial support (does not support transactions across Dbgroup). Atlas only supports statements involving a single dbgroup in a transaction, such as two dbgroup0, Dbgroup1, which is split by range, and the rule is dbgroup0:0-999, dbgroup1:1000-2000,
Mysql> begin; Query OK, 0 rows Affected (0.00 sec) mysql> INSERT into sharding_test (ID, name, age) VALUES (1, ' Test ', 0); Query OK, 1 row Affected (0.00 sec) mysql> INSERT into sharding_test (ID, name, age) VALUES (1500, ' test ', 0); ERROR 1179 (sqlst): Proxy warning-sharding Dbgroup is in trans, transaction won't not work across multi Dbgroup /*master*/select * from Sharding_test where ID < 1000; +----+------+------+----------+----------+ | ID | name | Age | Birthday | Nickname | +----+------+------+----------+----------+ | 1 | Test | 0 | NULL | NULL | +----+------+------+----------+----------+ 1 row in Set (0.00 sec) mysql>/*master*/select * from Sharding_test; ERROR 1179 (sqlst): Proxy warning-sharding Dbgroup is in trans, transaction won't not work across multi Dbgroup Commit Query OK, 0 rows Affected (0.00 sec) mysql>/*master*/select * from Sharding_test; +----+------+------+----------+----------+ | ID | name | Age | Birthday | Nickname| +----+------+------+----------+----------+ | 1 | Test | 0 | NULL | NULL | +----+------+------+----------+----------+ 1 row in Set (0.00 sec)
Note that the second statement, since the insert was previously bound to Dbgroup0, will then only accept statements involving DBGROUP0 in this transaction, and other statements would fail. "/*master*/select * from Sharding_test;" failed because the statement hit all the Dbgroup, and the same is not supported. After a commit, the sharding Dbgroup is no longer in the transaction state and can perform operations across Shard.
In other words, if the hash is sharding the table, basically the transaction is not supported, because the hash table, most of the operations will involve multiple dbgroup.
Add nodes
Note: Only a range-mode node extension is supported for the time being, and the hash method is not supported because of the need for data migration.
Extension node in the case of the original node to ensure that the range does not change, such as the existing dbgroup0 as a range of 0-999, Dbgroup1 for range 1000-1999, this time can increase the range of >2000 nodes. If you add one node to 2000-2999, modify the configuration file and restart Atlas.