A note up front: the alpha version of MyCAT 1.4 has been released. It fixes many bugs and polishes a number of details, so I have made some changes to the previous two blog posts.
------------------------------ Body ------------------------------
The role of schema has been covered before; this article continues with rule (server is left for the next article) ~
First, rule. This file defines the various sharding rules in detail. This time I only cover a few of the more commonly used ones. Start with the contents of the configuration file:
A quick look shows that the upper part of the file defines the rules, and the lower part defines the actual sharding algorithm that each rule maps to. Here I introduce the following four sharding methods ~ (a warning: murmur has pitfalls ~)
------------------------------ hash-int ------------------------------
First, hash-int. Under this rule there is a mapFile property, which means the sharding is determined by the contents of partition-hash-int. So take a look at that text file:
The content is very simple. It is keyed on the sharding column: rows where the column value is 10000 go to the first DN (dn1), and rows where it is 10010 go to the second DN (dn2).
Here is the actual effect:
In the Mycat debug log, the two statements are routed to dn1 and dn2 respectively, and the corresponding rows are inserted into each database.
So here is the question (excavator jokes, get out ~): if the sharding column of an inserted row holds a value that is not listed in this file, what happens?
It simply throws an error ~
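The enumeration lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mycat's own code: the `value=nodeIndex` line format with 0-based node indexes is an assumption, and `parse_map_file` and `route_hash_int` are hypothetical helper names.

```python
def parse_map_file(text):
    """Parse lines of the assumed form `value=nodeIndex` into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        value, node_index = line.split("=", 1)
        mapping[int(value)] = int(node_index)
    return mapping


def route_hash_int(mapping, value):
    """Return the 0-based data node index for an enumerated value."""
    if value not in mapping:
        # An unlisted value has nowhere to go, so routing fails.
        raise ValueError(f"no data node configured for value {value}")
    return mapping[value]


# Hypothetical map file content mirroring the example in the text:
# 10000 goes to the first DN, 10010 to the second.
MAP_FILE = """
10000=0
10010=1
"""

mapping = parse_map_file(MAP_FILE)
print(route_hash_int(mapping, 10000))
print(route_hash_int(mapping, 10010))
```

Calling `route_hash_int(mapping, 10020)` raises a `ValueError`, which mirrors the error seen when the inserted value is not listed in the file.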
So hash-int is generally understood as enumeration partitioning. It suits columns with a fixed set of values, such as gender (0, 1), province (a fixed set; we are not going to gain a new province any time soon ~), channel partner, or the IDs of various platforms.
In addition, several values separated by commas can be placed in the same partition, so the sharding strategy can be designed around the actual data volume, traffic, and access patterns.
Cons: it is not an all-round warrior ╮ (╯_╰) ╭
------------------------------ range-long ------------------------------
The second sharding method is range-long. Looking closely at the configuration, it is much like hash-int: a specific file also determines the sharding strategy, so again look at the contents of that file:
As the file shows, this is range sharding: define value ranges for the sharding column, then put all rows within a range on one DN. It works basically the same way as hash-int, so no test this time (terminal laziness, and not enough time!)
Personally, I feel this strategy has fewer use cases in business databases, because the overall data volume must be determined in advance. That rules it out for data that grows without bound, since changing the sharding strategy later is very troublesome.
If you really want to use it, I think it fits an auto-increment primary key split evenly by a fixed amount, for example a business that produces a fixed x rows per day (temperature collection? data acquisition?), with multiple DNs (databases) built in advance.
Of course, there is also a potential problem. If a large number of sequential inserts arrive in a short time, and each DN (sub-database) is assigned a large range (for example, 10 million rows per DN), then one DN will be under very high IO pressure while the others see no IO at all. This resembles the familiar hot block / hot disk phenomenon in databases. MySQL tables often use an auto-increment primary key, so such "sequential" inserts happen quite often.
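The hot-spot effect is easy to see in a small simulation. The following Python sketch is not Mycat's implementation: the three ranges of 10 million ids each and the helper name `route_range` are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: three data nodes, 10 million ids each.
# UPPER_BOUNDS[i] is the exclusive upper bound of node i.
UPPER_BOUNDS = [10_000_000, 20_000_000, 30_000_000]


def route_range(value):
    """Return the 0-based node whose range contains `value`."""
    if value < 0:
        raise ValueError(f"value {value} is below every configured range")
    node = bisect_right(UPPER_BOUNDS, value)
    if node >= len(UPPER_BOUNDS):
        raise ValueError(f"value {value} is above every configured range")
    return node


# A burst of sequential auto-increment ids: every insert hits the same node.
burst = range(1, 1001)
load = Counter(route_range(i) for i in burst)
print(load)
```

All 1000 inserts in the burst are counted against node 0 while nodes 1 and 2 stay idle, which is the write hot spot described above.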
------------------------------ mod-long ------------------------------
Mod-long: judging from "mod", this should be a modulo (remainder) method. Look at the specific configuration:
count=4 means the data is split into four parts, which generally correspond to specific DNs, so the data is evenly distributed across four DNs (of course, a count smaller than the number of DNs is not a problem).
Here is the actual effect:
Take a look at the Mycat debug log to see how Mycat handles it:
With this modulo method, the four rows are inserted into four different DNs (databases). You can see that sequentially inserted data is spread evenly across multiple DNs.
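As a rough Python sketch of the modulo routing (the function name `route_mod` and the sample ids are assumptions, not taken from Mycat):

```python
from collections import Counter


def route_mod(value, count=4):
    """Return the 0-based node index as the remainder of value / count."""
    return value % count


# Four consecutive ids land on four different nodes.
ids = [1, 2, 3, 4]
nodes = [route_mod(i) for i in ids]
print(nodes)

# A longer sequential run is spread evenly across all nodes.
load = Counter(route_mod(i) for i in range(1, 1001))
print(load)
```

Unlike the range approach, a sequential burst here puts the same number of rows on every node.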
Compared with the range method above, this strategy spreads database write pressure much better. The drawback is just as obvious: a range query requires Mycat to merge results from every DN. When the data volume is large, the time spent on the cross-database query plus the merge may grow considerably, especially when ORDER BY is involved.
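To make the merge cost concrete, here is a minimal Python sketch of a scatter-gather range query over modulo-sharded data. It only illustrates the general idea; it is not how Mycat merges results internally, and `query_shard` and `range_query` are hypothetical names.

```python
import heapq


def query_shard(rows, low, high):
    """Simulate one node answering `WHERE id BETWEEN low AND high ORDER BY id`."""
    return sorted(r for r in rows if low <= r <= high)


def range_query(shards, low, high):
    """Ask every shard, then merge the sorted partial results into one list."""
    partials = [query_shard(rows, low, high) for rows in shards]
    return list(heapq.merge(*partials))


# Ids 1..20 distributed over four nodes by id % 4.
shards = [[i for i in range(1, 21) if i % 4 == n] for n in range(4)]

result = range_query(shards, 5, 12)
print(result)
```

Even for this small range every shard has to be queried, and the middleware then does the extra merge step to restore the global order.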
So this strategy suits point-query scenarios, such as ... I don't know ... I really don't know. Perhaps in a bank: when querying personal account information, some tables can redundantly carry user information and then be sharded this way for more efficient lookups (after all, banks have a huge number of users ~)
------------------------------ partition-by-long ------------------------------
Partition-by-long is a somewhat eclectic strategy that sits between range-long and mod-long. The sharding works as follows:
The unit is 1024. Each DN holds partitionLength slots, and partitionCount x partitionLength = 1024.
This looks a bit hard to grasp, so here is a more concrete description. Taking partitionCount (4) x partitionLength (256) as an example, rows with sid % 1024 in 0-255 go to dn1, rows in 256-511 go to dn2, and so on.
Try inserting eight rows whose values are offset by 128, and look directly at the Mycat log:
As you can see, the eight rows are evenly distributed across the four DNs ~
It is worth mentioning that this strategy also supports non-uniform distribution ~ I did not test it myself; here are two borrowed figures ~
These two figures are basically enough to understand the non-uniform strategy. The key point is still the one above: 2 x 256 + 1 x 512 = 1024 ~
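Both the uniform and the non-uniform layout can be sketched with one slot table of 1024 entries. The Python below follows the description in the text (value % 1024, then look up the owning node); it is not Mycat's source code, and the helper names and the eight sample values (multiples of 128) are assumptions.

```python
def build_slot_table(counts, lengths, total=1024):
    """Build a list mapping each slot (0..total-1) to a 0-based node index.

    counts[i] nodes each own lengths[i] consecutive slots.
    """
    table = []
    node = 0
    for count, length in zip(counts, lengths):
        for _ in range(count):
            table.extend([node] * length)
            node += 1
    if len(table) != total:
        raise ValueError(f"slots add up to {len(table)}, expected {total}")
    return table


def route(table, value):
    """Return the node that owns slot value % len(table)."""
    return table[value % len(table)]


values = [i * 128 for i in range(8)]  # 0, 128, ..., 896

# Uniform: 4 x 256 = 1024
uniform = build_slot_table([4], [256])
print([route(uniform, v) for v in values])

# Non-uniform: 2 x 256 + 1 x 512 = 1024
non_uniform = build_slot_table([2, 1], [256, 512])
print([route(non_uniform, v) for v in values])
```

With the uniform table the eight values land two per node on four nodes. With the non-uniform table the third node owns half of the slots, so it receives four of the eight values.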
This strategy is a compromise between range-long and mod-long. It is also fairly flexible, since the split can be non-uniform to fit different situations. In practice it applies to somewhat more scenarios, or rather, many scenarios can use it: it reduces cross-DN queries to some degree, the data is still split fairly evenly, and point queries are not too slow.
------------------------------ Closing notes ------------------------------
In fact, Mycat supports many more sharding methods, for example time-based strategies that split by month, by day, and so on. There is no way to cover every strategy here, so please forgive me O ( ̄ヘ ̄o#)
From a personal point of view, time-based splitting is handled perfectly well by the database's own partitioning, and half-yearly or quarterly data still needs to be queried anyway .... PS: _ (: З"∠) _ I am really not being lazy ...
It is fair to say that the core of Mycat's table sharding is basically all embodied in rule. Which tables to shard and how to slice their data must be decided by the actual business, choosing the most suitable strategy based on the characteristics of that business ~
Next chapter preview >> server, the main part of Mycat tuning
First article http://blog.itpub.net/29510932/viewspace-1664499/
Second article http://blog.itpub.net/29510932/viewspace-1667814/
Reposted from:
MySQL distributed cluster Mycat (III): rule analysis - wangwenan6 - ITPUB blog
http://blog.itpub.net/29510932/viewspace-1678591/