A very important consideration when choosing an open source core component is how active its community is, especially if the project team cannot perform follow-up maintenance and extension on its own.
On sharding (splitting databases and tables), read/write separation, and master/slave replication
Generally speaking, a system that needs sharding carries a large volume of traffic and is prone to peaks such as discounts and promotions. Even when a single machine cannot carry the traffic, sharding is not the first choice. Before splitting tables across databases, you should first make sure that vertical splitting is complete and each subsystem is highly cohesive. Next, read/write separation based on master/slave replication, or offloading heavy fuzzy queries to something like Elasticsearch, may divert a large part of the load. Once read/write separation is done and the master library is left with only the key business logic, if traffic is still very high, then start considering sharding. Sharding demands far more from development and operations than read/write separation and vertical splitting do, so if the business volume is clearly not going to soar within a couple of years, introducing it blindly only brings high costs (especially the various SQL restrictions).
Also note that sharding multiplies the number of databases n times, where n is generally a multiple of 4; if an application claims to be doing sharding but has only two machines, that is a joke.
Read/write separation and sharding are more reasonably treated as two sequential things; mixing them together, as many people do, is simply unreasonable. Sharding is usually a purely OLTP matter that lets the system scale horizontally. Read/write separation, by contrast, routes to the query library those queries that can tolerate a short delay and need not be 100% (but should be more than 99%) accurate, classified precisely by business priority. In terms of resource consumption these queries are several to hundreds of times more expensive than PK lookups, for example querying a person's transactions over the past 3 months. Yet from a business perspective this is not a DSS workload: think of querying orders settled at T-n, exporting those orders, and perhaps working on the resulting list or even manually fixing state. By splitting these functions off from the core order/transaction/funds OLTP, you ensure the core business system does not suffer jitter from outside the system caused by abnormal operations such as unreasonable SQL.
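The routing decision described above can be sketched as a small rule: core OLTP statements stay on the master, while lag-tolerant, business-classified queries go to the read library. This is a conceptual sketch only; the method and the returned names are illustrative, not a real framework API.

```java
// Conceptual sketch (names are illustrative, not a real API): route a query
// to the write library or a read replica based on business priority.
public class ReadWriteRouter {

    // Core OLTP statements (PK lookups, state changes) stay on the master;
    // heavy, lag-tolerant queries (e.g. "trades in the last 3 months",
    // T-n order exports) go to the read library.
    static String route(boolean isWrite, boolean toleratesReplicaLag) {
        if (isWrite || !toleratesReplicaLag) {
            return "master";
        }
        return "read-replica";
    }

    public static void main(String[] args) {
        // An order state change must hit the master.
        System.out.println(route(true, false));   // master
        // A 3-month transaction history query can tolerate a short delay.
        System.out.println(route(false, true));   // read-replica
    }
}
```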
From a system design point of view, the read library shares the same logical table structure as the write library, with identical field definitions, but the physical implementation is necessarily different (if the two were identical, the original intent of read/write separation has not been understood), especially regarding indexes. The write library may have, besides the PK, only a unique index (every table should have a unique index; a distributed lock can only serve as a buffer) and at most one or two simple single-field indexes, to maximize performance; any SQL there generally matches no more than a few dozen rows (extreme customers aside). The read library, by contrast, varies from business to business and may carry many indexes to serve a variety of queries and simple statistical summary reports.
Is the write library necessarily more important than the read library? Yes and no: the difference is relative, not absolute. The read library is not a DSS library; it carries the relatively non-critical business functions of the trading library. So if the write library goes down, business stops; if the read library goes down, business decisions may go wrong, for example failing to find a completed trade and executing it again.
A very important decision when considering sharding is whether cross-database operations are allowed, and whether they are necessary. TODO: to be expanded.
Also, splitting databases and splitting tables are two different things. As for which to choose: if the volume is not that large and you are on virtual machines, splitting databases alone is probably enough.
Sharding-JDBC versions and their architectural differences
The latest version of Sharding-JDBC is currently 3.0.0.M1, which should be considered unstable; its package name has changed to io.shardingsphere and the JAR name is sharding-jdbc.
1.5.4.1 is the latest, and possibly final, 1.x version; its coordinates are com.dangdang and the JAR name is sharding-jdbc.
2.0.3 is the latest 2.x version; the package name and coordinates changed to io.shardingjdbc and the JAR name is sharding-jdbc-core.
This shows the release planning is not particularly good, even somewhat chaotic.
After 2.0 the core sharding features saw little enhancement; development moved mainly toward governance and proxying. So if you have no special needs, such as a Mycat-like standalone proxy service, use 1.x (note: the 1.x documentation appears to be offline on the official site). For a large-scale deployment with a registry in a microservices architecture, or one based on Spring Boot, consider 2.0, because 2.0 adds registry-based configuration management and a Spring Boot starter. The 2.0 architecture is as follows:
3.0 adds sharding-proxy, a stateless server playing the Mycat role (the proxy layer may exist, but applications should not access it directly, as shown), as well as a database mesh analogous to a service mesh. The core, however, is unchanged apart from some newly supported SQL syntax. The 3.0 architecture is as follows:
As far as the sharding core is concerned, we only need to care about Sharding-JDBC, not the other parts of Sharding-Sphere.
The comparison of Sharding-JDBC / Sharding-Proxy / Sharding-Sidecar is as follows:
Transaction
For any important system, transactional consistency is critical, and all the more so for a distributed system. Sharding itself does not introduce complexity at yet another level, because it interposes a layer at the JDBC driver; still, it is necessary to understand its transaction support and the associated limitations.
Sharding-JDBC 2.x does not support strongly consistent distributed transactions; current system designs mostly pursue eventual consistency rather than strong consistency anyway. Sharding-JDBC 2.x supports two kinds of transactions: weak XA (the same as Mycat) and Best Efforts Delivery (officially abbreviated BED, a flavor of BASE). The choice needs to be weighed against business and development costs; if the architectural specification and design are done well, it is possible to avoid cross-database distributed transactions entirely.
Weak XA is the default mode (that is, consistency is guaranteed only when no exception is thrown during DML; if a machine disconnects or goes down during commit, consistency is not guaranteed) and has no special requirements.
BED adds tolerance for brief outages: executed statements are stored centrally, and failed transactions are retried by polling at commit time. The BED architecture is as follows:
But there is no free lunch: BED imposes some extra requirements on development and maintenance, and these requirements are wide-ranging and definitely painful. The development requirements include:
- The primary key of an INSERT statement cannot be auto-increment
- Updates must be repeatable; for example, UPDATE xxx SET x=x+1 is not supported, and for balance-style updates this effectively requires optimistic locking
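The "repeatable update" requirement above can be sketched with the optimistic-locking pattern: instead of UPDATE account SET balance = balance + 100 (not repeatable), read the balance and version first, then issue the conditional form UPDATE account SET balance = ?, version = version + 1 WHERE id = ? AND version = ?. The sketch below simulates this with an in-memory map; the table/column names and the Account type are illustrative assumptions, not part of any real API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Optimistic-locking sketch for a repeatable balance update, simulated with
// an in-memory store instead of a real database.
public class OptimisticBalanceUpdate {

    static class Account {
        final long balance;
        final int version;
        Account(long balance, int version) {
            this.balance = balance;
            this.version = version;
        }
    }

    // Returns true iff the conditional update "matched a row",
    // i.e. the version read earlier is still current.
    static boolean updateBalance(ConcurrentMap<Long, Account> accounts, long id,
                                 long newBalance, int expectedVersion) {
        Account current = accounts.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // a concurrent writer won; caller must re-read and retry
        }
        return accounts.replace(id, current, new Account(newBalance, expectedVersion + 1));
    }
}
```

On conflict the caller re-reads the row and recomputes, which is exactly why the update must be expressed as an absolute value rather than x=x+1.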
The operational requirements include:
- A database in which to store the transaction log
- ZooKeeper, used together with the asynchronous job
- Unpacking sharding-jdbc-transaction-async-job-$VERSION.tar and starting the asynchronous job via the start.sh script
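The BED mechanism described above (centrally stored statements plus a polling retry job) can be sketched standalone. This is a conceptual sketch only, not the real sharding-jdbc-transaction implementation; the class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

// Conceptual sketch of Best Efforts Delivery: executed statements are logged
// centrally, and an asynchronous job polls the log, retrying failures up to a
// maximum number of attempts.
public class BestEffortsDelivery {

    static class LoggedStatement {
        final String sql;
        final int attempts;
        LoggedStatement(String sql, int attempts) {
            this.sql = sql;
            this.attempts = attempts;
        }
    }

    // stands in for the transaction-log database
    final Queue<LoggedStatement> transactionLog = new ArrayDeque<>();

    void logStatement(String sql) {
        transactionLog.add(new LoggedStatement(sql, 0));
    }

    // One polling pass of the async job: re-execute each logged statement,
    // dropping it on success and re-queueing it (up to maxAttempts) on failure.
    // Returns the number of statements delivered in this pass.
    int pollAndRetry(Predicate<String> execute, int maxAttempts) {
        int delivered = 0;
        int pending = transactionLog.size();
        for (int i = 0; i < pending; i++) {
            LoggedStatement statement = transactionLog.poll();
            if (execute.test(statement.sql)) {
                delivered++;
            } else if (statement.attempts + 1 < maxAttempts) {
                transactionLog.add(new LoggedStatement(statement.sql, statement.attempts + 1));
            }
        }
        return delivered;
    }
}
```

This also makes the development restrictions intuitive: since a statement may be replayed, it must be repeatable, and auto-increment keys would collide on replay.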
At the design level, we chose to avoid distributed transactions that require strong consistency.
Shard Flexibility
An important capability for sharding is flexibility: sharding on a single field or on multiple fields, with =, IN, >=, <= and so on. Why multiple fields matter is a special consideration covered below.
Sharding-JDBC currently offers 4 kinds of sharding algorithms.
Because sharding algorithms are closely tied to business logic, no built-in algorithms are provided; instead, the various scenarios are abstracted into sharding strategies, providing a higher-level abstraction and an interface through which application developers implement the sharding algorithm themselves.
- Precise sharding algorithm: corresponds to PreciseShardingAlgorithm, used for sharding by a single key with = and IN. Must be used together with StandardShardingStrategy.
- Range sharding algorithm: corresponds to RangeShardingAlgorithm, used for sharding by a single key with BETWEEN AND. Must be used together with StandardShardingStrategy.
- Composite sharding algorithm: corresponds to ComplexKeysShardingAlgorithm, used for sharding by multiple keys. Multi-key sharding logic is complex, and the application developer must handle that complexity. Must be used together with ComplexShardingStrategy.
- Hint sharding algorithm: hint sharding covers scenarios where the shard value is not decided by SQL but by external conditions, and a SQL hint is used to inject the shard value flexibly. Example: an internal system sharded by employee login ID, where no such field exists in the database. SQL hints are supported via the Java API and via SQL comments (to be implemented). Corresponds to HintShardingAlgorithm, used for hint-based sharding. Must be used together with HintShardingStrategy.
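What a precise sharding algorithm does can be sketched standalone. The method below is a simplified stand-in for Sharding-JDBC's PreciseShardingAlgorithm (the real interface receives the available target names and a typed sharding value); the modulo-on-suffix logic is a common illustrative choice, not the library's built-in behavior.

```java
import java.util.Collection;
import java.util.List;

// Simplified stand-in for a precise sharding algorithm: pick the one target
// whose numeric suffix matches the shard value modulo the shard count.
public class ModuloPreciseSharding {

    static String doSharding(Collection<String> availableTargetNames, long shardingValue) {
        long suffix = shardingValue % availableTargetNames.size();
        for (String target : availableTargetNames) {
            if (target.endsWith("_" + suffix)) {
                return target;
            }
        }
        throw new UnsupportedOperationException("no target for value " + shardingValue);
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t_user_0", "t_user_1", "t_user_2", "t_user_3");
        System.out.println(doSharding(tables, 10)); // 10 % 4 == 2 -> t_user_2
    }
}
```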
Because algorithms need flexibility, the standard way is to implement a specific sharding algorithm through the corresponding Java interface, such as SingleKeyDatabaseShardingAlgorithm. In many cases, though, the sharding is relatively simple, say purely by customer number; for such cases an inline-expression sharding strategy is provided, which uses Groovy expressions to support = and IN in SQL statements, although only for a single shard key. For example, t_user_${u_id % 8} splits the t_user table into 8 tables by u_id modulo 8, named t_user_0 through t_user_7.
Sharding key + sharding algorithm = a truly usable sharding strategy.
The choice of algorithm and sharding key is the crux of sharding: it directly determines whether the load on each database is balanced and whether scaling out is easy. In the Design Considerations section, the author elaborates on the sharding design considerations, and the reasons behind them, for the order and consignment businesses and for users.
Syntax restrictions
For sharding you also need to know the SQL limitations, especially operations that require post-processing, such as sorting, deduplication, and aggregation.
The author lists here the commonly used but unsupported ones, for example:
Fortunately, these are fairly convenient to handle in Java/JS.
If there is very complex SQL, the most likely explanation is a design problem, and a read/write separation solution should be used instead.
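Handling such restricted operations in application code, as suggested above, can look like the following sketch: merging per-shard result sets, then applying DISTINCT, ORDER BY, and LIMIT in Java. The method name and top-N scenario are hypothetical illustrations.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// When DISTINCT + ORDER BY + LIMIT is restricted across shards, the
// per-shard result sets can be merged in application code.
public class MergeShardResults {

    static List<Long> topN(List<Long> shard0, List<Long> shard1, int n) {
        return Stream.concat(shard0.stream(), shard1.stream())
                .distinct()                            // SELECT DISTINCT
                .sorted(Comparator.reverseOrder())     // ORDER BY ... DESC
                .limit(n)                              // LIMIT n
                .collect(Collectors.toList());
    }
}
```

Note that each shard should still pre-sort and pre-limit its own result set where possible, so that the application merges small lists rather than full tables.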
The full list of Sharding-JDBC's SQL limitations can be found at http://shardingsphere.io/document/current/cn/features/sharding/usage-standard/sql/
Design Considerations
Which tables to shard
First, distinguish clearly at design time which tables are broadcast tables, which are sharded tables, and which live in only one database. Because there is a single shared data source, every table needs configuration whether it is sharded or not; without a shard rule, Sharding-JDBC cannot determine which data source to route to. However, general sharding components, including Sharding-JDBC, provide simplified configuration for non-sharded tables:
Method 1: Sharding-JDBC supports configuring default-data-source-name on <sharding:sharding-rule/>, so that tables without a shard rule are routed through the default data source.
Method 2: Separate the data sources that do not participate in sharding from Sharding-JDBC, and use multiple data sources in the application to handle sharded and non-sharded access respectively.
Splitting databases or splitting tables
Selection of sharding keys
The most important rule is that the sharding key cannot be an auto-increment field; otherwise an INSERT cannot determine which shard to route to.
Distributed primary key
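Since per-table auto-increment is ruled out, a distributed ID generator is needed. Sharding-JDBC 2.x ships a snowflake-style DefaultKeyGenerator; the sketch below illustrates the same idea in simplified form (the 41/10/12-bit layout is the common snowflake convention, an assumption here rather than a claim about Sharding-JDBC's exact implementation).

```java
// Simplified snowflake-style ID generator (conceptual sketch):
// timestamp | 10-bit worker id | 12-bit per-millisecond sequence.
// IDs are monotonic per worker and globally unique across workers, so they
// are safe as primary keys for sharded tables.
public class SnowflakeSketch {

    private final long workerId;   // 0..1023, must be unique per process
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF;   // 12-bit sequence
            if (sequence == 0) {                 // sequence exhausted:
                while (now <= lastMillis) {      // spin to the next millisecond
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return (now << 22) | (workerId << 12) | sequence;
    }
}
```

A real deployment must also assign worker IDs uniquely (e.g. via configuration or a registry) and decide how to handle clock rollback, which this sketch ignores.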
Notes on using Sharding-JDBC
For SQL that uses only one shard key with = and IN, it is recommended to use an inline (row) expression rather than a Java class configuration. Assuming we do not use flexible transactions (if you do, you also need to introduce sharding-jdbc-transaction and sharding-jdbc-transaction-async-job), it is enough to introduce the sharding-jdbc-core JAR. Since Sharding-JDBC's configuration supports the Java API, YAML, Spring Boot, and Spring namespaces (like Dubbo), the Spring namespace approach is recommended:
<dependency>
    <groupId>io.shardingjdbc</groupId>
    <artifactId>sharding-jdbc-core</artifactId>
    <version>2.0.3</version>
</dependency>
<!-- for the sharding-jdbc Spring namespace -->
<dependency>
    <groupId>io.shardingjdbc</groupId>
    <artifactId>sharding-jdbc-core-spring-namespace</artifactId>
    <version>2.0.3</version>
</dependency>
Because Sharding-JDBC supports connection pools such as DBCP and Druid, there is no need to change other dependencies.
Then configure the data sources, the database sharding strategy, and the table sharding strategy.
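As a sketch of those three pieces together, the fragment below shows a 2.x-style Spring namespace configuration for the t_user_${u_id % 8} example above. Treat the element and attribute names as from memory: verify them against the official documentation for your exact version, and note that ds0/ds1 are assumed DataSource beans defined elsewhere.

```xml
<!-- Sketch only: element/attribute names should be checked against the
     official 2.x docs; ds0 and ds1 are assumed DataSource beans. -->
<sharding:inline-strategy id="userTableStrategy"
                          sharding-column="u_id"
                          algorithm-expression="t_user_${u_id % 8}"/>

<sharding:data-source id="shardingDataSource">
    <sharding:sharding-rule data-source-names="ds0,ds1"
                            default-data-source-name="ds0">
        <sharding:table-rules>
            <sharding:table-rule logic-table="t_user"
                                 actual-data-nodes="ds${0..1}.t_user_${0..7}"
                                 table-strategy-ref="userTableStrategy"/>
        </sharding:table-rules>
    </sharding:sharding-rule>
</sharding:data-source>
```

Tables without a shard rule fall back to the data source named by default-data-source-name, as described in Method 1 above.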
Points to note:
1. After introducing Sharding-JDBC, select from TableName is rewritten as select from tablename, i.e. the table name is converted to lowercase in code. If you do not want this conversion (the SQL standard is case-insensitive; the main exception is MySQL on Linux with lower_case_table_names=1, and MySQL 8.0, unlike 5.7, does not allow this value to differ between database initialization and startup, see https://bugs.mysql.com/bug.php?id=90695), you need to compile sharding-jdbc yourself and change the lowercasing code in getTableTokens (this should be a bug). In version 2.0.3 the code location is:
// io.shardingjdbc.core.rewrite.SQLRewriteEngine.getTableTokens(TableUnit) calls toLowerCase() on the table names in the SQL statement, as follows:
private Map<String, String> getTableTokens(final TableUnit tableUnit) {
    String logicTableName = tableUnit.getLogicTableName().toLowerCase();
    Map<String, String> tableTokens = new HashMap<>();
    tableTokens.put(logicTableName, tableUnit.getActualTableName());
    Optional<BindingTableRule> bindingTableRule = shardingRule.findBindingTableRule(logicTableName);
    if (bindingTableRule.isPresent()) {
        tableTokens.putAll(getBindingTableTokens(tableUnit, bindingTableRule.get()));
    }
    return tableTokens;
}
2. Be sure to set the ignore-unresolvable attribute of <context:property-placeholder/> to true, i.e. <context:property-placeholder location="classpath:property/*.properties" ignore-unresolvable="true" />; otherwise placeholders cannot be resolved.
References:
https://www.oschina.net/question/2918182_2280300
https://www.oschina.net/news/88860/sharding-jdbc-1-5-4-released
http://www.infoq.com/cn/news/2017/12/Sharding-JDBC-2
http://cmsblogs.com/?p=2542
http://shardingjdbc.io/document/legacy/2.x/en/00-overview/
https://www.oschina.net/question/2356021_2264290
Sharding-JDBC MySQL sharding scheme (not a demo sample)