1, Database overview
In the internet age, storing and accessing massive data has become the bottleneck in system design and operation. Depending on the usage scenario, massive data processing falls into two main kinds.

Online transaction processing (OLTP: On-Line Transaction Processing): also known as a transaction-oriented processing system. Its basic feature is that raw data is sent immediately to the computing center for processing and the result is returned within a very short time.
  Function: daily transaction processing
  DB design: for real-time transaction applications
  Data processing: current, up-to-date detail data; two-dimensional and discrete
  Real-time performance: high real-time read and write requirements
  Transactions: strong consistency
  Analysis requirements: low, simple

Online analytical processing (OLAP: On-Line Analytical Processing): analysis, querying, and reporting of data in a multidimensional way. It can be combined with data mining and statistical analysis tools to strengthen decision-support analysis.
  Function: statistics, analysis, reporting
  DB design: for statistical analysis applications
  Data processing: integrated, unified, and multidimensional
  Real-time performance: low real-time read and write requirements
  Transactions: weak transactions
  Analysis requirements: high, complex

Both types of system can be implemented with a variety of technologies. On the storage side, databases fall into two major categories:
relational databases and NoSQL databases.

Relational database
A relational database is built on the relational model and handles data using mathematical concepts and methods such as set algebra.
Mainstream representatives: Oracle, DB2, MS SQL Server, and MySQL
Features:
  Data model: relational model, structured storage, integrity constraints
  Based on two-dimensional tables and the relationships between them; data operations such as join, union, intersection, difference, and division are required
  Data is read and written using Structured Query Language (SQL)
  Operations require data consistency, so transactions or even strong consistency are needed
Advantages:
  Maintains data consistency (transaction processing)
  Supports complex queries such as joins
  General-purpose and technically mature
Disadvantages:
  Every read and write must go through SQL parsing, so read and write performance is low under large data volumes and high concurrency
  Reading or writing data, or modifying a data structure, requires locking, which affects concurrent operations
  Cannot accommodate unstructured storage
  Scaling is difficult, expensive, and complex

NoSQL database
NoSQL stands for "Not Only SQL": use a relational database where it fits, and where it does not, there is no need to insist on one; consider a more suitable data store instead.
Representatives:
  Temporary key-value storage (memcached, Redis)
  Persistent key-value storage (ROMA, Redis)
  Document-oriented databases (MongoDB, CouchDB)
  Column-oriented databases (Cassandra, HBase)
Features:
  Unstructured storage
  Based on a multidimensional relational model
  Each has its own distinctive usage scenarios
Advantages:
  High concurrency; strong read and write capability on large data volumes
  Basic support for distribution; easy to extend and scale
  Simple, weakly structured storage
Disadvantages:
  Weak support for joins and other complex operations
  Weak transaction support
  Poor generality
  Without complete constraints, support for complex business scenarios is poor

2, Mycat overview
Function
DBA
To a DBA, Mycat is the MySQL server, and the MySQL servers behind Mycat are like MySQL storage engines such as InnoDB or MyISAM. Mycat itself therefore stores no data; data lives on the back-end MySQL servers, so data reliability and transactions are guaranteed by MySQL.
Engineer
To a software engineer, Mycat is a database server roughly equivalent to MySQL. You can connect to Mycat the same way you connect to MySQL, except for the port: Mycat's default port is 8066 rather than MySQL's 3306, so the port must be included in the connection string. You can use Mycat with the object-mapping framework you already know, but for sharded tables it is recommended to use plain SQL statements as much as possible, because that gives the best performance, especially with tens of millions or even tens of billions of records.
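As a small illustration of the port difference, the sketch below builds connection parameters for a MySQL-protocol client. The host, user name, password, and logical database name are made-up placeholders. The resulting dictionary has the shape accepted by a client such as PyMySQL (`pymysql.connect(**params)`), assuming such a client is installed; the sketch itself does not open a connection.

```python
def mycat_connection_params(host, user, password, schema, port=8066):
    """Build keyword arguments for a MySQL-protocol client talking to Mycat.

    8066 is Mycat's default port according to the text above;
    a plain MySQL server would normally listen on 3306.
    """
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "database": schema,  # the Mycat logical database, not a physical one
    }


# Placeholder values, for illustration only.
params = mycat_connection_params("127.0.0.1", "app_user", "secret", "TESTDB")
print(params["port"])
```

Apart from the port, nothing in the parameters reveals that the server is Mycat rather than MySQL, which is the point the paragraph above makes.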
Architect
To an architect, Mycat is a powerful database middleware. It can be used not only for read/write splitting, database and table sharding, and disaster-recovery backup, but also for multi-tenant application development and cloud platform infrastructure, giving your architecture strong adaptability and flexibility. With the intelligent optimization module planned for an upcoming Mycat release, the system's data access bottlenecks and hotspots become visible at a glance. Based on these statistics you can adjust the back-end storage automatically or manually, mapping different tables to different storage engines, without changing a single line of application code.
Principle
The most important verb in how Mycat works is "intercept". Mycat intercepts the SQL statement sent by the user and first performs specific analyses on it, such as sharding analysis, routing analysis, read/write splitting analysis, and cache analysis. It then sends the SQL to the real back-end databases, processes the returned results as needed, and finally returns them to the user.
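The intercept, analyze, route, and merge flow can be sketched with a toy model. Everything here is an assumption made for illustration: the "SQL" is a tiny made-up command language rather than real SQL, the back ends are in-memory lists, and the routing rule is a simple modulo on the id. It is not how Mycat parses or routes statements internally.

```python
import re

# Hypothetical setup: three in-memory "back-end databases" for one table.
BACKENDS = {0: [], 1: [], 2: []}


def route(user_id):
    """Routing analysis: choose a shard from the sharding-field value."""
    return user_id % len(BACKENDS)


def execute(statement):
    """Intercept a toy statement, route it, and post-process the result."""
    match = re.fullmatch(r"INSERT (\d+) (\w+)", statement)
    if match:
        user_id, name = int(match.group(1)), match.group(2)
        BACKENDS[route(user_id)].append((user_id, name))
        return 1
    match = re.fullmatch(r"SELECT (\d+)", statement)
    if match:
        # The sharding field is known, so only one shard is queried.
        user_id = int(match.group(1))
        return [row for row in BACKENDS[route(user_id)] if row[0] == user_id]
    if statement == "SELECT ALL":
        # No sharding field: broadcast to every shard and merge the rows.
        return sorted(row for rows in BACKENDS.values() for row in rows)
    raise ValueError("unsupported statement")


execute("INSERT 1 alice")
execute("INSERT 5 bob")
print(execute("SELECT 5"))
print(execute("SELECT ALL"))
```

The caller only ever talks to `execute`; which back end holds a row stays hidden behind the routing step.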
Application scenarios
  Pure read/write splitting: the simplest configuration, supporting read/write splitting and master/slave switchover.
  Database and table sharding: shard tables with more than 10 million rows; a single table of up to 100 billion rows is supported.
  Multi-tenant applications: each application has its own database, but the program connects only to Mycat, so multi-tenancy is achieved without changing the program itself.
  Reporting systems: use Mycat's capabilities to compute statistics for large-scale reports.
  Replacing HBase for big-data analysis.
  Real-time queries over massive data: a simple and effective solution. For example, when 10 billion frequently queried records must return results within 3 seconds, and queries may be range queries or queries on other attributes in addition to primary-key lookups, Mycat may be the simplest and most effective choice.

3, Mycat concepts
Database middleware
Mycat is a database middleware: an intermediate service that sits between the application and the database and handles data processing and interaction between them. After data sharding, the original single database is split into multiple shard databases, and all the shard databases together form the complete database storage cluster. With database middleware, the application only needs to concentrate on business processing; common concerns such as shard cluster management, data source switching, transaction processing, and data aggregation are handled by the middleware.
Logical database (schema)
Applications do not need to know that the middleware exists; business developers only need the concept of a database. The database middleware can therefore be presented as a logical database made up of one or more database clusters.
Logical table
In a distributed database, the table that an application reads and writes is a logical table. A logical table can be sharded, with its data distributed across one or more shard databases, or it can be left unsharded and consist of a single table.
Sharded table
A sharded table is an originally very large table that is split across several database tables, so that each shard holds part of the data and all shards together make up the complete data. It is configured by listing multiple shard nodes in the dataNode attribute of <table>.
Non-sharded table
Not every table in a database is large, and some tables do not need to be split. A non-sharded table is defined relative to a sharded table: it is a table that needs no data sharding, configured with a single shard node in the dataNode attribute of <table>.
ER table
With a sharding strategy based on E-R relationships, a child table's records are stored on the same data shard as the associated parent table record; that is, the child table depends on the parent table. Grouping tables this way guarantees that joins never cross databases. The table group is a very good way to solve joins over sharded data and is an important rule in sharding planning.
Global table
A real business system usually contains many dictionary-like tables that rarely change. Dictionary tables have the following characteristics:
  They change infrequently
  Their total data volume changes little
  They are small, rarely exceeding a few hundred thousand records
Once the business tables are sharded because of their size, joins between the business tables and these auxiliary dictionary tables become a tricky problem. Mycat solves joins on such tables through data redundancy: every shard holds a copy of the data. All dictionary tables, and other tables that match the characteristics of a dictionary table, are defined as global tables. Data redundancy is a good way to solve cross-shard joins and is another important rule in sharding planning.
Shard node (dataNode)
After data sharding, a large table is spread across different shard databases; the database in which each table shard resides is a shard node (dataNode).
Node host (dataHost)
After data sharding, each shard node (dataNode) does not necessarily have a machine to itself; one machine can host several shard databases. The machine on which one or more shard nodes reside is the node host (dataHost). To work around the concurrency limit of a single node host, place shard nodes with heavy read and write load evenly across different node hosts wherever possible.
Sharding rule (rule)
As described above, splitting a large table into shard tables requires a rule. The rule that assigns data to a shard according to some business logic is the sharding rule. Choosing an appropriate sharding rule is very important and greatly reduces the difficulty of later data processing.
Global sequence (sequence)
After data sharding, the primary key constraint of the original relational database can no longer be relied on in the distributed setting, so an external mechanism is needed to guarantee unique identification of data. The mechanism that guarantees globally unique identifiers is the global sequence (sequence).
Multi-tenancy
Multi-tenancy is a software architecture technique concerned with how to share the same system or program components among multiple users while still keeping each user's data isolated. The data storage options are:
  Separate databases
  Shared database, isolated schemas
  Shared database, shared schema
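
Returning to the global sequence (sequence) concept in this section: the text does not say which mechanism Mycat uses, so the sketch below shows only one common approach, offered as an assumption. Each worker reserves a block of ids from a shared counter and then hands them out locally. The class name, block size, and the in-process counter standing in for a central store are all hypothetical.

```python
import threading


class SegmentSequence:
    """Hands out globally unique ids by reserving blocks from a shared counter.

    Use one instance per worker; the class-level counter stands in for a
    value that would be kept in a central store in a real deployment.
    """

    _lock = threading.Lock()
    _next_block_start = 1

    def __init__(self, block_size=100):
        self.block_size = block_size
        self._current = 0
        self._limit = 0  # exclusive upper bound of the reserved block

    def _reserve_block(self):
        cls = SegmentSequence
        with cls._lock:
            start = cls._next_block_start
            cls._next_block_start += self.block_size
        self._current, self._limit = start, start + self.block_size

    def next_id(self):
        if self._current >= self._limit:
            self._reserve_block()
        value = self._current
        self._current += 1
        return value


node_a = SegmentSequence()
node_b = SegmentSequence()
ids = [node_a.next_id(), node_b.next_id(), node_a.next_id(), node_b.next_id()]
print(ids)
```

Ids produced this way are unique across workers but not strictly ordered by insertion time, which is the usual trade-off of block reservation.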

4, Using Mycat
Mycat configuration
schema.xml
schema.xml manages Mycat's logical databases, tables, sharding rules, dataNodes, and data sources.
  schema tag: defines a logical database in the Mycat instance. Mycat can have multiple logical databases, each with its own configuration.
  table tag: defines a Mycat logical table. Every table that needs to be split must be defined with this tag.
  dataNode attribute (of table): defines the dataNodes the logical table belongs to; its value must correspond to the value of the name attribute of a dataNode tag.
  dataHost tag: the last tag in schema.xml and the lowest-level tag in a Mycat logical database. It directly defines the concrete database instances, the read/write splitting configuration, and the heartbeat statement.
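
The cross-reference between a table's dataNode attribute and the name attribute of the dataNode tags can be checked with a short script. The XML below is a simplified, hypothetical fragment: the root element is a placeholder, and the table names, rule name, hosts, and credentials are made up, so it should not be read as a complete or authoritative schema.xml.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical schema.xml content (names and hosts are made up).
SCHEMA_XML = """
<root>
  <schema name="TESTDB">
    <table name="orders" dataNode="dn1,dn2" rule="mod-long"/>
    <table name="company" dataNode="dn1"/>
  </schema>
  <dataNode name="dn1" dataHost="host1" database="db1"/>
  <dataNode name="dn2" dataHost="host1" database="db2"/>
  <dataHost name="host1">
    <heartbeat>select user()</heartbeat>
    <writeHost host="hostM1" url="localhost:3306" user="root" password="secret"/>
  </dataHost>
</root>
"""


def undefined_data_nodes(xml_text):
    """Return (table, dataNode) pairs that reference a dataNode never defined."""
    root = ET.fromstring(xml_text)
    defined = {node.get("name") for node in root.findall("dataNode")}
    missing = []
    for table in root.iter("table"):
        for name in table.get("dataNode", "").split(","):
            name = name.strip()
            if name and name not in defined:
                missing.append((table.get("name"), name))
    return missing


print(undefined_data_nodes(SCHEMA_XML))
```

In this fragment the orders table lists two shard nodes and is therefore sharded, while company lists one and is a non-sharded table.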
server.xml
server.xml holds almost all of the system configuration information that Mycat needs, including tuning settings. Its main tags are the user tag and the system tag.
rule.xml
rule.xml defines the rules used to split tables. Different sharding algorithms can be applied flexibly to different tables, or the same algorithm can be used with different parameters.
  tableRule tag: defines a table rule
  function tag
Table association (table join)
Join overview
  INNER JOIN: intersection
  LEFT JOIN
  RIGHT JOIN
  FULL JOIN: union
INNER JOIN is recommended.
Global table
Characteristics of a dictionary table:
  Changes infrequently
  Data volume is small, rarely exceeding a few hundred thousand records
Properties of a global table:
  Insert and update operations on a global table are executed on all nodes in real time, keeping the data on every shard consistent
  A query on a global table is served from a single node only
  A global table can be joined with any table
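
The write-everywhere, read-anywhere behaviour of a global table can be modelled in a few lines. This is a minimal in-memory sketch with hypothetical names; it ignores failure handling and real replication.

```python
class GlobalTable:
    """Toy global table: each node keeps a full copy of the data."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def upsert(self, key, value):
        # Inserts and updates are applied to every node.
        for node in self.nodes:
            node[key] = value

    def get(self, key, node_index=0):
        # A query is answered by a single node; any node gives the same result.
        return self.nodes[node_index].get(key)


provinces = GlobalTable(node_count=3)
provinces.upsert("GD", "Guangdong")
print(provinces.get("GD", node_index=2))
```

Because every node has the whole table, a sharded business table can join against it locally on whichever node its own rows live.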
ER join
With a sharding strategy based on E-R relationships, a child table's records and the associated parent table records are stored on the same data shard.
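
A minimal sketch of that co-location, under assumptions: the customer and orders tables, the two-node layout, and the modulo rule are all made up for illustration. The key point is that a child row is placed by its parent's key, so each node can join its own rows without looking at any other node.

```python
NODE_COUNT = 2
# Each node holds its own slice of the parent and child tables.
nodes = [{"customer": [], "orders": []} for _ in range(NODE_COUNT)]


def shard_of(customer_id):
    return customer_id % NODE_COUNT


def insert_customer(customer_id, name):
    nodes[shard_of(customer_id)]["customer"].append((customer_id, name))


def insert_order(order_id, customer_id, amount):
    # The child row follows the parent's sharding key, not its own id.
    nodes[shard_of(customer_id)]["orders"].append((order_id, customer_id, amount))


def join_customer_orders():
    result = []
    for node in nodes:  # every node joins only the rows it stores
        names = dict(node["customer"])
        for order_id, customer_id, amount in node["orders"]:
            result.append((names[customer_id], order_id, amount))
    return sorted(result)


insert_customer(1, "alice")
insert_customer(2, "bob")
insert_order(10, 1, 9.5)
insert_order(11, 2, 3.0)
insert_order(12, 1, 1.0)
print(join_customer_orders())
```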
Share join
ShareJoin is a simple cross-shard join implemented on top of HBT. It currently supports joins between 2 tables. It works by parsing the SQL statement, splitting it into single-table SQL statements, executing them, and then collecting and merging the data from each node.
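
The split-then-collect idea can be sketched as follows. This is not Mycat's HBT or its ShareJoin code; the two tables, their placement on nodes, and the helper names are hypothetical, and the "single-table queries" are plain list scans.

```python
# Hypothetical data: two tables sharded independently across two nodes.
ORDERS = {0: [(10, 1), (12, 3)], 1: [(11, 2)]}  # (order_id, customer_id)
CUSTOMERS = {0: [(2, "bob")], 1: [(1, "alice"), (3, "carol")]}  # (customer_id, name)


def select_all(table):
    """Single-table query: run on every node and collect the rows."""
    return [row for rows in table.values() for row in rows]


def select_where_id_in(table, ids):
    """Single-table query restricted to the join keys found in the first step."""
    return [row for rows in table.values() for row in rows if row[0] in ids]


def share_join():
    orders = select_all(ORDERS)                              # step 1: first table
    wanted = {customer_id for _, customer_id in orders}      # join keys
    names = dict(select_where_id_in(CUSTOMERS, wanted))      # step 2: second table
    # Step 3: merge the two result sets in the middleware.
    return sorted((oid, names[cid]) for oid, cid in orders if cid in names)


print(share_join())
```

Since the merge happens outside the databases, the cost grows with the size of both intermediate results, which helps explain why the text calls this a simple join.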
Catlet (artificial intelligence)
Solving joins over sharded data is far more complex than one might expect and often cannot be done efficiently. In such cases Mycat relies on "artificial intelligence" in the sense of human intelligence: for the limited number of joins in a business system that must cross shards, the programmer implements the join logic in code, using a specific API that Mycat provides for this purpose. This is Mycat's innovative idea, which it calls artificial intelligence.
Mycat sharding rules
Overview
In data sharding, and horizontal sharding in particular, the middleware ultimately has to handle two processes: splitting the data and aggregating the data. Choosing the right sharding rule is critical because it determines how hard later data aggregation will be, and may even avoid cross-database aggregation altogether. Important principles include data redundancy and table grouping (table group), both good ways for a business to avoid cross-database joins. Not every business scenario suits these rules, however, so this chapter explains how to choose the right sharding rule.
Several kinds of sharding
  Mycat global table
  ER sharded table
  Many-to-many association
For many-to-many associations, the general principle is to decide from a business perspective which table the relationship table leans toward.
Primary key sharding vs. non-primary-key sharding
When no other field can serve as the sharding field, primary key sharding is the only choice. Its advantage is that primary key lookups are fastest, and when an auto-increment sequence number is used as the primary key, data is also spread fairly evenly across the nodes.
If a suitable business field exists, sharding on that field is recommended. The sharding field should satisfy the following conditions as far as possible:
  It distributes data evenly across the nodes
  It is the most frequent, or the most important, query condition
Common Mycat sharding rules
Shard by enumeration
Some businesses need to store data by province or by district/county, for example, and the set of provinces and counties in a country is fixed, so such values can be enumerated.
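
A minimal sketch of enumeration sharding, assuming a hand-written mapping from province to node. The province names, node numbers, and the fallback to a default node for unknown values are illustrative assumptions, not Mycat's configuration format.

```python
# Hypothetical enumeration: each known value is assigned to a node by hand.
PROVINCE_TO_NODE = {"beijing": 0, "shanghai": 1, "guangdong": 2}
DEFAULT_NODE = 0  # assumption: unknown values fall back to one node


def shard_by_enum(province, mapping=PROVINCE_TO_NODE, default=DEFAULT_NODE):
    """Look up the node for an enumerated sharding-field value."""
    return mapping.get(province.lower(), default)


print(shard_by_enum("Shanghai"))
print(shard_by_enum("unknown-province"))
```

Because the mapping is explicit, all rows for one province stay on one node, at the cost of maintaining the mapping by hand.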
Fixed-shard hash algorithm
This rule is similar to decimal modulo, except that it is a binary operation: it takes the low 10 bits of the id in binary, that is, id & 1111111111 (binary). With a modulo-10 operation, inserting the ids 1 to 10 consecutively spreads them across 10 different shards, which makes transaction control for the inserts harder. The binary-based algorithm can place consecutive values in the same contiguous shard, reducing the difficulty of insert transaction control.
Modulo
This rule is straightforward: it applies a decimal modulo operation to the id in the sharding field. Compared with the fixed-shard hash, a batch insert under this rule may write to multiple data shards within a single transaction, which makes transaction consistency harder to maintain.
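
The difference between the two rules shows up when the ids 1 to 10 are inserted together. In the sketch below the low 10 bits give a slot in the range 0 to 1023, and slots are grouped into equal contiguous ranges, one per shard. The equal-sized ranges and the shard counts are assumptions for illustration; the text only states that the low 10 bits are used.

```python
SLOTS = 1024  # 2**10, because the rule keeps the low 10 bits of the id


def fixed_hash_shard(record_id, shard_count=4):
    """Low 10 bits -> slot, then contiguous slot ranges -> shard.

    Assumes shard_count divides 1024 evenly.
    """
    slot = record_id & (SLOTS - 1)  # same as id & 0b1111111111
    return slot // (SLOTS // shard_count)


def modulo_shard(record_id, shard_count=10):
    """Plain decimal modulo on the id."""
    return record_id % shard_count


ids = range(1, 11)
print(sorted({fixed_hash_shard(i) for i in ids}))
print(sorted({modulo_shard(i) for i in ids}))
```

With these settings the fixed hash sends all ten ids to a single shard, while modulo 10 touches ten different shards, which is why a batch insert is easier to keep in one transaction under the first rule.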
Shard by date (day)
This rule shards data by day.
Modulo with range constraint
This rule combines a modulo operation with a range constraint. It is intended mainly to ease later data migration: we can decide for ourselves how data is distributed across nodes after the modulo is taken.
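
A minimal sketch of modulo with a range constraint. The modulus of 256 and the range table are hypothetical values chosen for illustration. The id is first reduced by the modulus, and the remainder is then looked up in a configurable list of ranges that names the target node.

```python
PATTERN = 256  # hypothetical modulus
# (low, high, node): inclusive ranges over the modulo result.
RANGES = [(0, 127, 0), (128, 255, 1)]


def mod_range_shard(record_id, pattern=PATTERN, ranges=RANGES):
    """Take id % pattern, then map the remainder to a node by range."""
    remainder = record_id % pattern
    for low, high, node in ranges:
        if low <= remainder <= high:
            return node
    raise ValueError(f"no range covers remainder {remainder}")


print(mod_range_shard(130))

# Migrating part of the data later only means editing the range table.
AFTER_MIGRATION = [(0, 127, 0), (128, 191, 1), (192, 255, 2)]
print(mod_range_shard(200, ranges=AFTER_MIGRATION))
```

Only rows whose remainder falls in the moved range change node, so the amount of data to migrate is known in advance.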
ASCII-code modulo with range constraint
This rule is similar to modulo with range constraint, but it also supports sharding fields that contain symbols and letters, by taking the modulo of their ASCII codes.
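
One way to picture this rule is sketched below, under stated assumptions: the character codes of a fixed-length prefix are summed, the sum is reduced by a modulus, and the remainder is mapped to a node by range. The prefix length, the modulus, the range table, and the choice of summing the codes are all illustrative assumptions rather than details given in the text.

```python
def ascii_mod_range_shard(value, prefix_length=5, pattern=256,
                          ranges=((0, 127, 0), (128, 255, 1))):
    """Sum the character codes of a prefix, take the modulo, map by range."""
    total = sum(ord(ch) for ch in value[:prefix_length])
    remainder = total % pattern
    for low, high, node in ranges:
        if low <= remainder <= high:
            return node
    raise ValueError(f"no range covers remainder {remainder}")


print(ascii_mod_range_shard("AB"))
print(ascii_mod_range_shard("zzzzz-order-7"))
```

This lets a non-numeric field such as an order code act as the sharding field while keeping the same range table idea as the numeric rule.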
String hash parsing
This rule shards on an int hash value computed from a substring extracted from the string.
Consistent hash
The consistent hash computation effectively solves the