"Summary" Mycat distributed database Middleware __ Database

Source: Internet
Author: User
1. Database overview

In the internet age, storing and accessing massive amounts of data has become a bottleneck in system design and use. Massive data processing falls into two main kinds, depending on the usage scenario.

Online transaction processing (OLTP, On-Line Transaction Processing), also known as transaction-oriented processing. Its basic feature is that raw data is sent to the computing center immediately and the result is returned within a very short time.
Function: daily transaction processing
DB design: real-time transactional applications
Data processing: current, latest, detailed, two-dimensional, discrete
Real-time performance: high real-time read and write requirements
Transactions: strong consistency
Analysis requirements: low, simple

Online analytical processing (OLAP, On-Line Analytical Processing) analyzes, queries, and reports on data in a multidimensional way. It can be combined with data mining and statistical analysis tools to strengthen decision analysis.
Function: statistics, analysis, reporting
DB design: statistical analysis applications
Data processing: integrated, unified, multidimensional
Real-time performance: low real-time read and write requirements
Transactions: weak transactions
Analysis requirements: high, complex

Both kinds of system can be implemented with a variety of technologies. On the storage side, databases divide into two major categories: relational databases and NoSQL databases.

Relational database
A relational database is built on the relational model and uses mathematical concepts and methods such as set algebra to process data.
Representatives: the mainstream products Oracle, DB2, MS SQL Server, and MySQL.
Characteristics: a relational data model with structured storage and integrity constraints; data held in two-dimensional tables and the relationships between them, which requires operations such as join, union, intersection, difference, and division; reads and writes performed through Structured Query Language (SQL); a need for data consistency, transactions, or even strong consistency.
Advantages: maintains data consistency (transaction processing); supports joins and other complex queries; general-purpose and technically mature.
Disadvantages: every read and write must be parsed as SQL; read and write performance is low with large data volumes and high concurrency; reading or writing data, or modifying a data structure, requires locking, which affects concurrent operations; it cannot adapt to unstructured storage; it is difficult to scale out; it is expensive and complex.

NoSQL database
NoSQL stands for Not Only SQL. The idea is to use a relational database where one fits, and where it does not fit, to consider a more appropriate data store instead of insisting on a relational database.
Representatives: temporary key-value stores (memcached, Redis); persistent key-value stores (ROMA, Redis); document-oriented databases (MongoDB, CouchDB); column-oriented databases (Cassandra, HBase).
Characteristics: unstructured storage; based on a multidimensional relationship model; each product has particular usage scenarios in which it has a unique advantage.
Advantages: strong read and write capability under high concurrency and large data volumes; basic support for distribution, so it is easy to scale; simple, weakly structured storage.
Disadvantages: weak support for joins and other complex operations; weak transaction support; poor generality; weak support for complex business scenarios because there are no complete constraints.

2. Mycat overview

What Mycat is, by role

For the DBA: Mycat is the MySQL server, and the MySQL servers behind Mycat act like MySQL storage engines such as InnoDB and MyISAM. Mycat itself does not store data. Data is stored on the back-end MySQL servers, so data reliability and transactions are guaranteed by MySQL.

For the software engineer: Mycat is a database server roughly equivalent to MySQL. You connect to Mycat the same way you connect to MySQL, except that the port differs: Mycat listens on 8066 by default rather than MySQL's 3306, so the port must be given in the connection string. You can use Mycat with the object mapping framework you already know, but for sharded (fragment) tables plain SQL statements are recommended wherever possible, because they give the best performance, especially with tens of millions or even tens of billions of records.

For the architect: Mycat is a powerful database middleware. Besides read/write splitting, splitting tables and libraries, and disaster recovery backup, it can be used for multi-tenant application development and cloud platform infrastructure, which gives the architecture strong adaptability and flexibility. With the upcoming Mycat intelligent optimization module, the system's data access bottlenecks and hotspots are visible at a glance. Based on these statistics you can adjust the back-end storage automatically or manually, mapping different tables to different storage engines, without changing a line of application code.

Principle
The most important verb in how Mycat works is "intercept". Mycat intercepts the SQL statement sent by the user and first performs specific analyses on it, such as sharding analysis, routing analysis, read/write splitting analysis, and caching analysis. It then sends the SQL to the real back-end databases, processes the returned results as needed, and finally returns them to the user.

Application scenarios
Pure read/write splitting: the simplest configuration, with support for read/write splitting and master/slave switching.
Splitting tables and libraries: tables with more than 10 million rows are sharded; a single table of up to 100 billion rows is supported.
Multi-tenant applications: each application has its own library, but the application connects only to Mycat, so multi-tenancy is achieved without modifying the program itself.
Reporting systems: relying on Mycat's capabilities to handle statistics for large-scale reports.
Replacing HBase for big data analysis.
Real-time query of massive data: a simple and effective solution. For example, when 10 billion frequently queried records must return results within 3 seconds, and the queries include range queries or queries on other attributes in addition to primary key lookups, Mycat may be the simplest and most effective choice.

3. Mycat concepts

Database middleware
Mycat is a database middleware, that is, an intermediate service that handles data processing and interaction between the database and the application. Once data has been sharded, the original single library is split into several shard databases, and the cluster of all shard databases makes up the complete database storage. With database middleware, the application only needs to concentrate on business processing. The large amount of common work, such as managing the shard cluster, switching data sources, handling transactions, and aggregating data, is done by the middleware.

Logical library
An application does not need to know that the middleware exists. Business developers only need to know the concept of a database, so the database middleware can be regarded as a logical library made up of one or more database clusters.

Logical tables
In a distributed database, the table that an application reads and writes is, from the application's point of view, a logical table.
A logical table can be sharded (fragmented) and distributed across one or more shard databases, or it can be left unsharded and consist of a single table.

Sharded table
A sharded table is an originally very large table that has to be split across several databases, so that each shard holds part of the data and all shards together make up the complete data. It is configured with multiple shard nodes through the dataNode attribute of the table tag.

Non-sharded table
Not every table in a database is large, and some tables do not need to be split. "Non-sharded" is defined relative to sharded tables: it means a table that needs no data splitting. It is configured with a single shard node through the dataNode attribute of the table tag.

ER table
In the sharding strategy based on E-R relationships, a child table's record is stored on the same data shard as its associated parent table record, that is, the child table depends on the parent table. Table grouping (Table Group) ensures that joins do not cross libraries. Table grouping is a very good way to solve data joins, and it is also an important rule in planning how data is split.
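The ER rule described above can be sketched in a few lines of Python. This is only an illustration, not Mycat's routing code: the customer/order tables, the modulo rule for the parent table, and every function name are assumptions made for the example.

```python
# Illustrative sketch only: not Mycat's routing code.
SHARD_COUNT = 3  # assumed number of shard nodes


def shard_for_customer(customer_id: int) -> int:
    """Assumed parent-table rule: modulo on the parent's primary key."""
    return customer_id % SHARD_COUNT


def shard_for_order(order: dict) -> int:
    """Route a child row by its join key so it follows its parent row."""
    return shard_for_customer(order["customer_id"])


orders = [
    {"id": 1001, "customer_id": 7},
    {"id": 1002, "customer_id": 8},
    {"id": 1003, "customer_id": 7},
]

# Map each order id to the shard it would be stored on.
placement = {order["id"]: shard_for_order(order) for order in orders}
print(placement)
```

Because the child row is routed by its join key rather than by its own id, a join between an order and its customer can always be answered inside one shard.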
Global table
A real business system usually contains many tables that resemble dictionary tables. These tables rarely change, and they have the following characteristics: changes are infrequent; the overall data volume changes little; the data is small, rarely more than a few hundred thousand records.
When business tables are sharded (fragmented) because of their size, the association between the business tables and these attached dictionary tables becomes a tricky problem. Mycat solves joins against such tables through data redundancy: every shard holds a copy of the data. All dictionary tables, and any tables that match the characteristics of a dictionary table, are defined as global tables. Data redundancy is a good way to solve cross-shard joins, and it is another important rule in planning how data is split.

Shard node (dataNode)
After data is split, a large table is divided across different shard databases. The database in which each table shard resides is a shard node (dataNode).

Node host (dataHost)
After data is split, a shard node (dataNode) does not necessarily have a machine to itself. The same machine can hold more than one shard database, and the machine on which one or more shard nodes reside is the node host (dataHost). To get around the concurrency limit of a single node host, shard nodes under high read and write pressure should be spread as evenly as possible across different node hosts.

Sharding rule (rule)
As described above, splitting a large table into several shard tables requires rules. The rule that assigns data to a shard according to some business logic is the sharding rule. Choosing an appropriate sharding rule is very important, because it greatly reduces the difficulty of later data processing.

Global sequence number (sequence)
After data is split, the primary key constraint of the original relational database can no longer be used in the distributed setting, so an external mechanism is needed to guarantee that data is uniquely identified. The mechanism that guarantees globally unique identification is the global sequence number (sequence).

Multi-tenant
Multi-tenancy is a software architecture technique that explores how to share the same system or program components in a multi-user environment while still isolating each user's data. There are three approaches:
Independent databases
Shared database, isolated data schemas
Shared database, shared data schema

4. Using Mycat

Mycat configuration

schema.xml
schema.xml manages Mycat's logical libraries, tables, sharding rules, dataNode, and dataSource.
schema tag: defines a logical library in the Mycat instance. Mycat can have multiple logical libraries, each with its own configuration.
table tag: defines a Mycat logical table. Every table that needs to be split must be defined in this tag.
dataNode attribute: defines the dataNode that the logical table belongs to. Its value must correspond to the value of the name attribute in a dataNode tag.
dataHost tag: the last tag in schema.xml and the lowest-level tag in a Mycat logical library. It directly defines the specific database instances, the read/write splitting configuration, and the heartbeat statement.

server.xml
server.xml holds almost all the system configuration that Mycat needs, including optimization configuration.
user tag
system tag

rule.xml
The rules involved in splitting tables are defined in rule.xml.
Different tables can use different sharding (fragmentation) algorithms, or the same algorithm with different parameters.
tableRule tag: defines a table rule.
function tag

Table association issues (table join)

Join overview
Inner join: intersection
Left join
Right join
Full join: union
Using inner join is recommended.

Global table
A dictionary table changes infrequently and holds little data, rarely more than a few hundred thousand records.
Global table properties: insert and update operations on a global table are executed on all nodes in real time, which keeps the data on every shard consistent; a query against a global table is served from only one node; a global table can join with any table.

ER join
In the sharding strategy based on E-R relationships, the records of the child table and the associated parent table records are stored on the same data shard.

Share join
ShareJoin is a simple cross-shard join implemented on top of HBT. It currently supports joins between 2 tables. It works by parsing the SQL statement, splitting it into single-table SQL statements for execution, and then collecting the data from each node.

Catlet ("artificial intelligence")
Solving cross-shard SQL joins is far more complex than one might expect, and an efficient result often cannot be achieved. For the specific joins in a business system that must cross shards, Mycat therefore relies on what it calls "artificial intelligence": the problem is solved programmatically, with Mycat providing a specific API for the programmer to call. This is one of Mycat's innovative ideas.

Mycat sharding rules

Overview
In data splitting, and especially in horizontal splitting, the two things the middleware ultimately has to handle are splitting the data and aggregating the data. Choosing the right splitting rule is critical, because it determines how easy later data aggregation will be, and it may even avoid aggregation across libraries altogether. The important principles introduced earlier include data redundancy and table grouping (Table Group). Both are good ways for a business to avoid cross-library joins, but not every business scenario suits them, so this chapter explains how to choose the right splitting rules.

Kinds of Mycat sharding
Global table
ER sharded table
Many-to-many association: the general principle is to see which table the relationship table leans toward from a business perspective.
Primary key sharding vs. non-primary key sharding: when no field is suitable as a sharding field, primary key sharding is the only choice. Its advantage is that queries by primary key are the fastest, and when an auto-increment sequence number is used as the primary key, data is also spread more evenly across nodes. If a suitable business field exists, sharding on that business field is recommended. The conditions for choosing a sharding field are that it distributes data across the nodes as evenly as possible, and that it is the most frequent or most important query condition.

Common Mycat sharding rules

Enumeration sharding
For example, some businesses need to store data by province or by district and county, and the country's provinces, districts, and counties are fixed.

Fixed-slice hash algorithm
This rule is similar to a decimal modulo operation. The difference is that it is a binary operation: it takes the low 10 bits of the ID in binary, that is, id & 1111111111 (binary).
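The low-10-bit computation just described can be sketched as follows. The mask comes from the text. The slot-to-shard layout (four shards of 256 consecutive slots each) and the function names are assumptions for illustration, not Mycat's configuration or code.

```python
# Illustrative sketch only: not Mycat's implementation.
SLOT_MASK = 0b1111111111   # keep the low 10 bits of the id (mask value 1023)
SLOTS_PER_SHARD = 256      # assumed layout: 4 shards x 256 consecutive slots


def slot_of(record_id: int) -> int:
    """Return the low 10 bits of the id, a slot number from 0 to 1023."""
    return record_id & SLOT_MASK


def shard_of(record_id: int) -> int:
    """Map a slot to a shard under the assumed contiguous layout."""
    return slot_of(record_id) // SLOTS_PER_SHARD


# Shards chosen for the consecutive ids 1..10 under this layout.
consecutive_shards = [shard_of(i) for i in range(1, 11)]

# For comparison, the remainders of the same ids under a plain modulo 10.
modulo_results = [i % 10 for i in range(1, 11)]

print(consecutive_shards)
print(modulo_results)
```

Under this assumed layout the ids 1 to 10 all fall into slot numbers below 256 and therefore into the same shard, whereas a modulo 10 gives ten different remainders for the same ids.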
The advantage of this algorithm is as follows. With a modulo 10 operation, inserting the values 1 to 10 consecutively spreads them over 10 different shards, which makes insert transactions harder to control. The binary-based algorithm may place consecutive values in the same shard, which reduces the difficulty of controlling insert transactions.

Modulo
This rule applies a modulo operation to the sharding (fragment) field, and it is very straightforward: a decimal modulo is computed on the ID. Compared with fixed-slice hash, a bulk insert within a single transaction may write to multiple shards, which makes transaction consistency harder.

Shard by date (day)
This rule shards by day.

Modulo with range constraint
This rule combines a modulo operation with a range constraint. It mainly prepares for later data migration, because you can decide for yourself how the data is distributed across nodes after the modulo is taken.

ASCII-code modulo with range constraint
This rule is similar to modulo with range constraint, and it supports taking the modulo of data that contains symbols and letters.

String hash parsing
This rule hash-shards on an int value taken from a substring of the string.

Consistent hash
The consistent hash computation effectively solves the
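As a sketch of the modulo-with-range-constraint rule described above: the id is first reduced by a modulo, and a separate range table then decides which node each remainder belongs to. The modulus of 256, the range tables, and all names are assumptions chosen for the example, not values taken from Mycat.

```python
# Illustrative sketch only: not Mycat's implementation.
from typing import List, Tuple

MODULUS = 256  # assumed modulus; remainders run from 0 to 255

# Each entry is (lowest remainder, highest remainder, node index).
RangeTable = List[Tuple[int, int, int]]

TWO_NODES: RangeTable = [(0, 127, 0), (128, 255, 1)]
THREE_NODES: RangeTable = [(0, 127, 0), (128, 191, 1), (192, 255, 2)]


def node_for(record_id: int, ranges: RangeTable) -> int:
    """Take the modulo of the id, then look up the node for that remainder."""
    remainder = record_id % MODULUS
    for low, high, node in ranges:
        if low <= remainder <= high:
            return node
    raise ValueError(f"no range covers remainder {remainder}")


for record_id in (5, 130, 456):
    print(record_id, node_for(record_id, TWO_NODES), node_for(record_id, THREE_NODES))
```

In this example the modulus never changes. Moving from two nodes to three only reassigns the remainders 192 to 255, so only the rows in that range would have to be migrated.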
