Mycat Series-Overview

Source: Internet
Author: User

Database Segmentation Overview

OLTP and OLAP

In the Internet era, storing and accessing massive amounts of data has become a bottleneck in system design and operation. Depending on the usage scenario, massive data processing falls into two types: online transaction processing (OLTP) and online analytical processing (OLAP).

Online transaction processing (OLTP), also known as transaction-oriented processing, is characterized by raw data being sent immediately to the computing center for processing, with the results returned within a very short time.

Online analytical processing (OLAP) refers to analyzing, querying, and reporting on data in a multidimensional way. It can be combined with data mining and statistical analysis tools to strengthen decision support.

The main differences between the two can be explained in the following table:


Aspect                | OLTP                                                 | OLAP
System function       | Daily transaction processing                         | Statistics, analysis, reporting
Database design       | For real-time transaction applications               | For statistical analysis applications
Data processing       | Current, latest, detailed, two-dimensional, discrete | Historical, aggregated, multidimensional, integrated, unified
Real-time             | High real-time read and write requirements           | Low real-time read and write requirements
Transactions          | Strong consistency                                   | Weak transactions
Analysis requirements | Low, simple                                          | High, complex
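To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the two helper functions are assumptions made up for illustration; they do not come from the original article. The first function does OLTP-style work (a short transaction that writes one row), the second does OLAP-style work (an aggregate over many rows).

```python
import sqlite3

# In-memory database with a hypothetical `orders` table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL, day TEXT)"
)


def place_order(user_id, amount, day):
    """OLTP-style work: a short transaction that writes a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (user_id, amount, day) VALUES (?, ?, ?)",
            (user_id, amount, day),
        )
    return cur.lastrowid


def daily_revenue():
    """OLAP-style work: scan and aggregate many rows for a report."""
    rows = conn.execute(
        "SELECT day, COUNT(*), SUM(amount) FROM orders GROUP BY day ORDER BY day"
    )
    return rows.fetchall()


place_order(1, 10.0, "2016-06-01")
place_order(2, 15.0, "2016-06-01")
place_order(1, 20.0, "2016-06-02")
report = daily_revenue()
```

In a real system the two workloads usually run on different stores or replicas; a single SQLite file is used here only to keep the sketch self-contained.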


Relational databases and NoSQL databases

Both kinds of systems can be implemented with a variety of technologies. On the storage side, databases fall into two major categories: relational databases and NoSQL databases.

A relational database is built on the relational model and processes data using mathematical concepts and methods such as set algebra. The mainstream Oracle, DB2, MS SQL Server, and MySQL all belong to this traditional category.

NoSQL stands for "Not Only SQL": use a relational database where it fits, and consider a more suitable data store where it does not. NoSQL databases are mainly divided into temporary key-value stores (memcached, Redis), persistent key-value stores (ROMA, Redis), document-oriented databases (MongoDB, CouchDB), and column-oriented databases (Cassandra, HBase). Each has its own usage scenarios and advantages.

Why use a NoSQL database when traditional relational databases such as Oracle and MySQL are mature and widely used commercially? Mainly because, as the Internet has grown, data volumes keep increasing and performance requirements keep rising, while traditional databases have an inherent weakness: a single machine (database) is a performance bottleneck and is hard to scale out. A database that is both a bottleneck and hard to scale cannot keep up with the growing storage and performance demands of massive data, which is why a variety of NoSQL products appeared. The fundamental advantage of NoSQL in the cloud computing era is that it is simple, easy to scale out across many machines, and offers very high read and write performance.

The characteristics, advantages, and disadvantages of the two are analyzed below:

Relational database

1) Characteristics:

- Data is modeled on the relational model, stored in structured form, and protected by integrity constraints.

- Data is organized as two-dimensional tables and the relationships between them, and is manipulated with operations such as join, union, intersection, and difference.

- Structured Query Language (SQL) is used to read and write data.

- Operations require data consistency, provided by transactions or even strong consistency.

2) Advantages:

- Maintains data consistency (transaction processing).

- Supports complex queries such as joins.

- General-purpose, mature technology.

3) Disadvantages:

- Every read and write must go through SQL parsing, so read and write performance is low under large data volumes and high concurrency.

- Reading and writing data, or modifying the data structure, requires locking, which affects concurrent operations.

- Cannot adapt to unstructured storage.

- Difficult to scale out.

- Expensive and complex.

NoSQL database

1) Characteristics:

- Unstructured storage.

- Based on a multidimensional relationship model.

- Each product targets its own specific usage scenarios.

2) Advantages:

- High concurrency, with strong read and write capability under large data volumes.

- Built-in support for distribution; easy to scale out.

- Simple, weakly structured storage.

3) Disadvantages:

- Weak support for complex operations such as joins.

- Weak transaction support.

- Poor generality.

- Poor support for complex business scenarios, because complete constraints are missing.

Although traditional databases have an inherent weakness in the cloud computing era, NoSQL databases cannot replace them; NoSQL can only supplement traditional databases. Working around the weakness of traditional databases is therefore a problem the big data era must solve. If a traditional database were easy to scale out and could be split, the single-machine (database) performance limit could be avoided. However, current open source and commercial traditional databases do not support large-scale automatic scaling, so a third-party layer is needed to do the work. This is the data sharding that this series discusses. The following sections analyze how data sharding is done.

What is data sharding?

Simply put, data sharding means distributing the data that would otherwise be stored in one database across multiple databases (hosts) according to some condition, so that the load on any single device is spread out.

Data sharding can be divided into two modes according to the type of sharding rule. The first places different tables (or schemas) in different databases (hosts); this is called vertical sharding. The second splits the rows of a single table across multiple databases (hosts) according to some condition based on the logical relationships within the data; this is called horizontal sharding.

The main feature of vertical sharding is that its rules are simple and it is easy to implement. It is especially suitable for systems in which the business modules are loosely coupled, affect each other very little, and have clear business logic. In such systems, the tables used by different business modules can easily be moved into different databases. Because the split is made table by table, the impact on the application is smaller and the sharding rule is simpler and clearer.

Horizontal sharding is somewhat more complex than vertical sharding. Because rows of the same table are split across different databases, the sharding rule is more complex for the application than a split by table name, and later data maintenance is also more complex.

Vertical sharding

A database is made up of many tables, each corresponding to a different part of the business. Vertical sharding classifies the tables by business and distributes them to different databases, so that the data and the load are spread across those databases. For example, a system can be split into several modules such as users, order transactions, and payments.
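The vertical split described above can be sketched as a simple lookup from table name to database. The table names, database names, and both helper functions below are hypothetical and chosen only to mirror the user, order, and payment modules; this is not how any particular middleware is configured.

```python
# Hypothetical mapping from table name to the database that owns it.
TABLE_TO_DATABASE = {
    "users": "user_db",
    "user_profiles": "user_db",
    "orders": "order_db",
    "order_items": "order_db",
    "payments": "payment_db",
}


def route_table(table):
    """Return the database that holds the whole table (vertical sharding)."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


def needs_cross_database_join(tables):
    """True when a query touches tables that live in different databases."""
    return len({route_table(t) for t in tables}) > 1


same_db = needs_cross_database_join(["orders", "order_items"])
cross_db = needs_cross_database_join(["users", "orders"])
```

The second helper hints at the cost of this approach: a query that joins `users` and `orders` can no longer be answered by one database.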

In a well-designed application system, the overall functionality is composed of many functional modules, and the data each module needs corresponds to one or more tables in the database. In architecture design, the fewer and more unified the interaction points between modules, the lower the coupling of the system and the better the maintainability and extensibility of each module. Such a system lends itself to vertical sharding.

In practice, however, some tables are hard to make fully independent, and joins across databases arise. For such tables a trade-off is needed: either the database yields to the business and the tables share one data source, or the tables are split across multiple databases and the business combines them through interface calls. Early on, when the data volume is small or resources are limited, a shared data source is usually chosen. Once the data grows to a certain scale and the load becomes heavy, the split becomes necessary.

Generally speaking, scenarios with complex joins are hard to shard, while independent businesses are easy to shard. How to shard, and how far to go, is a test of the technical architecture.

The advantages and disadvantages of vertical sharding are as follows:

Advantages:

· After splitting, the business is clear and the splitting rules are clear.

· Easy integration or expansion between systems.

· Data maintenance is simple.

Disadvantages:

· Some business tables can no longer be joined and must be combined through interface calls, which increases system complexity.

· Each business has different limits, so a single-database performance bottleneck can still occur, and scaling the data or improving performance is not easy.

· Transaction processing is complex.

Because vertical sharding distributes tables to different databases by business category, a table that grows too large in one business still hits single-database read/write and storage bottlenecks. Horizontal sharding is needed to solve that.
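The "join through an interface" disadvantage listed above can be illustrated with a small sketch. The two in-memory structures stand in for separate user and order databases, and `get_users` stands in for whatever interface the user module would expose; all names and data are assumptions for illustration.

```python
# Two in-memory stand-ins for databases that live on different hosts.
USER_DB = {1: {"name": "alice"}, 2: {"name": "bob"}}
ORDER_DB = [
    {"order_id": 100, "user_id": 1, "amount": 30},
    {"order_id": 101, "user_id": 2, "amount": 45},
    {"order_id": 102, "user_id": 1, "amount": 12},
]


def get_users(user_ids):
    """Hypothetical interface of the user module: look up users in one batch."""
    return {uid: USER_DB[uid] for uid in user_ids if uid in USER_DB}


def orders_with_user_names():
    """Application-level join: what used to be a single SQL JOIN."""
    orders = list(ORDER_DB)                             # query 1: order database
    users = get_users({o["user_id"] for o in orders})   # query 2: user database
    return [
        {**o, "user_name": users[o["user_id"]]["name"]}
        for o in orders
        if o["user_id"] in users
    ]


joined = orders_with_user_names()
```

The join logic, batching, and handling of missing users now live in application code, which is the added complexity the list refers to.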

Horizontal sharding

In contrast to vertical sharding, horizontal sharding does not classify tables. Instead, it distributes the rows of a table across multiple databases according to a rule on some field, so that each database holds a subset of the table's data. Put simply, horizontal sharding splits the data by rows: some rows of the table go to one database, and other rows go to other databases.


Sharding the data requires defining a sharding rule. A relational database is a two-dimensional model of rows and columns, and the first principle of sharding is to find the sharding dimension. For example, from the member's point of view, a merchant order system needs to query a given member's orders on a given day, so the data should be split by member (combined with date) and grouped by member ID, so that all queries and joins for one member can be resolved within a single database. From the merchant's point of view, querying the number of orders a merchant received on a given day calls for sharding by merchant ID. If the system shards by member but also needs to query by merchant, difficulties arise. Finding the right sharding rule requires weighing all of these considerations.

Several typical sharding rules are:

· By user ID: data is distributed to different databases according to the user ID, and rows belonging to the same user land in the same database.

· By date: data from different months, or even different days, is distributed to different databases.

· By a specific field, or by ranges of values of a specific field.
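The effect of choosing a sharding dimension can be sketched in a few lines. The example below assumes four shards, a modulo rule on the member ID, and a handful of made-up orders; none of these values come from the original article. It shows that a member-side query reads one shard, while a merchant-side query has to visit every shard.

```python
from collections import defaultdict

SHARD_COUNT = 4  # assumed number of databases


def shard_for_member(member_id):
    """Sharding rule: the member ID decides which database stores the row."""
    return member_id % SHARD_COUNT


orders = [
    {"order_id": 1, "member_id": 7, "merchant_id": 50, "day": "2016-06-01"},
    {"order_id": 2, "member_id": 7, "merchant_id": 51, "day": "2016-06-01"},
    {"order_id": 3, "member_id": 8, "merchant_id": 50, "day": "2016-06-01"},
    {"order_id": 4, "member_id": 9, "merchant_id": 50, "day": "2016-06-02"},
]

# Distribute the rows of the single logical table across the shards.
shards = defaultdict(list)
for order in orders:
    shards[shard_for_member(order["member_id"])].append(order)


def orders_of_member(member_id, day):
    """Member-side query: only one shard has to be read."""
    shard = shards[shard_for_member(member_id)]
    return [
        o["order_id"]
        for o in shard
        if o["member_id"] == member_id and o["day"] == day
    ]


def order_count_of_merchant(merchant_id, day):
    """Merchant-side query: every shard is read and the counts are summed."""
    return sum(
        1
        for shard_id in range(SHARD_COUNT)
        for o in shards[shard_id]
        if o["merchant_id"] == merchant_id and o["day"] == day
    )


member_orders = orders_of_member(7, "2016-06-01")
merchant_count = order_count_of_merchant(50, "2016-06-01")
```

Sharding by merchant ID instead would reverse the situation, which is why the choice of dimension has to follow the dominant query pattern.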

In short, the principle is to find a sharding rule that fits the business and use it to distribute the data across different databases. Sharding by user ID is one example.
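The three typical rules listed above can each be written as a small routing function. This is a minimal sketch: the database names, the monthly naming scheme, and the range boundaries are all assumptions chosen for illustration.

```python
import bisect
from datetime import date

DATABASES = ["db0", "db1", "db2"]  # hypothetical database names


def by_user_id(user_id):
    """Modulo rule: rows with the same user ID always land in the same database."""
    return DATABASES[user_id % len(DATABASES)]


def by_month(day):
    """Date rule: one logical shard per calendar month (naming is an assumption)."""
    return f"orders_{day.year}{day.month:02d}"


# Range rule: IDs below 1,000,000 go to db0, IDs below 2,000,000 go to db1,
# and everything else goes to db2.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000]


def by_id_range(record_id):
    """Range rule: pick the database whose ID range contains the record."""
    return DATABASES[bisect.bisect_right(RANGE_UPPER_BOUNDS, record_id)]


user_target = by_user_id(7)
month_target = by_month(date(2016, 6, 15))
range_target = by_id_range(1_500_000)
```

Modulo rules spread load evenly but make it harder to add databases later; date and range rules make growth easier but can concentrate recent traffic on one database.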

Like vertical sharding, horizontal sharding has its pros and cons.

Advantages:

· When the sharding rule is well abstracted, joins can largely be done inside the database.

· No single-database performance bottleneck from large data volumes and high concurrency.

· Fewer application-side modifications.

· Improved system stability and load capacity.

Disadvantages:

· Splitting rules are difficult to abstract.

· Shard transaction consistency is difficult to resolve.

· Repeated expansion of the data is difficult, and the maintenance workload is large.

· Cross-database join performance is poor.

Having covered the differences, advantages, and disadvantages of vertical and horizontal sharding, you will find that each method has shortcomings, and that they share the following:

· Distributed transactions are introduced.

· Cross-node joins.

· Cross-node merging, sorting, and paging.

· Management of multiple data sources.
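Of the shared problems above, cross-node sorting and paging is easy to show in code. The sketch below assumes each shard can return its own rows already sorted; the function name and the sample data are made up. To serve the page starting at `offset`, each shard has to supply its first `offset + limit` rows, and the results are merged before the page is cut out.

```python
import heapq
from itertools import islice


def page_across_shards(shards, offset, limit):
    """Return rows [offset, offset + limit) of the globally sorted result.

    Each element of `shards` is a list that is already sorted ascending,
    as if every node had executed its own ORDER BY.
    """
    needed = offset + limit
    # Each node only has to send back its first `needed` rows.
    partials = [shard[:needed] for shard in shards]
    merged = heapq.merge(*partials)  # lazy k-way merge of sorted inputs
    return list(islice(merged, offset, needed))


shard_rows = [[1, 4, 9, 12], [2, 3, 10], [5, 6, 7, 8]]
second_page = page_across_shards(shard_rows, offset=3, limit=3)
```

The cost grows with the page number: a deep page forces every shard to return `offset + limit` rows even though only `limit` rows reach the caller.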

For data source management, there are two main ideas at present:

A. Client mode: each application module configures and manages the one or more data sources it needs, accesses each database directly, and integrates the data inside the module.

B. Proxy mode: an intermediate proxy layer manages all data sources uniformly, and the back-end database cluster is transparent to the front-end application.

Faced with these two approaches, probably more than 90% of people would lean toward the second, especially as the system keeps growing larger and more complex. This is indeed a sound choice: the short-term cost may be higher, but it greatly helps the scalability of the whole system.

Mycat uses data sharding to avoid the shortcomings of traditional databases while gaining the ease of scaling that NoSQL offers. Through an intermediate proxy layer it sidesteps the problem of handling multiple data sources and is completely transparent to the application. It also provides a solution to the problems that data sharding introduces. The following chapters analyze the origin of Mycat and how it performs data sharding.
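The difference between the two modes can be sketched with toy classes. Everything here is hypothetical: `FakeDatabase` stands in for a real database connection, and neither class reflects Mycat's actual interface. In mode A the application module holds the data sources and routes by itself; in mode B the routing lives in a proxy and the application sees one logical database.

```python
class FakeDatabase:
    """Stand-in for a real database connection (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class OrderModule:
    """Mode A: the application module knows every data source and routes itself."""

    def __init__(self, databases):
        self.databases = databases

    def save_order(self, user_id, order):
        self.databases[user_id % len(self.databases)].put(user_id, order)


class ShardProxy:
    """Mode B: the proxy owns the data sources and hides the routing."""

    def __init__(self, databases):
        self._databases = databases

    def _route(self, key):
        return self._databases[key % len(self._databases)]

    def put(self, key, value):
        self._route(key).put(key, value)

    def get(self, key):
        return self._route(key).get(key)


backends = [FakeDatabase("db0"), FakeDatabase("db1")]
proxy = ShardProxy(backends)
proxy.put(3, "order-3")  # the caller never mentions db0 or db1
fetched = proxy.get(3)
```

With the proxy, changing the number of databases or the routing rule touches one component instead of every application module.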

Because joins become difficult after the data is sharded, here are some lessons learned about data sharding:

First principle: if you can avoid sharding, do not shard.

Second principle: if you must shard, choose appropriate sharding rules and plan well in advance.

Third principle: when sharding, minimize the possibility of cross-database joins through data redundancy or table groups.

Fourth principle: because database middleware can hardly do data joins well, and can hardly make them fast, business reads should use multi-table joins as little as possible.

What Mycat is, where it comes from, and how it solves these problems will be analyzed in the next chapter.
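The data redundancy mentioned in the third principle can be sketched as follows. The idea shown is to copy a small, rarely changing lookup table to every shard so that joins against it stay local. The `regions` table, the two-shard layout, and the helper names are assumptions for illustration only.

```python
SHARD_COUNT = 2  # assumed number of databases

# Small lookup table that rarely changes; it is copied to every shard.
REGIONS = {1: "North", 2: "South"}

# Each shard holds its own slice of `orders` plus a full copy of `regions`.
shards = [{"regions": dict(REGIONS), "orders": []} for _ in range(SHARD_COUNT)]


def insert_order(order):
    """Store the order in the shard chosen by its user ID."""
    shards[order["user_id"] % SHARD_COUNT]["orders"].append(order)


def orders_with_region(user_id):
    """Join orders with regions inside a single shard, with no cross-database join."""
    shard = shards[user_id % SHARD_COUNT]
    return [
        (o["order_id"], shard["regions"][o["region_id"]])
        for o in shard["orders"]
        if o["user_id"] == user_id
    ]


insert_order({"order_id": 1, "user_id": 3, "region_id": 2})
insert_order({"order_id": 2, "user_id": 4, "region_id": 1})
insert_order({"order_id": 3, "user_id": 3, "region_id": 1})
local_join = orders_with_region(3)
```

The price of the redundancy is that every change to the lookup table has to be written to all shards.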



This article is from the "djh01" blog, make sure to keep this source http://djh01.blog.51cto.com/10177066/1787643
