What is data segmentation?
In simple terms, it means that data stored in a single database is distributed across multiple databases (hosts) according to some condition, so that the load no longer falls on a single device. Spreading the load this way improves overall performance, but segmentation also introduces some problems:
Distributed transactions
Cross-node joins
Cross-node sorting and paging
Multi-data-source management
The first three are problems for the server side, and the last one is a problem for the client. They are easy to understand: once the data of one database is divided among multiple nodes, transactions become distributed transactions, joins become cross-node joins, and sorting and paging have to work across nodes as well. The client, for its part, goes from a single data source to multiple data sources, so its operations also become more complex.
Data segmentation (sharding) can be divided into two modes according to the type of segmentation rule:
One is to divide the data into different databases by table, which can be called vertical segmentation; the other is to divide the data of a single table into several databases according to some condition based on the logical relationship of the rows, which is called horizontal segmentation. Vertical and horizontal segmentation are each introduced in detail later, so I will not say much about them here.
Vertical segmentation can itself take two forms. The first is database-level segmentation, as described above: different tables go to different databases. The second is table-level vertical segmentation, in which the fields of one table are separated into more than one table. These tables can be placed on the same node or on different nodes, and the split tables are associated with each other through a specified field. If the database was designed well from the beginning, table-level segmentation is not needed; put plainly, table-level vertical segmentation is really a matter of database design.
Three terms come up often here: partitioning, segmentation, and fragmentation. Partitioning is a mechanism provided by the database itself that divides one physical file into multiple files. Segmentation and fragmentation can be understood as different translations of the same thing, but to be exact I think they differ slightly. Segmentation is the idea of spreading the data of one database across different databases or hosts. Fragmentation can be thought of as the result of segmentation: each node or each database is a fragment (a shard).
The segmentation described above divides data among different databases or different hosts, but data can also be divided within the current database, which brings in the concept of multi-tenancy. Multi-tenancy has three levels:
Separate databases: security is the highest and isolation is the best, and backup and recovery are comparatively simple, but the cost is the highest.
Shared database, isolated schemas: tenants share one database but each has its own schema; this level sits between the other two.
Shared database, shared schema: the isolation level is the lowest and security is the weakest, and backup and recovery are difficult, but the cost is the lowest.
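To make the three levels concrete, here is a minimal sketch of where one tenant's rows would live under each of them. The names `Tenancy`, `locate`, `db_shared`, and the `tenant_id` column are hypothetical and chosen only for illustration; a real system would map them onto its own connection and schema handling.

```python
from enum import Enum


class Tenancy(Enum):
    SEPARATE_DATABASE = 1          # one database per tenant
    SHARED_DB_ISOLATED_SCHEMA = 2  # one schema per tenant in a shared database
    SHARED_DB_SHARED_SCHEMA = 3    # shared tables, rows tagged with a tenant id


def locate(tenant, model, table="orders"):
    """Return (database, qualified_table, row_filter) for a tenant's rows.

    row_filter is None when the database or schema already isolates the
    tenant, and a (column, value) pair when every query must filter rows.
    """
    if model is Tenancy.SEPARATE_DATABASE:
        return ("db_" + tenant, table, None)
    if model is Tenancy.SHARED_DB_ISOLATED_SCHEMA:
        return ("db_shared", tenant + "." + table, None)
    # Shared schema: isolation depends entirely on the tenant_id filter.
    return ("db_shared", table, ("tenant_id", tenant))


print(locate("acme", Tenancy.SEPARATE_DATABASE))
print(locate("acme", Tenancy.SHARED_DB_ISOLATED_SCHEMA))
print(locate("acme", Tenancy.SHARED_DB_SHARED_SCHEMA))
```

Only the third level needs a row filter on every query, which is one way to see why its isolation is the weakest of the three.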
How is data segmentation implemented? There are many tools, and I group the approaches into three:
The first is to use no tool at all and perform the segmentation by hand in the client. This is undoubtedly reinventing the wheel; performance and stability are not guaranteed, and the cost is higher.
The second is to use a client component. Since doing it by hand is reinventing the wheel, there are naturally components that already handle these things, and using one reduces cost and improves stability and security. A representative open source component is Dangdang's sharding-jdbc, whose documentation is very detailed.
The third is to use database middleware, which is the simplest approach and has the best scalability. Mainstream database middleware includes Alibaba's Amoeba, Qihoo 360's Atlas, and Mycat, which is provided by an open source organization, among others. Each has its own advantages.
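Whichever of the three approaches is chosen, the segmentation layer has two basic jobs: route each row to a node, and merge results that span nodes. The sketch below illustrates both under simplifying assumptions: the nodes are plain in-memory lists standing in for databases, the horizontal rule is a modulo on a numeric key, and the function names `route`, `insert`, and `fetch_page` are hypothetical. It is not how sharding-jdbc or any particular middleware is implemented.

```python
import heapq
from itertools import islice


def route(shard_key, node_count):
    """Horizontal segmentation rule: a modulo on the key picks the node."""
    return shard_key % node_count


def insert(nodes, row_id):
    """Store a row on the node chosen by the segmentation rule."""
    nodes[route(row_id, len(nodes))].append(row_id)


def fetch_page(nodes, offset, limit):
    """Return rows [offset, offset + limit) of the global sort order.

    Any single node could hold all of the first offset + limit rows, so
    each node must return that many of its own sorted rows (the stand-in
    for ORDER BY ... LIMIT offset + limit) before the partial results
    are merged and the requested page is cut out.
    """
    need = offset + limit
    partials = [sorted(rows)[:need] for rows in nodes]
    merged = heapq.merge(*partials)
    return list(islice(merged, offset, need))


nodes = [[], [], []]
for row_id in [5, 1, 9, 4, 8, 2, 7, 3, 6]:
    insert(nodes, row_id)

print(fetch_page(nodes, offset=3, limit=3))  # [4, 5, 6]
```

The cost of cross-node paging shows up in `need`: the deeper the page, the more rows every node has to return, which is why this is listed among the problems that segmentation introduces.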