Thoughts on data splitting

Source: Internet
Author: User

"Horizontal splitting" and "vertical splitting" come up often in everyday life and work, but I have only ever half understood the two ideas. Here I write down my own understanding.
In plain words, let's talk about "horizontal splitting" and "vertical splitting" (in fact, I don't know what the professional or academic terms are).
Assume there is a table with three fields: ID, name, and description. The description field is text-type data and is generally very long. Without any splitting, all the data is stored in this one table. If instead the records with ID % 2 = 0 are placed in Table1 and the records with ID % 2 = 1 are placed in Table2, where both tables have the same structure as the original table, the data has been split horizontally. If, however, the table is divided into two tables with the structures Table1 (ID, name) and Table2 (ID, description), the split is vertical.
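
The two splits described above can be sketched in a few lines. This is only an illustration, assuming rows are plain Python dictionaries; the helper names split_horizontal and split_vertical and the sample rows are made up for this example and do not come from any library.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_horizontal(rows: List[Row], n: int = 2) -> List[List[Row]]:
    """Distribute whole rows over n tables of identical structure by id % n."""
    tables: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        tables[row["id"] % n].append(row)
    return tables


def split_vertical(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split columns into (id, name) and (id, description); id is kept in both."""
    table1 = [{"id": r["id"], "name": r["name"]} for r in rows]
    table2 = [{"id": r["id"], "description": r["description"]} for r in rows]
    return table1, table2


# Hypothetical sample data.
rows = [
    {"id": 1, "name": "a", "description": "long text for a"},
    {"id": 2, "name": "b", "description": "long text for b"},
    {"id": 3, "name": "c", "description": "long text for c"},
]

# Horizontal: table1 receives id % 2 == 0, table2 receives id % 2 == 1.
table1, table2 = split_horizontal(rows)

# Vertical: every row appears in both results, but with different columns.
names, descriptions = split_vertical(rows)
```

In the horizontal case each row lives in exactly one table; in the vertical case every row is present in both tables and the shared ID is what ties the two halves together.
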
What are the benefits of horizontal and vertical splitting? The most intuitive advantage of horizontal splitting is that it reduces the burden on a single database and makes the data easier to maintain. For example, backups can be taken at a finer granularity, down to a single ID partition. Vertical splitting takes advantage of differences in how often fields are accessed to improve system performance. In the example above, ID and name are used frequently, while description is used less often. We pay the cost of a redundant ID column to split out the description so that it can be read only when needed, which reduces bandwidth requirements and disk I/O. There are other benefits as well; for instance, vertical splitting allows different security levels to be applied to different fields. Note, however, that although horizontal and vertical splitting have many advantages, they also come with a corresponding resource cost.
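
To make the "read the description only when needed" idea concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table names table1 and table2 follow the example above; the sample rows and the helpers list_names and get_description are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE table2 (
        id INTEGER PRIMARY KEY REFERENCES table1(id),
        description TEXT NOT NULL
    );
    """
)
# Hypothetical sample data; the id column is stored redundantly in both tables.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, "long text A"), (2, "long text B")],
)


def list_names(conn: sqlite3.Connection):
    """Frequent query: touches only the narrow (id, name) table."""
    return conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall()


def get_description(conn: sqlite3.Connection, record_id: int):
    """Rare query: fetch the long text on demand, or None if the id is unknown."""
    row = conn.execute(
        "SELECT description FROM table2 WHERE id = ?", (record_id,)
    ).fetchone()
    return row[0] if row else None
```

Listing names never reads the long description column, which is where the bandwidth and disk I/O savings mentioned above would come from in a real database.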

What are the methods for achieving horizontal and vertical splitting? There are many, but whichever you choose, analyze the application scenario carefully before splitting, weigh whether the split is necessary and what it gains, and do not force a split mechanically.

Let's start with vertical splitting. It is mostly a question for the system design phase: how tightly coupled the system is, how to reduce that coupling, how the data fits together, and what principle to split by all need to be considered. If the system is too tightly coupled, vertical splitting will be difficult once the design is finalized.

Compared with vertical splitting, horizontal splitting is simpler. Horizontal splitting is less "splitting a table" than "copying a table": the table structure is duplicated and the rows are distributed among the copies. The first example split by ID % N; other strategies include consistent hashing and splitting by data access frequency.

Horizontally split tables usually share the same table name, but this is not required. Any naming works as long as requests can be routed to the correct table; using the same name simply makes routing easier. For example, on a resource download website, popular resources are downloaded very often while unpopular ones are downloaded rarely, so we split the table horizontally based on download volume. The two tables have the same schema but are named file1 and file2: file1 stores the high-volume data and file2 stores the low-volume data. Requests are then routed to file1 or file2 according to access frequency. In this example, different table names are used.
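
The file1/file2 example can be sketched as a tiny in-memory router. Everything here is an assumption made for illustration: the threshold of 1000 downloads, the FileStore class, and the small index that remembers which table each resource lives in so that reads can be routed correctly.

```python
from typing import Dict, Tuple

HOT_THRESHOLD = 1000  # assumed cutoff between popular and unpopular resources


def table_for(downloads: int) -> str:
    """Pick the target table name from the download volume."""
    return "file1" if downloads >= HOT_THRESHOLD else "file2"


class FileStore:
    """Two tables with the same structure but different names, plus a routing index."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, int]] = {"file1": {}, "file2": {}}
        self.location: Dict[str, str] = {}  # resource id -> table name

    def put(self, resource_id: str, downloads: int) -> None:
        target = table_for(downloads)
        previous = self.location.get(resource_id)
        if previous is not None and previous != target:
            # The resource crossed the threshold, so move it to the other table.
            del self.tables[previous][resource_id]
        self.tables[target][resource_id] = downloads
        self.location[resource_id] = target

    def get(self, resource_id: str) -> Tuple[str, int]:
        """Return (table name, download count) for a stored resource."""
        table = self.location[resource_id]
        return table, self.tables[table][resource_id]
```

The point of the sketch is that the table names do not have to match; what matters is that the router can always tell which table holds a given record.
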
If anything I have said above is incorrect or unprofessional, please correct me and give me some advice.
