Extreme storage: the zipper history table (part 1)

Source: Internet
Author: User

When designing the data model of a data warehouse, we often run into the following situation:


1. The data volume is large;
2. Some fields in the table are updated over time, such as a user's address, a product description, or an order status;
3. You need to view a historical snapshot for a point in time or a time period, for example the status of an order at some past moment, or how many times a user's record was updated over a past period;
4. The proportion and frequency of changes are small. For example, there are 10 million members in total, and about 100,000 members are added or changed each day;
5. If a full copy of the table is kept every day, each copy stores a lot of unchanged information, which wastes a great deal of storage;

A zipper history table both preserves the historical state of the data and keeps storage to a minimum;
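To make the storage argument concrete, here is a rough back-of-the-envelope sketch using the member figures from requirement 4. The 365-day retention window and the function names are assumptions for illustration, and the model is simplified: it treats every daily full copy as holding about 10 million rows and assumes each new or changed member adds exactly one row to the history table.

```python
# Rough row-count comparison based on the figures in requirement 4.
TOTAL_MEMBERS = 10_000_000   # total members, from the text
CHANGED_PER_DAY = 100_000    # new and changed members per day, from the text
DAYS = 365                   # assumed retention window (not from the text)


def full_copy_rows(days: int) -> int:
    """Rows stored when one full copy is kept per day.

    Simplification: every copy holds about TOTAL_MEMBERS rows.
    """
    return TOTAL_MEMBERS * days


def history_table_rows(days: int) -> int:
    """Rows stored in a history table.

    Initial load plus one new row per new or changed member per day.
    A changed member only gets its old row closed, so it adds one row.
    """
    return TOTAL_MEMBERS + CHANGED_PER_DAY * days


full = full_copy_rows(DAYS)
hist = history_table_rows(DAYS)
print(f"daily full copies: {full:,} rows")
print(f"history table:     {hist:,} rows")
print(f"ratio:             {full / hist:.1f}x")
```

Under these assumptions the daily full copies hold 3.65 billion rows after a year, against 46.5 million for the history table. Real savings depend on how wide the rows are and how often each record actually changes.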

For example, an order table has 3 records on June 20:

By June 21, the table has 5 records:

By June 22, the table has 6 records:

There are several ways to retain this table in the data warehouse:

1. If only one full copy is retained, the data is the same as the records on June 22. A request for the status of order 001 on June 21 cannot be met;

2. If a full copy is retained every day, the table in the data warehouse holds 14 records (3 + 5 + 6). Many of them, such as orders 002 and 004, are stored again and again without any change, and at large data volumes this wastes a great deal of storage;
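As an illustration of the alternative, the sketch below collapses daily full copies into lifecycle rows with a start and end date. It is a minimal sketch, not the author's procedure: the snapshot contents are hypothetical (only the daily counts of 3, 5 and 6 and the order ids 001, 002 and 004 come from the text), `build_history` and `snapshot_at` are made-up helper names, the sentinel end date follows the '2017-12-31' value the article uses for valid records, and deleted orders are not handled.

```python
from datetime import date, timedelta

OPEN_END = "2017-12-31"  # sentinel meaning "currently valid", as in the article

# Hypothetical daily full copies: {day: {order_id: status}}.
snapshots = {
    "2017-06-20": {"001": "created", "002": "created", "003": "paid"},
    "2017-06-21": {"001": "paid", "002": "created", "003": "completed",
                   "004": "created", "005": "created"},
    "2017-06-22": {"001": "completed", "002": "created", "003": "completed",
                   "004": "created", "005": "paid", "006": "created"},
}


def build_history(daily):
    """Turn daily full copies into rows with begin and end dates."""
    history = []    # all lifecycle rows, closed and open
    open_rows = {}  # order_id -> the row that is currently valid
    for day in sorted(daily):
        for order_id, status in sorted(daily[day].items()):
            current = open_rows.get(order_id)
            if current is not None and current["status"] == status:
                continue  # unchanged: the open row keeps covering this day
            if current is not None:
                # Close the old row on the day before the change.
                prev = date.fromisoformat(day) - timedelta(days=1)
                current["dw_end_date"] = prev.isoformat()
            row = {"order_id": order_id, "status": status,
                   "dw_begin_date": day, "dw_end_date": OPEN_END}
            history.append(row)
            open_rows[order_id] = row
    return history


def snapshot_at(history, day):
    """Rebuild the full copy for one day from the lifecycle rows."""
    return {r["order_id"]: r["status"] for r in history
            if r["dw_begin_date"] <= day <= r["dw_end_date"]}


history = build_history(snapshots)
full_copy_count = sum(len(rows) for rows in snapshots.values())
print(full_copy_count, "rows as daily full copies")
print(len(history), "rows as lifecycle rows")
```

With this sample data, unchanged orders such as 002 and 004 are stored once no matter how many days pass, and `snapshot_at(history, "2017-06-21")` returns the same content as the June 21 full copy.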

 

If the table is saved as a zipper history table in the data warehouse, it looks like the following table:

Note:

1. dw_begin_date is the start of the record's lifecycle, and dw_end_date is the end of the record's lifecycle;

2. dw_end_date = '2017-12-31' indicates that the record is currently valid;

3. To query all currently valid records: select * from order_his where dw_end_date = '2017-12-31'

4. To query the historical snapshot for 2017-06-21: select * from order_his where dw_begin_date <= '2017-06-21' and dw_end_date >= '2017-06-21'. This statement returns the following records:

They are exactly the same as the records in the source table on June 21:

As you can see, a zipper history table both meets the need for historical data and saves a great deal of storage;
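The two queries from the notes can be tried out with a small in-memory database. This is a sketch under stated assumptions: the rows loaded into order_his are hypothetical sample data, the columns order_id and status are invented for illustration (only order_his, dw_begin_date and dw_end_date appear in the article), and dates are stored as ISO-formatted text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table order_his ("
    " order_id text, status text,"
    " dw_begin_date text, dw_end_date text)"
)

# Hypothetical sample rows; '2017-12-31' marks a currently valid record.
rows = [
    ("001", "created",   "2017-06-20", "2017-06-20"),
    ("001", "paid",      "2017-06-21", "2017-06-21"),
    ("001", "completed", "2017-06-22", "2017-12-31"),
    ("002", "created",   "2017-06-20", "2017-12-31"),
    ("003", "paid",      "2017-06-20", "2017-06-20"),
    ("003", "completed", "2017-06-21", "2017-12-31"),
    ("004", "created",   "2017-06-21", "2017-12-31"),
    ("005", "created",   "2017-06-21", "2017-06-21"),
    ("005", "paid",      "2017-06-22", "2017-12-31"),
    ("006", "created",   "2017-06-22", "2017-12-31"),
]
conn.executemany("insert into order_his values (?, ?, ?, ?)", rows)


def current_records(db):
    """All currently valid records (note 3)."""
    return db.execute(
        "select order_id, status from order_his"
        " where dw_end_date = '2017-12-31' order by order_id"
    ).fetchall()


def snapshot(db, day):
    """Historical snapshot for one day (note 4)."""
    return db.execute(
        "select order_id, status from order_his"
        " where dw_begin_date <= ? and dw_end_date >= ?"
        " order by order_id",
        (day, day),
    ).fetchall()


current = current_records(conn)
june_21 = snapshot(conn, "2017-06-21")
print(len(current), "currently valid records")
for order_id, status in june_21:
    print(order_id, status)
```

Because both bounds in the snapshot query are inclusive, a closed row must end on the day before its successor begins. Otherwise one order would match two rows for the same day.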
