In the business requirements, we often need to be able to record historical information in the system, to see the historical changes, we can increase the start end time field to record the historical version of the data. The history of the data is mainly divided into: relationship, attribute history, Entity history and change history.
Relationship, Attribute history
The so-called relationship history refers to the existence of a historical version of the relationship between two entities. For example, the Department table and the employee table, for a moment, a department has multiple employees, an employee belongs to only one department, so it is a one-to-many relationship. And we want to record the history of this relationship, then we will form many-to-many relationships. Many-to-many relationships form an intermediate table, and then we can record the history of the relationship by adding the start time and end time fields to the intermediate table.
The attribute record history of an entity will form a one-to-many relational table, such as product price attributes, we want to record all the historical pricing, then the product and price of a one-to-many relationship.
In the AdventureWorks database, we can see a lot of this history of record relationship design. Like what:
History of employees, departments, shifts:
This is an example of a one-to-many relationship mentioned earlier because record history becomes a many-to-many relationship.
Product history of cost and Price:
This is the classic attribute history, for the many attributes of the product, our relationship between the cost and the price history of these two attributes, all can establish a one-to-many relationship price history table.
History of sales and regional and sales quotas:
Area and sales are also common one-to-many relationships, a sale belongs to a piece area, one region corresponds to multiple sales. Now because of history, so form many-to-many relational table salesterritoryhistory. For sales quotas, because they are recorded quarterly, there is only one sales quota in a quarter, so there is no need to start and end time, only one quarter of the first day (the end time can be calculated based on the first day of the quarter, so there is no need for storage).
The relationship between the area and the salesperson after increasing the intermediate table formed many to many, still retained the original one-to-many relationship, from the data is not so, because the data of two tables is inconsistent, so I infer that this is another one-to-many relationship, rather than the original region and the distribution of sales of the corresponding relationship table.
When you need to record a relationship or attribute history, the relationship will be promoted a complexity, that is, the original one-on, now will become a pair of more, the original is a pair of many, now will become many-to-many. Increase the start time and end time two fields in the history table to indicate the time validity of the row data. The AdventureWorks database uses a NULL value set to "End time" to indicate that this data is currently valid, but the author does not recommend this, it is best to set two fields are not NULL, in comparison can get a unified query:
In addition salesterritoryhistory This table only records "Start time" and does not record "end time" This is also a bad design, although the end time can be calculated, but each time the query will need to calculate the end time, really is not a good way. It is best to keep two fields, the user only need to enter the start time, by the front-end program to initialize the end time, and then save.
Entity History Master Entity history
The history of an entity is any change to an entity's data, which results in a new record for the entire data, rather than just a property or relationship. We can also take the time to add the start time to the entity history, but more often we do not record the entire entity history to query the value of this entity at any point in history, but to record a "version" of the information to facilitate the audit of changes in an entity to compare. If we are a historical version recorded for audit purposes, then these historical data will not be involved in the business query, so we do not need to record the start time, the end time, instead, we can add the "version" field, of course, the audit used the "Last Update Time" and "Last Update",
In this case, if we just add the version field, it would be cumbersome to query the current version, because we have to get the highest version number, and then we have to take the latest version of the record as the current record, in order to optimize this performance problem, we generally need to add the Boolean type " is the current version iscurrent field to identify the current version. When this field is added, it becomes more cumbersome to change the entity data. You first need to get the old data version number, +1 to generate a new version number, and then set the old data "whether the current version" field to 0, update the old data "last updated" and "Last updated", and then insert the new version number of the data, and the new version is the current version. I don't see the design of the entity's history in the AdventureWorks database, but we can look at the SharePoint database design, which is the approach to the version design I mentioned here. Interested can look at the SharePoint ContentDB Alluserdata table, Tp_version is the record version, Tp_iscurrent and Tp_iscurrentversion is to mark the current version.
History of affiliated Entities
One of the problems with the entity history is whether the subordinate sub-entities also need to be recorded in conjunction with the history. For example, we want to make a history of such an entity as a purchase order, and each modification to a purchase order generates a new version of the purchase order. If there are 100 purchase orders under a purchase order, after we have edited the Purchase order Master table, we have created a new version of the Purchase master table data, whether the corresponding new version data is also created for the 100 lines of detail? If created, then the amount of data on the purchase schedule will skyrocket, and in fact we do not edit these 100 details, the new version of the details are identical, if not created, then how to maintain the foreign key constraints? After all, the list above the foreign key corresponding to the old version of the purchase order ID Ah!
In fact, both scenarios can be, the first solution is simple to develop, if the details are not so much, or the amount of data of their own documents is not large, then repeating a little schedule does not have much impact. The second scenario will be complex, requiring new and old data to be compared by article-by-article, to find the difference, if the main table has changed, then create a new version of the master table, if there are 2 changes in 100 details, then create a new version for the 2.
The following is a detailed introduction to the model design using the second solution. First, we need to break the foreign keys of the primary and secondary tables, and add the form and item as two separate entities, adding properties such as "version", "Current version", and so on. Add a business primary key "Formnumber" for the form that uniquely identifies a form (because of the reason for the release record, so Formnumber is not the primary key for the form), and then adds "Formnumber" to the Item table to identify which form the item belongs to.
whereIscurrent=1 andIsDeleted=0 andFormnumber=@formNumber;
whereIscurrent=1 andIsDeleted=0 andFormnumber=@formNumber; change history
Regardless of the previous relations, attributes or the entire entity's history, will be in the business table to form new data, the increase of data on the one hand will lead to inefficient query efficiency, on the other hand make every query need to bring additional query conditions, very inconvenient. So we came up with another way to keep our history, and that is, we log the changed parts to the log table like we do the logs.
The benefit of logging the change log is that it does not affect the design of the existing database model, which means that all entities and relationships do not need to be changed, we only need to add a change log table. However, the change log is usually the front-end program through the comparison before and after records, to find the properties of the change, and then write, not the database does. The disadvantage is also obvious, that is, the restoration of historical data is inconvenient, not as the previous model can quickly query the historical state of the data.
Therefore, the Change Log table is only used for audit requirements, not for the business to the historical data query requirements. There is a transactionhistory table in the AdventureWorks database that records the individual order transactions, although it is not a record of the order changes, but there is a structure similar to the change history.
Historical Data Query optimization
It is mentioned that because of the reason of preserving historical data, the data volume of the corresponding table in the database is increased many times, and the increase of data will inevitably cause the query to become slower, so we need to make a query optimization on the table after we record the historical data. Optimizations can take the following solutions:
If our historical data is not needed in our usual business, we can use historical data tables only in special scenarios, so we could create an archive table of the same structure, and then periodically transfer historical data from the business system to the archive table. Of course, the front-end software system to make corresponding changes, for the old historical data need to query the archive table, and the new data is to query the current table. In AdventureWorks, the corresponding archive table was established only for TransactionHistory.
The advantage of building a partition over an archive table is that the old data and the new data can be stored in different places, and the old and new data can each establish their own index tree, and logically to the program is still access to a table, the front-end program does not need to make any changes. For example, for the historical data record of the start end date, we can divide the data with the end date of 9999-12-31 (the current valid data) into one area, and the rest to another area. For the mode of version recording, we can divide "is current version" into one area and divide the other data into another area.
Partition after the update data will cause the old data block transfer, because the old data is originally in the current block, now because of changes to the entity, the old data needs to be transferred to the older chunk, and then insert the new data into the current block, in addition to the movement of the partition and the corresponding index changes, Therefore, it is relatively slow to update data.
If you have an Oracle database, then we can create a bitmap index on the Iscurrentversion field, and if it is a database that does not support bitmap indexing for SQL Server, we can also put Iscurrentversion in the first column when we build the B-Tree index. Because this column is bound to be put into the filter conditions.
Modeling from AdventureWorks Learning database-preserving historical data