Note: For the latest update on ROW/COW, see the article "Revisiting COW and ROW Snapshot Technology".
Contents
The difference between snapshots and backups
Snapshot technology
Full snapshots
Incremental snapshots
COW copy-on-write snapshot technology
ROW redirect-on-write snapshot technology
Finally
The difference between snapshots and backups
Traditionally, people have protected important data with technologies such as data replication, backup, and recovery, backing up or replicating data on a regular schedule. Because a backup run affects application performance and takes a long time, backups are usually scheduled for periods of light system load, such as at night. To conserve storage space, full backups are usually combined with incremental backups. This approach has a significant weakness: the backup window. While data is being backed up, the business has to temporarily stop serving external requests. As enterprise data volumes and growth rates increase, this window grows longer and longer, which is unacceptable for critical business systems. Institutions such as banks and telecommunications carriers need their information systems to run 24x7 without interruption, and even brief downtime or a small amount of data loss can cause huge losses. The backup window therefore has to be shrunk as much as possible, ideally to zero. Data snapshots and continuous data protection (CDP) are data protection technologies that emerged to meet this requirement.
It is worth noting that as reliance on information systems grows, the demand is no longer limited to traditional key industries such as banking and telecommunications: more and more systems in government, education, and enterprises also require smaller backup windows and shorter downtime. Reducing the cost of data protection and improving application awareness during data protection are gradually becoming customers' primary requirements.
Advantages of Snapshots:
Snapshots can be created within seconds for use by backup applications. Combined with common backup software, the workflow is: issue the snapshot command from the graphical management interface; the snapshot function automatically finds a moment when no data is changing and takes the copy; a few seconds later the copy is ready, and the backup software then backs up that copy.
A snapshot image can restore data to the snapshot's point in time within seconds, which lets system administrators selectively and quickly recover damaged or deleted files.
Data snapshots have many other uses. For example, an up-to-date copy of production data may be needed to test a new system or to support decision-making and data analysis, but the system cannot be shut down and restoring data from a tape backup takes a long time. In such cases, the snapshot function can create a snapshot copy at any point in time, and the copy can be used for testing and analysis without affecting normal use of the system.
Snapshot technology
SNIA (Storage Networking Industry Association) defines a snapshot as a fully usable copy of a specified data set that contains an image of the corresponding data as it was at a certain point in time.
According to the SNIA definition, snapshots come in two types, full and incremental, each using different snapshot techniques:
Full snapshots: split mirror
Incremental snapshots: copy-on-write (COW), redirect-on-write (ROW)
The flexibility of redirect-on-write snapshots and their efficient use of storage space, coupled with the popularity of distributed storage, have made ROW the mainstream snapshot technology.
Full snapshots
Full snapshots, also known as full-copy snapshots or as-is copies, use split-mirror snapshot technology: before the preset snapshot point in time arrives, a full mirrored volume is created and maintained for the source data volume. Every write to disk goes to both the source data volume and the mirrored volume, so the two hold identical copies of the data and together form a mirrored pair. When the preset snapshot point in time arrives, writes to the mirrored pair are paused, and the mirrored volume is quickly detached from the pair and converted into a snapshot volume, which yields a snapshot of the data. Once the snapshot volume has served its purpose, for example as the source of a data backup, it is resynchronized with the source data volume and becomes a mirrored volume again.
Consequently, to keep several consecutive point-in-time snapshots of a source data volume at the same time, more than one mirrored volume must be created for it in advance. When the first mirrored volume is converted into a snapshot volume and used for a data backup, the second, previously created mirrored volume immediately synchronizes with the source data volume and forms the new mirrored pair.
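The split-mirror mechanism described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a volume is a list of blocks, and the class and method names (`MirroredVolume`, `split`, `resync`) are hypothetical, not taken from any real storage product.

```python
class MirroredVolume:
    """Toy split-mirror model; all names here are illustrative assumptions."""

    def __init__(self, num_blocks):
        self.source = [b""] * num_blocks
        # The mirror has the same capacity as the source volume.
        self.mirror = [b""] * num_blocks
        self.mirrored = True

    def write(self, index, data):
        self.source[index] = data
        if self.mirrored:
            # While the pair is intact, every write lands on both volumes.
            self.mirror[index] = data

    def split(self):
        """Detach the mirror at the snapshot point; it becomes the snapshot volume."""
        self.mirrored = False
        return self.mirror

    def resync(self):
        """Re-form the mirrored pair by copying the whole source volume again."""
        self.mirror = list(self.source)
        self.mirrored = True


vol = MirroredVolume(4)
vol.write(0, b"old")
snapshot = vol.split()   # the snapshot volume is a full, independent copy
vol.write(0, b"new")     # later writes no longer reach the snapshot
```

The sketch makes the trade-off visible: `split` is nearly free, but every write before it is doubled and `resync` copies the entire volume.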
The advantage of split-mirror snapshots is good data isolation: data can be accessed offline, and recovering, copying, or archiving all the data on a disk becomes simpler. Most importantly, the operation is very short. It takes only as long as detaching the mirrored volume, usually a few milliseconds, and such a small backup window has almost no impact on upper-layer applications. The snapshot volume and the source data volume also do not affect each other. The disadvantages are just as obvious. The approach lacks flexibility, because a snapshot cannot be created for an arbitrary data volume at an arbitrary point in time. It also requires one or more mirrored volumes with the same capacity as the source data volume, which consumes a large amount of storage space. Writing every piece of data twice has a significant impact on write performance, and resynchronizing the mirror reduces the overall performance of the storage system. To address these shortcomings of full snapshots implemented with split-mirror technology, differential snapshots were introduced, along with two specific differential-volume snapshot technologies, COW and ROW.
Incremental snapshots
COW copy-on-write snapshot technology
As pictured above, COW first creates a data pointer table for each source data volume, holding the physical pointers to all data on the source data volume (Base Volume). When a snapshot is created, the storage system copies the source data volume's pointer table to serve as the snapshot volume's pointer table. The snapshot volume is established only when the snapshot is created, and it needs only a relatively small amount of storage space, used to hold the original contents of data that is updated on the source data volume after the snapshot point in time. The specific steps are as follows:
Step 1: Generate the data pointer table of the source data volume
Step 2: Create the snapshot
Step 3: Copy the source data volume's data pointer table to produce the snapshot volume's data pointer table
Step 4: Generate the snapshot volume
Step 5: Original data in the source data volume receives an update instruction
Step 6: Copy the original data from the source data volume to the snapshot volume (reserved space). Later writes to this location no longer trigger a copy-on-write operation
Step 7: Update the snapshot volume's pointer table
Step 8: Update the original data in the source data volume
Step 9: Repeat steps 5 to 8 until the next snapshot is taken
As the steps show, the intent of copy-on-write is to copy the original data to the snapshot volume at the moment the original data in the source data volume is updated. To restore a snapshot, we only need to follow the snapshot pointer table. COW is also very flexible: a snapshot can be created for any source data volume at any time.
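The COW steps above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: blocks live in a list, and each snapshot's pointer table is reduced to a dict that records only the blocks already copied to reserved space, with every other block implicitly shared with the source volume. The names `CowVolume`, `create_snapshot`, and `read_snapshot` are hypothetical.

```python
class CowVolume:
    """Toy copy-on-write model; names and data layout are assumptions."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # source data volume, updated in place
        self.snapshots = []          # per snapshot: {block index: preserved original}

    def create_snapshot(self):
        # Creating a snapshot copies no data; it only starts tracking changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        for preserved in self.snapshots:
            if index not in preserved:
                # First write to this block since the snapshot:
                # save the original data before it is overwritten.
                preserved[index] = self.blocks[index]
        self.blocks[index] = data    # then update the source volume

    def read_snapshot(self, snap_id, index):
        # Copied blocks come from the snapshot's reserved space,
        # untouched blocks are still shared with the source volume.
        return self.snapshots[snap_id].get(index, self.blocks[index])


vol = CowVolume([b"a", b"b", b"c"])
snap = vol.create_snapshot()
vol.write(1, b"B1")   # triggers one copy of block 1
vol.write(1, b"B2")   # same block again: no further copy
```

After these writes, `vol.read_snapshot(snap, 1)` still returns the original block while `vol.blocks[1]` holds the latest data, and the loop over `self.snapshots` shows why each additional snapshot adds work to a write.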
Advantages: Before a snapshot is taken, COW consumes no storage resources and has no effect on system performance. Snapshots are created very quickly, almost instantaneously, because the snapshot volume and the source data volume share the same physical data through their respective pointer tables and no full copy of the volume is needed. The backup window that COW incurs when creating a snapshot grows linearly with the size of the source data volume and is typically a few seconds, and the snapshot volume consumes far less storage space than a full snapshot.
Disadvantages: After a snapshot is created, the first write to each location requires the original data to be copied from the source data volume to the snapshot volume before the source data volume can be written, which reduces the write performance of the source data volume. If multiple snapshots are taken of the same source data volume, write performance drops even further. In addition, the snapshot volume holds only part of the original data from the source data volume, so it cannot provide a full physical copy, and COW is of no help to applications that need one. Finally, if the amount of data copied to the snapshot volume exceeds the reserved space, the snapshot becomes invalid.
Scenario: With COW, a data update after the snapshot is created actually costs one read (reading the original data from the source data volume) and two writes (one to the snapshot volume and one to the source data volume). COW is therefore better suited to workloads that read a lot and write little. It is also a good choice when an application tends to produce write hotspots on the storage device, that is, writes confined to a limited range of data: because the changes stay within that range, repeated writes to the same data trigger only one copy-on-write operation.
ROW redirect-on-write snapshot technology
As shown above, VD is the source data volume and Snap is the snapshot volume; at the moment the snapshot is created, the source data volume and Snap are identical. When data on the source data volume is updated after the snapshot is created, ROW does not modify the original data in place the way COW does. Instead, it allocates new space to hold the data that updates the original. The steps are as follows:
Step 1: Create a snapshot
Step 2: Copy the data on the source data volume that corresponds to all writes redirected since the previous snapshot to produce the snapshot for this point in time, then write the redirected data back to the corresponding locations on the source data volume
Step 3: Original data in the source data volume receives an update instruction
Step 4: Allocate new data storage space (reserved)
Step 5: Redirect the pointer for the original data in the source data volume's pointer table to the newly allocated storage space
Step 6: Write the updated data into the newly allocated storage space
Step 7: Repeat steps 3 to 6 until the next snapshot
Note 1: Read operations may need redirection as well. Whether a read is redirected depends on whether the location being read has been write-redirected since the last snapshot; reads of redirected locations must follow the redirection.
As the steps show, the intent of redirect-on-write is that when original data in the source data volume is updated, the corresponding pointer in the source data volume's pointer table is redirected to new storage space. Throughout, the snapshot volume's pointer table and the data it points to remain unchanged. Restoring a snapshot therefore only requires following the snapshot volume's pointer table.
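A minimal Python sketch may help make the redirection concrete. It assumes a single append-only block store shared by the live volume and its snapshots, and it redirects every write, including repeated writes to the same block; a real system may differ on both points. The names `RowVolume`, `store`, and `table` are hypothetical.

```python
class RowVolume:
    """Toy redirect-on-write model; names and data layout are assumptions."""

    def __init__(self, blocks):
        self.store = list(blocks)               # physical space; old data is never overwritten
        self.table = list(range(len(blocks)))   # live pointer table: logical -> physical
        self.snapshots = []                     # frozen copies of the pointer table

    def create_snapshot(self):
        # Only the pointer table is copied, never the data itself.
        self.snapshots.append(list(self.table))
        return len(self.snapshots) - 1

    def write(self, index, data):
        self.store.append(data)                  # one write, into newly allocated space
        self.table[index] = len(self.store) - 1  # redirect the live pointer

    def read(self, index):
        # Reads follow the live pointer, so redirected blocks are found automatically.
        return self.store[self.table[index]]

    def read_snapshot(self, snap_id, index):
        # The snapshot's pointers still refer to the untouched original data.
        return self.store[self.snapshots[snap_id][index]]


vol = RowVolume([b"a", b"b"])
snap = vol.create_snapshot()
vol.write(0, b"A")   # block 0 now points at new space; b"a" stays where it was
```

In this model the original block is never copied or modified, which is why restoring a snapshot only requires following its pointer table.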
Advantages: After a snapshot is created, writes to the source data volume are redirected: all write I/O goes to new space, while all snapshot data (the old data) stays in the now read-only source data volume. As a result, updating the source data volume takes only one write, which avoids the two writes that COW needs. The most obvious advantage of ROW is therefore that it does not degrade the write performance of the source data volume.
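The difference in write cost can be checked with a rough count. The sketch below tallies data I/O operations for a sequence of block writes issued after a single snapshot, assuming COW copies a block only on its first write and ignoring pointer-table updates; the function names `cow_io` and `row_io` are hypothetical.

```python
def cow_io(writes):
    """Return (reads, writes) that COW needs for the given block indexes."""
    copied = set()
    reads = written = 0
    for block in writes:
        if block not in copied:
            reads += 1      # read the original data from the source volume
            written += 1    # write it to the snapshot volume
            copied.add(block)
        written += 1        # write the new data to the source volume
    return reads, written


def row_io(writes):
    """Return (reads, writes) that ROW needs: one redirected write each."""
    return 0, len(writes)


hotspot = [7, 7, 7, 7, 7]   # five writes to the same block
spread = [0, 1, 2, 3, 4]    # five writes to five different blocks

print(cow_io(hotspot))   # (1, 6)
print(cow_io(spread))    # (5, 10)
print(row_io(spread))    # (0, 5)
```

Under these assumptions COW approaches ROW's cost only when writes are concentrated on a hotspot, and costs one extra read and one extra write per block when writes are spread out.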
Disadvantages: In ROW, the snapshot volume's pointer table refers to the original copy of the source data, while the source data volume's pointer table refers to the updated copy. As a result, the snapshot volume's pointer table has to be synchronized to the source data volume before the snapshot volume is deleted. When multiple snapshots are created, they form a snapshot chain, which makes accessing the original data, tracking the data of the snapshot volumes and the source data volume, and deleting snapshots extremely complex. For example, if 10 snapshots have been taken, reverting to the most recent snapshot point requires merging 10 snapshot files. The main disadvantage of ROW is thus that no single complete snapshot volume exists; a given snapshot point in time has to be assembled from the snapshot volumes taken at different times, and the deeper the snapshot hierarchy, the higher the cost of recovery. In addition, because the data referenced by the source data volume's pointers quickly becomes scattered through redirection, ROW's other major disadvantage is reduced read performance (the principle of spatial locality no longer helps).
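The cost of a snapshot chain can be sketched as follows. The model is an assumption made for illustration: each snapshot file is treated as a delta layer (a dict of the blocks written in that interval) stacked on a base layer, and the function names `read_at` and `merge` are hypothetical.

```python
def read_at(chain, level, index):
    """Return (data, layers_visited) for a block as of chain[level]."""
    # Walk from the requested layer back towards the base until the block is found.
    for visited, layer in enumerate(reversed(chain[: level + 1]), start=1):
        if index in layer:
            return layer[index], visited
    raise KeyError(index)


def merge(chain, level):
    """Flatten layers 0..level into one complete image of that point in time."""
    merged = {}
    for layer in chain[: level + 1]:
        merged.update(layer)   # newer layers override older ones
    return merged


chain = [
    {0: b"a", 1: b"b"},   # base image
    {0: b"A"},            # blocks written before the next snapshot
    {},                   # an interval with no writes
    {1: b"B"},            # most recent layer
]

data, visited = read_at(chain, 3, 0)   # block 0 is found three layers down
image = merge(chain, 2)                # full image as of layer 2
```

Both functions do work proportional to the depth of the chain, which matches the observation that a deeper snapshot hierarchy makes recovery more expensive.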
Application scenario: On traditional storage devices, after many reads and writes under a ROW snapshot, the data of the source data volume becomes scattered, so sequential read and write performance is worse than with COW. ROW is therefore better suited to write-intensive storage systems. On distributed storage, however, ROW's sequential read and write performance is better than COW's. Generally speaking, the bottleneck for read and write performance is the disk, and a characteristic of distributed storage is that the more widely data is spread across storage devices, the higher the system performance. The scattering caused by ROW's redirection of the source data volume thus becomes a benefit, and ROW has gradually become the industry mainstream.
Finally
To close with a diagram: the "backup" in it can roughly be understood as mirroring.