Deduplication technology is also known as capacity-optimized protection technology. What benefits does it bring to customers' computer systems, and to backup systems in particular?
At present, deduplication is used mainly for data backup. Some vendors claim that it will also be applied to primary storage, but that is not yet mainstream. For backup, deduplication provides a larger effective backup capacity and therefore longer data retention, continuous verification of backup data, a higher level of data recovery service, and easier disaster tolerance for backup data.
Larger backup capacity
Backup data contains a great deal of redundancy, especially in full backups. Even an incremental backup, which backs up only the files that have changed, usually contains redundant data blocks.
The principle of deduplication is to store only the unique data segments of the backup data. When data is written to the backup device, it is divided into variable-length data segments. The deduplication device compares each incoming segment, in real time, with the segments it has already stored, which ensures that only one copy of each unique segment is retained. Because a deduplication device can find duplicate files, duplicate segments within a file or across files, and even duplicate data blocks, the storage space actually consumed is less than the volume of data being stored. The efficiency of capacity optimization depends chiefly on the algorithm. The technical basis for capacity optimization is not new; it has been studied in academia for decades.
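The segment-and-compare principle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names split_segments and DedupStore, the toy boundary rule, the size limits, and the use of SHA-256 as the segment fingerprint. Real products use their own, far more refined algorithms.

```python
import hashlib


def split_segments(data: bytes, mask: int = 0x3F,
                   min_size: int = 16, max_size: int = 256) -> list:
    """Split data into variable-length segments.

    Toy content-defined rule (an assumption, not a vendor algorithm):
    a boundary is declared where the low bits of a simple running hash
    are all ones, subject to minimum and maximum segment sizes.
    """
    segments = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & mask) == mask
        if at_boundary or length >= max_size:
            segments.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        segments.append(data[start:])  # trailing partial segment
    return segments


class DedupStore:
    """Keeps exactly one copy of each unique segment, keyed by fingerprint."""

    def __init__(self):
        self.segments = {}      # fingerprint -> segment bytes
        self.logical_bytes = 0  # total bytes the clients asked to store

    def write(self, data: bytes) -> list:
        """Store a backup stream; return the fingerprint list needed to rebuild it."""
        recipe = []
        for seg in split_segments(data):
            fp = hashlib.sha256(seg).hexdigest()
            if fp not in self.segments:  # only new, unique segments use space
                self.segments[fp] = seg
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild a backup stream from its fingerprint list."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held by the store."""
        return sum(len(seg) for seg in self.segments.values())
```

Writing the same stream twice doubles logical_bytes but leaves stored_bytes() unchanged, which is the effect the paragraph above describes for repeated full backups.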
Data can be continuously verified
Deduplication products currently on the market differ in where deduplication is performed and in the size of the segments into which files are split. More important, however, is how integrity and consistency checks are performed when data is written to the backup device. In a primary storage system, logical consistency checking always carries risk. If a software defect causes data to be written incorrectly, block pointers and bitmaps may be damaged. The usual remedy is to run a file system check program (such as fsck) after the file system has been unmounted. If backup data is stored in an ordinary file system, such an error is hard to detect until the data is restored, and by the time recovery is required there may not be enough time to correct it.
The backup data itself is the most valuable part of a backup system. Backup data is not accessed frequently; when it is accessed, it is usually because a human error or system fault has occurred and data must be recovered. If the consistency of the file system can only be checked at recovery time, you must wait until the next system restart or take the system offline, which adds unnecessary risk. An excellent deduplication device should therefore have an end-to-end verification process.
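One way to picture end-to-end verification is a store that checks every segment against its fingerprint when it is written, again whenever it is read, and on demand across the whole store. The sketch below is a simplified illustration under stated assumptions: the class name VerifiedStore, its methods, and the choice of SHA-256 are hypothetical and do not describe any particular product.

```python
import hashlib


class VerifiedStore:
    """Toy segment store that verifies data against its fingerprint (hypothetical)."""

    def __init__(self):
        self._blocks = {}  # fingerprint -> stored bytes

    @staticmethod
    def _fingerprint(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Store data, then read it back and confirm it matches what was sent."""
        fp = self._fingerprint(data)
        self._blocks[fp] = bytes(data)
        if self._fingerprint(self._blocks[fp]) != fp:
            raise IOError("write verification failed for " + fp)
        return fp

    def get(self, fp: str) -> bytes:
        """Return stored data only if it still matches its fingerprint."""
        block = self._blocks[fp]
        if self._fingerprint(block) != fp:
            raise IOError("stored data is corrupt for " + fp)
        return block

    def verify_all(self) -> list:
        """Scan the whole store; return fingerprints whose data no longer matches."""
        return [fp for fp, block in self._blocks.items()
                if self._fingerprint(block) != fp]
```

Running verify_all() periodically while the device is online is what lets damage be found long before a restore is attempted, rather than during it.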
Higher data recovery level
The level of the backup data recovery service is determined by whether data can be restored from the backup device accurately, quickly, and reliably.
Oracle databases usually hold the business data that enterprises most need to protect. Enterprises often use full backup or incremental backup to protect them. Full backup supports fast backup and fast recovery. By comparison, an incremental backup often has to scan the entire database to detect changed data blocks, and recovery requires one full backup plus multiple incremental backups, which slows recovery.
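The difference in recovery effort can be sketched with a toy model in which a backup is a mapping from block number to block contents. The function names and the data layout are assumptions for illustration only, and the model ignores deleted blocks.

```python
def restore_from_full(full: dict) -> tuple:
    """Restore from a single full backup: one backup set is read."""
    return dict(full), 1


def restore_from_chain(full: dict, incrementals: list) -> tuple:
    """Restore from a full backup plus incrementals applied in order.

    Every incremental in the chain must be read and applied, so the
    number of backup sets read grows with the length of the chain.
    """
    image = dict(full)
    for inc in incrementals:
        image.update(inc)  # later changes overwrite earlier block contents
    return image, 1 + len(incrementals)


# Hypothetical three-block database with two days of changes.
sunday_full = {0: b"A", 1: b"B", 2: b"C"}
monday_inc = {1: b"B1"}
tuesday_inc = {2: b"C2"}
tuesday_full = {0: b"A", 1: b"B1", 2: b"C2"}

image_chain, sets_chain = restore_from_chain(sunday_full, [monday_inc, tuesday_inc])
image_full, sets_full = restore_from_full(tuesday_full)
```

Both paths produce the same image, but the chain reads three backup sets where the latest full backup reads one, and the chain fails if any one of its members is unreadable.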
Why, then, do many enterprises still require incremental backup? Because a full backup needs more backup time and more storage space than an incremental backup. Backup devices with deduplication can solve both problems.
For backups of databases such as Oracle, the backup time consists of the time spent traversing data blocks (especially for incremental backup) and the data transmission time. For an incremental backup, block traversal means scanning the database to find changed data blocks, which takes a long time. As the performance of backup devices improves further, the time required for a full backup and for an incremental backup of the database becomes almost the same.
Disk-based backup devices that provide high-performance inline deduplication therefore need only a small amount of storage space to hold multiple full backups of an Oracle database. The storage space consumed by daily full backups is basically the same as that consumed by block-level incremental backups. Compared with ordinary backup devices, backup devices that use deduplication for full backups can save 95% of disk consumption.
When backing up key data, backup devices that use deduplication make it possible to use full backup instead of incremental backup, which improves the level of the data recovery service.
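A rough calculation shows how a saving of this order can arise. The figures used here (a 1,000 GB database, 2% of blocks changing per day, 30 daily full backups retained) are hypothetical inputs chosen for illustration, not measurements, and the model assumes that deduplication stores the first full backup once plus only the changed blocks of each later one.

```python
def full_backup_storage_gb(db_size_gb: float, daily_change_rate: float,
                           retained_days: int) -> tuple:
    """Compare disk use for retained daily full backups, with and without deduplication.

    Returns (plain_gb, deduped_gb, saving_fraction) under the simple
    model stated above; compression and metadata are ignored.
    """
    plain_gb = db_size_gb * retained_days
    changed_gb_per_day = db_size_gb * daily_change_rate
    deduped_gb = db_size_gb + changed_gb_per_day * (retained_days - 1)
    saving_fraction = 1 - deduped_gb / plain_gb
    return plain_gb, deduped_gb, saving_fraction


plain, deduped, saving = full_backup_storage_gb(1000, 0.02, 30)
print(f"without deduplication: {plain:.0f} GB")
print(f"with deduplication:    {deduped:.0f} GB")
print(f"saving:                {saving:.1%}")
```

With these assumed inputs the deduplicated store holds roughly 1,580 GB instead of 30,000 GB. A higher change rate or a shorter retention period lowers the saving, so the result depends heavily on the workload.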
Easy disaster tolerance for backup data
Mainstream disaster tolerance technology is based on data replication and focuses on real-time data replication; disaster tolerance for backup data receives far less attention. Because deduplication optimizes the capacity of backup data, a daily full backup adds only a small increment on disk. What is transmitted to the remote site over the WAN or LAN is the capacity-optimized data, which greatly reduces the network bandwidth required.
Many enterprises now regard online replication of backup data as an alternative to off-site tape storage. With a replication solution, data is copied from the local primary disk to a remote disk over the LAN or WAN. To strengthen protection, enterprises can also increase the frequency of data synchronization, or configure the remote site as a full disaster recovery site. If the primary site has to be shut down for a period of time, business operations can be started at the remote site.
When selecting a product with deduplication, customers should examine its capacity optimization algorithm, its continuous data verification, the data recovery service level it supports, and how conveniently and efficiently it enables disaster tolerance.
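The bandwidth saving can be sketched as follows. The remote site is modelled as a plain dictionary keyed by segment fingerprint, and the function name replicate and the sample data are assumptions for illustration. In a real product the two sites would first exchange fingerprints over the network and then transfer only the segments the remote side lacks.

```python
import hashlib


def replicate(local_segments: list, remote_store: dict) -> int:
    """Copy to the remote store only the segments it does not already hold.

    Returns the number of segment bytes that had to cross the link.
    """
    bytes_sent = 0
    for segment in local_segments:
        fp = hashlib.sha256(segment).hexdigest()
        if fp not in remote_store:  # remote already has it: send nothing
            remote_store[fp] = segment
            bytes_sent += len(segment)
    return bytes_sent


# Hypothetical full backups on two consecutive days; one segment changed.
day1 = [b"alpha" * 100, b"beta" * 100, b"gamma" * 100]
day2 = [b"alpha" * 100, b"beta" * 100, b"delta" * 100]

remote = {}
sent_day1 = replicate(day1, remote)  # first full backup: every segment is new
sent_day2 = replicate(day2, remote)  # second full backup: only the changed segment
```

In this toy run the second full backup sends 500 bytes instead of 1,400, because only the changed segment is transmitted.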