Top 10 Questions About Deduplication Technology

Source: Internet
Author: User

1. What is deduplication?
 
Simply put, deduplication means that when data is transmitted over the network or stored, multiple identical copies are not transmitted or stored again, which reduces the use of network bandwidth and storage space. The earlier SIS (single-instance storage) is in fact a deduplication technology, but its deduplication unit is the file. The deduplication technology that is popular today works on data blocks: the deduplication effect is better, but the implementation is more complex. These technologies work best in data backup, because multiple full backups contain a large amount of duplicate data. Incremental backup can reduce repeated backups to some extent, but it works at file granularity, which is coarse, and relying on incremental backups for a long time is impractical because restores become very complicated. If you solve this problem by merging backups, the merge jobs incur additional overhead.
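
The difference between file-level and block-level units can be shown with a minimal sketch. It is only an illustration under stated assumptions: the function name `stored_bytes`, the 4096-byte block size, and the use of SHA-256 as the fingerprint are choices made here, not details from any particular product.

```python
import hashlib

def stored_bytes(files, block_size=None):
    """Bytes kept after deduplication.

    block_size=None treats each whole file as one unit (SIS-style);
    otherwise files are cut into fixed-size blocks of block_size bytes.
    """
    unique = {}
    for data in files:
        step = block_size or max(len(data), 1)
        for i in range(0, len(data), step):
            unit = data[i:i + step]
            # Identical units hash to the same key and are kept only once.
            unique[hashlib.sha256(unit).digest()] = len(unit)
    return sum(unique.values())

# Deterministic 16 KiB of pseudo-random test data, and a copy with one byte changed.
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512))
modified = original[:5000] + bytes([original[5000] ^ 0xFF]) + original[5001:]

file_level = stored_bytes([original, modified])
block_level = stored_bytes([original, modified], block_size=4096)
print(file_level, block_level)
```

With file-level units the two versions share nothing, so both are stored in full (32768 bytes). With 4096-byte blocks only the one changed block is stored a second time (20480 bytes).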
 
2. How does de-duplication technology apply to backup or data replication?
 
Deduplication technology is mainly used to implement data backup and replication over low-bandwidth links, for example branch office data protection and low-bandwidth disaster recovery. The principle is basically the same in both cases. Before a file is transferred, its fingerprint is calculated first, as shown in the figure. If the fingerprint matches a previously transmitted file, only the file attributes and a pointer are transmitted, without the actual data. If the fingerprint differs from every previously transmitted file, the file is split into smaller data segments, and for segments that have already been transmitted only pointers are sent. The actual amount of data transmitted therefore depends on how much data changed during the backup or replication interval.
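
The two-stage check described above (whole-file fingerprint first, then per-segment fingerprints) can be sketched roughly as follows. The `Receiver` class, the `send_file` function, the 4096-byte segment size, and SHA-256 as the fingerprint are all hypothetical names and values chosen for illustration; a real product will differ.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed segment size, chosen only for illustration

def fingerprint(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Receiver:
    """Hypothetical backup target that remembers what it has already received."""
    def __init__(self):
        self.files = {}     # file fingerprint -> list of segment fingerprints
        self.segments = {}  # segment fingerprint -> segment bytes

def send_file(data: bytes, receiver: Receiver) -> int:
    """Transfer one file and return the number of payload bytes actually sent."""
    file_fp = fingerprint(data)
    if file_fp in receiver.files:
        return 0  # whole file already known: only attributes and a pointer go out
    sent, recipe = 0, []
    for i in range(0, len(data), SEGMENT_SIZE):
        segment = data[i:i + SEGMENT_SIZE]
        fp = fingerprint(segment)
        if fp not in receiver.segments:  # new segment: send the data itself
            receiver.segments[fp] = segment
            sent += len(segment)
        recipe.append(fp)                # known segment: the pointer is enough
    receiver.files[file_fp] = recipe
    return sent

# Deterministic 16 KiB test file and a second version with one byte changed.
v1 = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512))
v2 = v1[:5000] + bytes([v1[5000] ^ 0xFF]) + v1[5001:]

target = Receiver()
first = send_file(v1, target)    # nothing known yet, everything is sent
repeat = send_file(v1, target)   # identical file, no payload
changed = send_file(v2, target)  # only the segment containing the change
print(first, repeat, changed)
```

In this sketch the first transfer sends all 16384 bytes, the repeated transfer sends no payload, and the changed version sends a single 4096-byte segment. The byte counts ignore the size of the attributes and pointers themselves.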
 
3. What types of data does deduplication apply to?
 
Deduplication applies to any type of data, such as office documents, databases, multimedia files, and virtual machines. For some data, because of its own characteristics, the deduplication effect is not particularly noticeable during the first backup, but the advantages become apparent in subsequent backups. The more backups there are and the shorter the interval between them, the higher the deduplication ratio.
 
4. How can I know whether deduplication is effective for my data?
 
The deduplication effect mainly depends on the following factors:
a. Data change rate. The less the data changes, the more obvious the deduplication effect.
b. Compressibility. Compression is usually used together with deduplication, so data with a high compression ratio can save significant bandwidth and storage even if its deduplication ratio is not high.
c. The backup method (full, differential, or incremental backup). The effect is most obvious for full backups, but deduplication also helps with incremental backups. For example, if only a 1 KB data block changes in a 50 MB file, an incremental backup has to back up the entire 50 MB file, whereas deduplication backs up only the changed data block.
d. Data retention period. The longer the retention period, the greater the advantage of deduplication, because it can greatly reduce the storage space you need.
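
To get a rough feel for how these factors interact, a back-of-the-envelope model can help. The sketch below is an assumption made for illustration, not a vendor formula: it supposes repeated full backups of a data set of constant size, a fixed fraction of new unique data per backup, and a uniform compression ratio. The function name `estimated_reduction` and its parameters are hypothetical.

```python
def estimated_reduction(full_size_gb, change_rate, backups_retained, compression_ratio=1.0):
    """Rough ratio of logical backup data to physically stored data.

    Assumed model: the first full backup is all unique; every later full
    backup adds only change_rate * full_size_gb of new unique data; all
    unique data then shrinks by compression_ratio.
    """
    logical = full_size_gb * backups_retained
    unique = full_size_gb + full_size_gb * change_rate * (backups_retained - 1)
    stored = unique / compression_ratio
    return logical / stored

low_change = estimated_reduction(1000, 0.02, 30)
high_change = estimated_reduction(1000, 0.10, 30)
long_retention = estimated_reduction(1000, 0.02, 60)
with_compression = estimated_reduction(1000, 0.02, 30, compression_ratio=2.0)
print(round(low_change, 1), round(high_change, 1),
      round(long_retention, 1), round(with_compression, 1))
```

Under these assumptions a lower change rate, a longer retention period, and better compression each raise the overall reduction ratio, which matches factors a, b, and d. Real results depend on the data and the product, so treat the numbers only as an order-of-magnitude estimate.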
 
5. What are the benefits of de-duplication technology?
 
As mentioned above, deduplication saves storage space and network bandwidth. You can therefore keep more backup data on high-speed disk storage within a limited amount of disk space, reduce the use of tape, and lower costs, while also improving the efficiency of data recovery. The bandwidth savings can be used for branch office data protection and for low-cost, low-bandwidth disaster recovery.
 
6. What are fixed-length block deduplication and variable-length block deduplication?
 
Data changes are irregular. With fixed-length data blocks, no matter how small the change is or where in the block the changed bit is located, the entire data block has to be backed up. As a result, when the blocks are large, the amount of data backed up is large; when the blocks are small, the number of blocks and the amount of management information increase significantly. Variable-length blocks can effectively solve this problem, and their deduplication effect is better than that of fixed-length blocks. However, variable-length blocks also increase the complexity of data management.
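
One common way to obtain variable-length blocks is content-defined chunking, where block boundaries are chosen from the data itself by a rolling hash rather than by byte offset. The sketch below is a simplified Gear-style chunker written for illustration only; the table `GEAR`, the constants `CUT_MASK` and `MIN_CHUNK`, and the function names are assumptions, not the algorithm of any specific product. It compares how many blocks survive unchanged when a single byte is inserted at the start of the data.

```python
import hashlib

# Per-byte table for a Gear-style rolling hash (derived from SHA-256 only for determinism).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]
CUT_MASK = 0xFF000000  # cut when the top 8 hash bits are zero (about once per 256 bytes)
MIN_CHUNK = 64         # never cut a chunk shorter than this

def fixed_chunks(data, size=256):
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data):
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # After 32 updates the 32-bit hash depends only on the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_CHUNK and (h & CUT_MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def shared(old, new, chunker):
    """Number of chunks of `new` that already exist among the chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(hashlib.sha256(c).digest() in known for c in chunker(new))

# Deterministic 64 KiB of pseudo-random data, then one byte inserted at the front.
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
shifted = b"X" + data

print("fixed   :", shared(data, shifted, fixed_chunks))
print("variable:", shared(data, shifted, variable_chunks))
```

With fixed-length blocks the inserted byte shifts every later boundary, so no block of the new version matches an old one. With content-defined boundaries the cut points move with the data, so most blocks after the insertion are recognized as duplicates. The price is the extra bookkeeping for blocks of different sizes, which is the added management complexity mentioned above.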
 
7. How safe is deduplication technology for storing and backing up data? Can data ever become unrecoverable?
 
Deduplication is a mature technology and is very safe. When ten identical files are stored with deduplication, only one copy of the data is retained, but the attributes of the ten files are saved separately, each with pointers to the corresponding data blocks. Deduplication relies on file or data block fingerprints (MD5, SHA, or CRC) to decide whether data is a duplicate, and this can produce a "collision": different files or data blocks yield the same fingerprint, which results in data loss. However, the probability is very small, and mature products adopt multiple fingerprint techniques to further reduce the possibility of a collision.
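
The pointer structure and one possible collision safeguard can be sketched as follows. This is a minimal illustration under assumptions: the `DedupStore` class is hypothetical, the fingerprint concatenates two digests (SHA-256 and SHA-1) as one way to read "multiple fingerprint techniques", and the byte-for-byte comparison on a fingerprint match is an extra check added here, not something the text above attributes to any product.

```python
import hashlib

def fingerprint(block: bytes) -> bytes:
    # Assumption: two different digests concatenated, so a collision
    # would have to occur in both hash functions at once.
    return hashlib.sha256(block).digest() + hashlib.sha1(block).digest()

class DedupStore:
    """Hypothetical store: each block is kept once, each file keeps its own metadata."""
    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes
        self.files = {}   # file name -> (attributes, list of block fingerprints)

    def put(self, name, data, attrs=None):
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = fingerprint(block)
            existing = self.blocks.setdefault(fp, block)
            if existing != block:  # same fingerprint, different bytes
                raise ValueError("fingerprint collision detected")
            pointers.append(fp)
        self.files[name] = (dict(attrs or {}), pointers)

    def get(self, name):
        _, pointers = self.files[name]
        return b"".join(self.blocks[fp] for fp in pointers)

# Ten files with identical content but separate names and attributes.
payload = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(300))
store = DedupStore()
for n in range(10):
    store.put(f"copy{n}.dat", payload, attrs={"owner": f"user{n}"})

physical = sum(len(b) for b in store.blocks.values())
print(len(store.files), "files,", physical, "bytes stored for", 10 * len(payload), "logical bytes")
```

Each of the ten files keeps its own attributes and pointer list, while the 9600 bytes of block data exist only once. The same property explains the availability risk: if a shared block is damaged, every file that points to it is affected.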
 
8. What is forward and backward deduplication?
 
Forward deduplication, also called source-side deduplication, is performed on the protected computer itself when a backup server is used to back it up. In this case no duplicate data is sent between the computer and the backup server, which saves bandwidth but increases the load on the protected computer. With backward deduplication, also called target-side deduplication, the data is first transmitted to the backup server and deduplicated there; the deduplicated data can then be stored on disk or transmitted over the network. This approach does not increase the load on the protected hosts. It is usually used at larger sites, where the deduplication task is handed to a dedicated server within the site.
 
9. Does de-duplication technology support backing up to tape?
 
Tape does not support random access, so deduplicating data on tape is difficult and inefficient. Compared with disk, tape is also cheap, so current deduplication solutions are mainly used with disk storage. When you use backup software to copy deduplicated data from disk to tape, the data is usually restored to its non-deduplicated state. To a certain extent this also reduces the risk that deduplication poses to data availability (only one copy of duplicated data is kept, so if that copy is damaged, a whole group of files becomes unusable).
 
10. How much does the de-duplication solution cost?
 
At present, many vendors offer deduplication solutions, and prices vary from one solution to another. In general, the investment is soon recovered through savings in network bandwidth and storage space. For this reason, deduplication has become a mainstream data protection technology and is popular among users, especially those with large data volumes.
