Reading: Deduplication

Every technology emerges for a reason, so let us start from the beginning. Although the price of storage media has plummeted and the unit cost of storage is now very low, capacity still cannot keep up with the growth rate of enterprise data. As a result, energy consumption, backup management, and similar concerns have become difficult issues, and duplicate copies of files keep accumulating. Enterprises therefore urgently need a technology that ensures only unique data is kept on storage devices. In this context, deduplication technology came into being. Its purpose is to ensure that stored data is not duplicated, thereby reducing the required capacity. In practice, however, various misunderstandings about this technology persist, and clearing them up is critical to using deduplication correctly.
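To make the goal concrete, here is a minimal sketch of block-level deduplication, assuming fixed-size chunks identified by their SHA-256 hashes; the chunk size and in-memory store are illustrative assumptions, not any particular product's implementation:

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size

def dedup_store(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Split data into chunks, store each unique chunk once, and return
    the list of chunk hashes (the "recipe") needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest not in store:      # only new, unique chunks consume space
            store[digest] = chunk
        recipe.append(digest)
    return recipe

store: dict[bytes, bytes] = {}
a = dedup_store(b"same payload " * 1000, store)
b = dedup_store(b"same payload " * 1000, store)  # a duplicate file
# Both files are fully represented, but each unique chunk is stored once.
print(len(a), len(b), "unique chunks stored:", len(store))
```

Real systems typically use variable-size, content-defined chunking and persistent indexes, but the principle is the same: a duplicate chunk costs only a hash lookup, not additional storage.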
Misunderstanding 1: When post-processing deduplication actually runs.
Based on when deduplication is performed, the technology can be divided into inline (online) deduplication and post-processing deduplication. Each approach has its own characteristics. However, because the name "post-processing" is somewhat misleading, many users misunderstand it. For example, some mistakenly believe that post-processing deduplication verifies and removes duplicates only after the entire backup process has ended. To think so is a big mistake.
In fact, post-processing deduplication usually starts once the backup data has been written to a virtual tape medium; that is, it begins after a virtual tape is fully written. Of course, there is a configurable delay in between: the storage administrator can set it according to circumstances, from a few minutes to several hours, depending on the enterprise's actual situation. For example, some administrators schedule the job for when the servers are relatively idle, in which case the delay is set longer, such as waiting until after working hours.
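As a sketch of how such a delay might be wired up, here is a hypothetical scheduler that starts the deduplication pass a configurable time after a virtual tape fills; the function names and the 15-minute delay are assumptions for illustration:

```python
import threading

DEDUP_DELAY_SECONDS = 15 * 60  # site-specific: minutes to hours depending on load

def deduplicate_tape(tape_id: str) -> None:
    print(f"{tape_id}: post-processing deduplication starts now")
    # ... scan the tape, hash blocks, drop duplicates ...

def on_virtual_tape_full(tape_id: str) -> None:
    """Called when a virtual tape has been fully written; schedules
    deduplication of that tape after the configured delay."""
    print(f"{tape_id}: full, deduplication scheduled in {DEDUP_DELAY_SECONDS}s")
    threading.Timer(DEDUP_DELAY_SECONDS, deduplicate_tape, args=(tape_id,)).start()

# on_virtual_tape_full("vtape-01")  # demo call; lower the delay first to see it fire
```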
In general, backups are managed in groups to improve backup efficiency. The wait timer starts when the first backup task begins transferring its backup data stream. Once the first virtual backup tape is full, or the first group of backup data has been written to the end, the delay does not hold anything up: while the previous group of backup data is being deduplicated, the system continues writing the next group to subsequent virtual tape media. Simply put, deduplication jobs and backup-write jobs run independently, which improves overall data processing efficiency.
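A minimal sketch of this pipelining, with hypothetical group and function names: a background worker deduplicates each finished group while the main thread keeps writing the next one.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def write_backup_group(group: str) -> str:
    time.sleep(1)                    # stand-in for streaming data to a virtual tape
    print(f"{group}: written")
    return group

def deduplicate_group(group: str) -> None:
    time.sleep(2)                    # stand-in for the post-processing pass
    print(f"{group}: deduplicated")

# Deduplication of a finished group overlaps with writing the next group.
with ThreadPoolExecutor(max_workers=2) as pool:
    for group in ["group-1", "group-2", "group-3"]:
        written = write_backup_group(group)
        pool.submit(deduplicate_group, written)   # runs in the background
```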
Misunderstanding 2: Post-processing deduplication reduces overall backup efficiency.
From a purely technical point of view, there is some truth to this. On the one hand, deduplication consumes server resources; on the other, post-processing introduces a certain delay. But this is a one-sided view: with the deduplication technology available today, proper configuration can eliminate most of the negative impact.
In practice, if the technical staff find that deduplication is reducing backup efficiency, the following measures can eliminate the adverse effect. First, the deduplication work can be distributed across multiple independent servers to share the load. When backup writes and deduplication run together, the different processing engines often access the same disk array, but current technology lets them access different regions of that array, enabling high-speed concurrent processing; see the sketch after this paragraph. The deduplication pass then does not conflict with the continuous writing of the backup data stream, and backup efficiency is unaffected. Second, the delay can be tuned as appropriate: shorten it, or extend it so that deduplication avoids the peak backup period.
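A hypothetical sketch of the first measure: several deduplication engines run concurrently, each confined to its own region of the array (modeled here as a disjoint subset of virtual tapes), so they never contend with each other or with the incoming backup stream. Engine count and tape names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

ENGINES = 3
tapes = [f"vtape-{n:02d}" for n in range(12)]

def dedup_region(engine_id: int, region: list[str]) -> None:
    for tape in region:
        print(f"engine {engine_id}: deduplicating {tape}")

# Static partitioning: tape i is handled by engine i % ENGINES,
# so no two engines ever touch the same tape.
regions = [tapes[e::ENGINES] for e in range(ENGINES)]
with ThreadPoolExecutor(max_workers=ENGINES) as pool:
    for e, region in enumerate(regions):
        pool.submit(dedup_region, e, region)
```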
In short, post-processing deduplication does affect overall backup efficiency to some extent, but reasonable configuration can minimize the negative impact; at the very least, it is negligible compared with the technology's advantages.
Misunderstanding 3: Deduplication reduces the read speed of the backup data stream and is therefore bad for backup.
Technically speaking, under otherwise identical conditions, adopting deduplication will indeed reduce the read speed of the backup data stream to some extent. However, a storage administrator cannot judge whether a technology is suitable by looking at a single indicator; it must be evaluated as a whole. Simply put, the question is whether the overall backup time is reduced.
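A back-of-the-envelope illustration with entirely made-up numbers: assume a scenario (such as deduplication of highly redundant data) where the stream rate drops but only a fifth of the bytes actually have to move. The slower stream still yields a shorter overall window.

```python
# Hypothetical figures only: judge the total window, not one indicator.
data_tb = 10                  # logical data to back up
plain_rate = 500              # MB/s without deduplication
dedup_rate = 400              # MB/s with deduplication (slower stream)
dedup_ratio = 5               # 5:1 -- only 1/5 of the bytes actually move

mb = data_tb * 1024 * 1024
plain_hours = mb / plain_rate / 3600
dedup_hours = (mb / dedup_ratio) / dedup_rate / 3600
print(f"without dedup: {plain_hours:.1f} h, with dedup: {dedup_hours:.1f} h")
# -> without dedup: 5.8 h, with dedup: 1.5 h
```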
It is also worth mentioning that adopting deduplication generally requires devices with relatively high specifications and performance. A backup task really has two parts: the traditional backup write and the deduplication pass. The two jobs can run independently and their running times differ, but the backup job is truly complete only when deduplication finishes. If the device handling deduplication performs poorly, the deduplication ratio the system actually achieves will drop. For this reason, when I deploy such a project, I usually evaluate and test the deduplication devices to see whether they meet the requirements.
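The deduplication ratio itself is straightforward to compute during such an evaluation; a quick sketch with made-up acceptance-test figures:

```python
def dedup_ratio(bytes_ingested: int, bytes_stored: int) -> float:
    """Ratio of logical data ingested to physical data stored; 10.0 means 10:1."""
    return bytes_ingested / bytes_stored

# Hypothetical test: 8 TB backed up, 1.6 TB physically stored after deduplication.
print(f"{dedup_ratio(8 * 2**40, int(1.6 * 2**40)):.1f}:1")  # -> 5.0:1
```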
Many real-world cases show that although deduplication reduces the read speed of the stored data stream to some extent, it can shorten the time spent on the entire backup job and still meet RTO requirements. As the saying goes, look at the result rather than the process. This applies to evaluating any technology: judge it as a whole, not by a few individual indicators; otherwise you may be misled.
Misunderstanding 4: Deduplication and backup-stream write jobs cannot run at the same time.
If backup writes and deduplication both operated on the same single disk, this problem would indeed exist. In practice, however, it does not arise at all. In real deployments, deduplication is usually combined with virtual storage; that is, the backup data stream is written to virtual tape media spread across multiple disks, and the number of virtual tapes written is often far greater than the number of physical tape drives actually owned.
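A toy sketch of why this works, with invented names and a trivially simplified "tape": because virtual tapes vastly outnumber physical drives, the write job and the deduplication job can always target different media and never compete for the same one.

```python
# Hypothetical virtual tape library: many disk-backed virtual tapes,
# far fewer physical drives.
PHYSICAL_DRIVES = 4
virtual_tapes: dict[str, list[bytes]] = {f"vtape-{n:02d}": [] for n in range(64)}

def write_backup(tape: str, block: bytes) -> None:
    virtual_tapes[tape].append(block)      # backup stream keeps writing here

def dedup(tape: str) -> None:
    blocks = virtual_tapes[tape]           # deduplication reads a *different* tape
    virtual_tapes[tape] = list(dict.fromkeys(blocks))  # drop exact duplicate blocks

write_backup("vtape-01", b"new data")      # write job on tape 01 ...
dedup("vtape-00")                          # ... while dedup runs on tape 00
```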
These, then, are the common misunderstandings about deduplication technology. I hope that after reading this article you will see the technology for what it is, so that deduplication can serve users well.