Test: Data Domain deduplication

Source: Internet
Author: User
Tags: file system, backup

Deduplication technology not only improves enterprise storage efficiency, it also reduces the need for storage devices such as tape and disk, which in turn saves data center space, energy, and cooling resources.

Broadly speaking, data deduplication is a technology that analyzes data files, finds and removes redundant blocks, and then applies compression algorithms such as gzip or LZ. In general, files that are handled often but change little are ideal candidates for deduplication. As a result, many companies are considering deduplication solutions to reduce the storage space required for backing up and archiving enterprise databases, e-mail, server data, and virtual machine images.
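To make the idea concrete, here is a minimal Python sketch of block-level deduplication; it is an illustration of the principle, not Data Domain's actual implementation (which, among other things, uses variable-size chunking). Data is split into fixed-size chunks, each chunk is identified by a content hash, and only previously unseen chunks are stored and compressed.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; real products often chunk by content

def dedupe_and_compress(data: bytes, store: dict) -> int:
    """Store only unseen chunks (compressed); return bytes actually written."""
    written = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:                   # redundant chunks are skipped entirely
            store[fingerprint] = zlib.compress(chunk)  # LZ-style compression on unique chunks
            written += len(store[fingerprint])
    return written

# A second, nearly identical "backup" writes almost nothing new.
store = {}
first = dedupe_and_compress(b"report v1 " * 100_000, store)
second = dedupe_and_compress(b"report v1 " * 100_000 + b"small edit", store)
print(first, second)  # the second pass stores only the chunks that changed
```

This is also why the savings grow with every backup cycle: each new backup is compared against all the chunks already in the store.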

Data Domain is one of the leaders in the data deduplication market, so let's look at that manufacturer's products first. Other major vendors include NetApp, IBM, EMC, and Quantum. Product overviews generally focus on the level of deduplication, that is, the percentage of original disk space saved. Not only does that ignore important metrics such as throughput performance and ease of installation, but space savings are also difficult to measure in a lab environment (for example, a lab cannot reproduce months or years of users making frequent, subtle changes to live data).

This time we wanted to look at data deduplication from a different perspective. We chose to focus on ease of deployment and potential problems, throughput performance, manageability, and feature set. We tested in our storage lab in New York and then interviewed Data Domain users to understand their real-world deployments and get a more accurate picture of actual deduplication rates. Our primary objective was to assess the stability of the Data Domain solution for multi-site business continuity.

Our tests simulate a company with a data center, a regional headquarters, and a branch office. The branch office backs up 350GB of internal storage to a local DD120, the regional headquarters backs up 1.2TB of internal storage to a DD510, and the data center backs up 10TB on two external disk arrays to a DD690. Each device is configured for maximum redundancy: dual power supplies, NICs, and Fibre Channel controllers, plus RAID 6 disk arrays. We used two methods. In the first, Symantec Veritas NetBackup performs local backups and Data Domain replication then copies data between the Data Domain devices; in the second, Data Domain OST lets NetBackup control all backup and replication processes. One interesting finding is that if your business already uses NBU, you can keep all of your existing jobs and policies simply by redirecting them from tape drives to the Data Domain devices.

The configuration process is not simple, and some aspects call for enterprise storage expertise rather than general IT skills. Installation is completed through the CLI, either over a remote login or an attached KVM; the default password must be changed at first login. We entered licenses for the storage device, replication, and OST, then worked through the network, file system, system, and administrative settings. After verifying that the settings were complete and rebooting the system, we set up CIFS and NFS shares.
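Once a share is exported, it is worth confirming that the backup server can actually write to it before pointing NetBackup at the device. Here is a minimal sketch of such a check; the mount point /mnt/dd510 is hypothetical, not a Data Domain default.

```python
import os
import tempfile

MOUNT_POINT = "/mnt/dd510"  # hypothetical mount point for the exported NFS share

def verify_backup_target(path: str) -> bool:
    """Write, read back, and delete a small test file on the mounted share."""
    payload = os.urandom(1024)
    try:
        with tempfile.NamedTemporaryFile(dir=path, delete=False) as f:
            f.write(payload)
            test_file = f.name
        with open(test_file, "rb") as f:
            ok = f.read() == payload
        os.remove(test_file)
        return ok
    except OSError:
        return False

if __name__ == "__main__":
    print("share writable:", verify_backup_target(MOUNT_POINT))
```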

The CLI offers command completion, navigable command trees, and built-in help. Still, it is only a CLI, and I would prefer a good web GUI. That is one of Data Domain's main drawbacks: although the GUI can handle the basic tasks, it is far from complete. I could monitor all three units from one screen, but for real administration I had to fall back on the CLI. Data Domain says that most of its customers use the CLI (our user interviews confirmed this) and that an improved web GUI is being studied.

The careful, informative documentation goes a long way toward reducing the potential problems of adding a new technology to the data center. For example, we upgraded the DD510 with an expansion kit and configured the additional 250GB drives in about 10 minutes, arranged as an 8-disk RAID group, a 6-disk RAID group, and a hot spare available to both groups.
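As a rough check on what that layout yields (assuming both groups use RAID 6, as the other arrays in our test do, which reserves two drives per group for parity), the usable capacity works out as follows:

```python
DRIVE_GB = 250          # capacity of each added drive
PARITY_DRIVES = 2       # RAID 6 reserves two drives per group for parity

def raid6_usable(drives: int, drive_gb: int = DRIVE_GB) -> int:
    """Usable capacity of one RAID 6 group, before file-system overhead."""
    return (drives - PARITY_DRIVES) * drive_gb

groups = [8, 6]                        # the 8-disk and 6-disk groups from the upgrade
usable = sum(raid6_usable(n) for n in groups)
print(f"usable capacity: {usable} GB")  # 6*250 + 4*250 = 2500 GB, hot spare excluded
```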

Our lab tests found deduplication ratios ranging from 5x to 99x, depending on the file type and the number of times the same content was backed up. In general, the first backup frees relatively little space, since there is no earlier copy to deduplicate against and the savings come mostly from compression; subsequent backups save far more. Many businesses configure their backup, archiving, and business continuity processes much as we did, and the cost and time that efficient deduplication can save before data is replicated over a WAN connection is staggering.
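To put those ratios in perspective, the space saved (and the WAN traffic avoided when only deduplicated data is replicated) is easy to work out. The sketch below simply reuses the 1.2TB regional-headquarters backup from our test scenario as an example.

```python
def savings_fraction(dedupe_ratio: float) -> float:
    """Fraction of space saved at a given deduplication ratio (e.g. 20x -> 95%)."""
    return 1.0 - 1.0 / dedupe_ratio

BACKUP_GB = 1200  # the regional headquarters backup from our test scenario

for ratio in (5, 20, 99):
    stored = BACKUP_GB / ratio
    print(f"{ratio:>3}x: store/replicate ~{stored:,.0f} GB "
          f"({savings_fraction(ratio):.1%} saved)")
```

At 20x, for instance, the 1.2TB backup occupies roughly 60GB of disk, and only that reduced volume has to cross the WAN during replication.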

After completing the lab tests, we visited Data Domain users to understand deduplication rates in real-world use. We interviewed the Rockefeller Group, a company that provides commercial real estate, real estate services, and telecommunications services. Sanja Kaljanac, the company's senior IT service engineer, said its data center sees deduplication ratios of up to 100x on a DD565, and the DD120 units in its branch offices reach 67.5x. After analyzing log files provided by other Data Domain users, we found ratios ranging from 10x to 40x, with maximum throughput on the DD690 between 300MB/s and 500MB/s. Besides the Rockefeller Group, other property companies using Data Domain products include LandAmerica Financial Group and Skidmore, Owings & Merrill.

Our lab test results and user interviews show that Data Domain deduplication technology offers real advantages for backup, recovery, and archiving between sites and over WAN connections. Given the volume of data required to maintain business continuity across a multi-site enterprise, traditional backup methods are being stretched to, and even past, their limits. Deploying a DD120 in branch offices together with a DD690 or DD510 in the data center not only removes those limitations, it also gives you the opportunity to reassess existing business continuity processes.

Test product total price: USD 293,540

DD690 (base unit with expansion shelf): USD 210,000
DD510: USD 19,000
DD510 expansion kit: USD 13,000
DD120 (with replication): USD 12,500
DD690 replication software license: USD 35,000
DD510 replication software license: USD 2,540
DD510 replication software license: USD 1,500
