We have previously described the benefits of primary storage de-duplication technology and compared the major vendor products. But any technology is a double-edged sword: users should evaluate it carefully before implementation, deploy it thoughtfully, and play to its strengths while avoiding its weaknesses. This article focuses on some of the shortcomings of primary storage data reduction.
The writer is David Vellante, co-founder and senior storage analyst at the Wikibon Analyst Forum.
Storage optimization and better utilization of storage capacity have long been classic topics of discussion. Based on Wikibon user feedback, we see that many storage vendors have successfully marketed their offline/backup de-duplication technology, which can significantly reduce the amount of backup data, typically achieving reduction ratios of 5-15:1.
Mainstream online storage data reduction technologies
De-duplication technology for backup differs from compression, which actually transforms the data through an algorithm (producing a computed representation that is written out in fewer bytes). With de-duplication, the data itself is not changed; instead, the second through n-th copies are deleted and replaced with pointers to a "master instance" of the data. Single-instance storage can be regarded as a form of de-duplication.
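To make the pointer idea concrete, here is a minimal, hypothetical sketch of block-level de-duplication (not any vendor's implementation): duplicate blocks are replaced by references to a single master instance, and the bytes themselves are never transformed.

```python
import hashlib

class DedupStore:
    """Toy block-level de-duplication store: one master copy per unique
    block; duplicate writes become pointers (fingerprints) to that copy."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # fingerprint -> master instance of the block
        self.files = {}    # file name  -> ordered list of fingerprints

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # keep only the first copy
            pointers.append(fp)
        self.files[name] = pointers

    def read(self, name):
        # The data was never transformed, so reads are simple lookups.
        return b"".join(self.blocks[fp] for fp in self.files[name])


store = DedupStore()
payload = b"hello world" * 1000
store.write("a.bin", payload)
store.write("b.bin", payload)          # identical content: no new blocks stored
assert store.read("b.bin") == payload
print(len(store.blocks), "unique blocks stored for 2 files")
```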
Traditional de-duplication is often not suitable for online (primary) storage, because the algorithms required to remove duplicate data inevitably increase response time, which translates into higher costs. For example, popular de-duplication solutions such as Data Domain, ProtecTIER (Diligent/IBM), FalconStor, and EMC Avamar are not used to reduce the capacity of online storage.
There are three main ways to optimize online storage, reduce capacity requirements, and increase overall storage efficiency. Although the industry typically uses terms such as de-duplication (for example, NetApp A-SIS) and single-instance storage, Wikibon prefers the broader terms online data compression or primary storage compression. These data reduction techniques cover the following types of solutions:
"Light" data de-duplication or single-instance technology embedded in the storage array, such as NetApp A-SIS and EMC Celerra;
Host-managed offline data reduction solutions, such as Ocarina Networks;
Online data compression appliances, such as those from Storwize.
Unlike some backup data reduction schemes, all three approaches use lossless data compression algorithms, which means that the original bytes can always be reconstructed exactly.
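As a simple illustration of the lossless property, a standard compression library can always recover the original bytes exactly (zlib is used here purely as an example, not as any of the vendors' algorithms):

```python
import zlib

original = b"example record " * 200
compressed = zlib.compress(original)

# Lossless: decompression reconstructs the original bytes exactly.
assert zlib.decompress(compressed) == original
print(f"{len(original)} bytes -> {len(compressed)} bytes")
```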
These methods have their own advantages and disadvantages. The most significant advantage is the reduction in storage costs. However, each solution adds a new layer of technology to the environment, increasing system complexity and risk.
1. Array-based data reduction technology
Consider array-based data reduction technology such as A-SIS, which runs against online data and reduces primary storage capacity. The de-duplication feature of WAFL (Write Anywhere File Layout, NetApp's file layout technology) identifies candidate copies of 4K data blocks at write time: it creates a weak 32-bit signature for each 4K block and places the signature in a metadata file, and duplicates are later confirmed with a byte-by-byte comparison to ensure that no hash collision has occurred. This duplicate identification task is similar to snapshot processing and runs in the background when sufficient controller resources are available. By default it runs every 24 hours, or whenever the amount of changed data reaches 20%.
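The two-stage approach described above (a weak per-block fingerprint followed by a byte-by-byte comparison) can be sketched as follows. This is an illustrative model only, not NetApp's code; CRC-32 stands in for the weak 32-bit signature.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # fixed 4K blocks, as described above

def find_duplicate_blocks(volume: bytes):
    """Group 4K blocks by a weak 32-bit fingerprint, then confirm real
    duplicates byte-by-byte so a hash collision can never cause two
    different blocks to be merged."""
    by_fingerprint = defaultdict(list)          # checksum -> block offsets
    for offset in range(0, len(volume), BLOCK_SIZE):
        block = volume[offset:offset + BLOCK_SIZE]
        by_fingerprint[zlib.crc32(block)].append(offset)

    duplicates = []                             # (master_offset, duplicate_offset)
    for offsets in by_fingerprint.values():
        master = offsets[0]
        for candidate in offsets[1:]:
            # Byte-by-byte verification before the block is shared.
            if volume[candidate:candidate + BLOCK_SIZE] == \
               volume[master:master + BLOCK_SIZE]:
                duplicates.append((master, candidate))
    return duplicates


volume = (b"A" * BLOCK_SIZE) * 3 + b"B" * BLOCK_SIZE
print(find_duplicate_blocks(volume))            # [(0, 4096), (0, 8192)]
```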
The A-SIS solution has three major drawbacks:
With A-SIS, de-duplication can only be performed within a single FlexVol rather than across volumes, which means that candidate blocks must reside in the same volume. In addition, de-duplication operates on fixed 4K blocks defined by the volume, not on variable-length blocks as in IBM/Diligent's technology (a simplified sketch contrasting the two approaches appears at the end of this section). This limits the potential savings from de-duplication.
There are significant limitations when A-SIS is used together with snapshots and other software that depends on them. Snapshots taken before de-duplication runs constrain which blocks can be de-duplication candidates, in order to preserve data integrity, and this restricts the potential space savings. In particular, NetApp's de-duplication technology cannot make existing snapshots more space-efficient.
The overhead of running de-duplication means that A-SIS is not well suited to controllers that are already highly utilized, which is where the benefit would be greatest. In addition, the data reduction metadata adds an overhead of nearly 6%.
To take full advantage of this functionality, users are locked into NetApp storage.
IT managers should note that A-SIS is a no-charge feature of ONTAP (the company's storage operating system) for NetApp's nearline components.
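As referenced in the first drawback above, fixed 4K blocks find fewer duplicates than variable-length (content-defined) blocks when data shifts. The hypothetical sketch below uses a deliberately simplified boundary test (a single-byte threshold standing in for a rolling-hash test such as a Rabin fingerprint) to show why: after one byte is inserted at the front, every fixed block changes, while most content-defined chunks still match.

```python
import hashlib
import os

def fixed_chunks(data, size=4096):
    """Split into fixed-size blocks, as in the A-SIS description above."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, min_size=512):
    """Cut wherever the current byte falls below a threshold -- a toy
    stand-in for a rolling-hash boundary test. Boundaries depend only on
    local content, so they realign after an insertion."""
    chunks, start = [], 0
    for i in range(len(data)):
        if data[i] < 4 and i - start >= min_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    chunks.append(data[start:])
    return chunks

def shared_count(a, b):
    """Count chunks of b whose content also appears among the chunks of a."""
    digests = {hashlib.sha256(c).digest() for c in a}
    return sum(1 for c in b if hashlib.sha256(c).digest() in digests)

data = os.urandom(64 * 1024)
shifted = b"X" + data                     # insert a single byte at the front

print("fixed 4K blocks shared:  ",
      shared_count(fixed_chunks(data), fixed_chunks(shifted)))
print("variable-length shared:  ",
      shared_count(content_defined_chunks(data), content_defined_chunks(shifted)))
```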