Primary file system storage optimization (that is, cramming more data into the same space) continues to grow in popularity. The challenge is that deduplicating primary storage comes with rules: you can't simply deduplicate everything, and you have to account for the impact that removing duplicate data has on the performance of the device.
EMC has announced deduplication for its Celerra platform, and NetApp has offered this capability for some time. Other vendors are actively adding similar functionality by compressing and deduplicating data post-process, once the data is no longer being actively written, while companies such as Storwize have been providing this functionality in the form of inline, real-time compression.
As storage virtualization and thin provisioning have shown, primary storage optimization works best when you don't have to make compromises. The problem with attaching conditions to primary storage is that things become more complex, and that complexity can keep people from adopting the technology at all. The more transparent and broadly applicable the technology, the greater its chance of success.
The challenge with some primary storage optimizations is that their value depends largely on the type of data you have and the workload that accesses it. For deduplication to deliver any benefit, there obviously has to be duplicate data; that is why weekly full backups are the ideal use case for deduplication. Primary storage, on the other hand, is not made up entirely of duplicate data.
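To make that point concrete, here is a minimal sketch, not any vendor's implementation, of how fixed-block deduplication savings could be estimated: hash every block of a set of files and compare unique blocks to total blocks. The 4 KB block size and SHA-256 fingerprint are illustrative assumptions.

```python
import hashlib
import sys

BLOCK_SIZE = 4096  # illustrative fixed block size; real systems vary


def dedupe_savings(paths):
    """Estimate space savings from fixed-block deduplication.

    Hashes every BLOCK_SIZE chunk of the given files and compares the
    number of unique blocks to the total number of blocks.
    """
    seen = set()          # fingerprints of blocks already "stored"
    total_blocks = 0
    unique_blocks = 0

    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                total_blocks += 1
                fingerprint = hashlib.sha256(block).digest()
                if fingerprint not in seen:
                    seen.add(fingerprint)
                    unique_blocks += 1

    if total_blocks == 0:
        return 0.0
    return 1.0 - unique_blocks / total_blocks  # fraction of space saved


if __name__ == "__main__":
    # Weekly full backups of the same data set would report savings near
    # 100%; a mixed primary-storage file share typically reports far less.
    print(f"Estimated dedupe savings: {dedupe_savings(sys.argv[1:]):.1%}")
```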
Primary storage also tends to carry I/O-intensive workloads with random read/write patterns, and in those cases users may feel the performance impact of applying deduplication.
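The performance concern comes from what inline deduplication adds to every write. The toy block store below, a simplified sketch rather than a description of any product named above, shows the extra fingerprinting and index lookup that sit in the write path; a read-heavy workload pays that cost rarely, while a random-write-heavy workload pays it on every operation.

```python
import hashlib


class InlineDedupStore:
    """Toy inline-deduplication block store (illustration only)."""

    def __init__(self):
        self.index = {}    # fingerprint -> physical block id
        self.blocks = []   # simulated physical storage
        self.logical = {}  # logical address -> physical block id

    def write(self, address: int, block: bytes):
        fingerprint = hashlib.sha256(block).digest()  # extra CPU on every write
        physical = self.index.get(fingerprint)        # extra index probe on every write
        if physical is None:                          # new data: actually store it
            physical = len(self.blocks)
            self.blocks.append(block)
            self.index[fingerprint] = physical
        self.logical[address] = physical              # duplicate data: just remap

    def read(self, address: int) -> bytes:
        return self.blocks[self.logical[address]]


store = InlineDedupStore()
store.write(0, b"A" * 4096)
store.write(1, b"A" * 4096)   # duplicate block: remapped, not stored again
assert store.read(1) == b"A" * 4096
```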
As a result, most vendors recommend limiting this technology to home directories and VMware images, where duplicate data is highly likely and the workload consists mostly of reads.
In particular, do not apply deduplication to databases. It is questionable whether a database contains much duplicate data in the first place, and deduplicating it can hurt performance. As we noted in the Database storage optimization article, an inline compression solution that shrinks Oracle database data may be more appropriate here. Databases are well suited to compression: it works whether or not the data contains duplicates, and in most cases real-time compression has no direct impact on performance.
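To illustrate why compression fits database-style data even when nothing repeats, here is a minimal sketch using Python's zlib: each block is compressed independently on the write path, so the saving depends on redundancy inside the block rather than on whole blocks recurring elsewhere. The block size, compression level, and sample row format are assumptions for illustration, not drawn from any product mentioned above.

```python
import zlib

BLOCK_SIZE = 8192       # illustrative block size
COMPRESSION_LEVEL = 1   # low level keeps per-write latency small


def compress_block(block: bytes) -> bytes:
    """Compress a single block on the write path (inline compression)."""
    return zlib.compress(block, COMPRESSION_LEVEL)


def decompress_block(payload: bytes) -> bytes:
    """Decompress a block on the read path."""
    return zlib.decompress(payload)


if __name__ == "__main__":
    # A block of structured but non-duplicated rows still compresses well,
    # which is why compression pays off where deduplication does not.
    rows = "".join(f"{i:08d},customer-{i},balance={i * 3.14:.2f}\n"
                   for i in range(200)).encode()
    block = rows[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")

    compressed = compress_block(block)
    print(f"{len(block)} bytes -> {len(compressed)} bytes "
          f"({len(compressed) / len(block):.0%} of original)")
    assert decompress_block(compressed) == block
```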
As data growth continues to accelerate, more data optimization will be required, and using several technologies together may be the only way to keep up. Compression may see broad use; deduplication, as a complement, should be applied to specific workloads and is generally better directed at archive data than at primary storage. All of this calls for tools that improve both staff efficiency and resource efficiency.