CIO, do you know what a "data archive" is?

Source: Internet
Author: User

What is archiving? When should you apply it? What is the best way to implement it? These are the questions CIOs are facing right now, and they are the questions this article examines.

  What is archiving

Data archiving is the storage of inactive data on secondary storage devices, such as online disks. This is data that may be needed in the future and therefore cannot be deleted. There may be legal reasons to keep it, there may be organizational needs such as market research, or deleting it may simply feel too risky. In any case, the final decision is to store the data rather than delete it.

Whether data is kept for legal reasons or "just in case", it has to be kept somewhere. Without an effective archiving policy, that somewhere is the same place as everything else: primary storage.

Storing inactive data on primary storage is a huge waste of an expensive resource. The per-GB price difference between primary storage and even the most expensive form of archive storage is at least 5 to 10 dollars, which is very large. In addition, primary storage is designed to deliver active data quickly. It generally does not support retrieving data under retention rules years after it was written, and it cannot verify the integrity of that data over time. These are important requirements for archive storage.

  Archive Target

The first step in archiving is to select a storage platform for the repository. This must come first, because the selected platform determines how the policy is implemented.

Tape has traditionally been considered an ideal archive medium because it is cheap and easy to store and transport. The problem with tape is that access requires special software; it is not like copying to another drive on the network. Its capabilities for data retrieval and data verification are also limited.

As disk prices decline, inexpensive NAS and SATA-drive technologies are beginning to appear. They are easier to access than tape, but they have challenges of their own, mainly cost and scalability.

The disadvantages of both disk and tape led to the development of disk-based archiving systems, such as those built on Permabit technology. These systems provide NAS-style access, approach tape in cost effectiveness and scalability, and include both retrieval and verification capabilities.

  Archiving Policy

Once the archive repository is selected, you can begin to develop a policy. The first step is to decide how data is migrated to the platform, the second is to decide when it is migrated, and the last is to decide how to protect the archive.

How you migrate data is a choice to make deliberately. The easiest approach is to migrate data to the archive system with standard OS commands. This works when the archiving platform is a disk-based archive: because these systems appear as just another network drive, it is easy to move data to the repository manually. For some automation, a tool such as Tek-Tools can be used to generate a list of files and feed that list into an OS script that moves the data.
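The manual approach described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular product: the function names `find_inactive_files` and `migrate`, the use of last-access time as the measure of inactivity, and the idea that the archive is mounted as an ordinary directory are all assumptions made for the example.

```python
import os
import shutil
import time
from pathlib import Path


def find_inactive_files(root, max_idle_days, now=None):
    """Return files under root whose last-access time is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    inactive = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            inactive.append(path)
    return sorted(inactive)


def migrate(files, root, archive_root):
    """Move each file under archive_root, keeping its path relative to root."""
    moved = []
    for src in files:
        dest = Path(archive_root) / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append((src, dest))
    return moved
```

Calling `migrate(find_inactive_files("/data", 90), "/data", "/mnt/archive")` would move everything untouched for 90 days (both paths are placeholders). Note that some file systems are mounted with access-time updates disabled, in which case modification time may be a more reliable signal.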

The advantage of a manual migration process is that it is cost effective, usually free, and quick to implement. The disadvantage is that it has to be operated and maintained by hand, and that users get no pointer to where their files have been moved. Once a more formal data migration process is needed, it is no longer an ideal strategy.

This more formal process usually takes the form of automated data migration, which can be done with dedicated archiving software from vendors such as Atempo or Enigmasoftware. These products usually deploy an agent on, or remotely access, the servers in your environment to identify files suitable for archiving. Those files are then migrated to the archive. Most of these applications leave a transparent link behind so that users can still retrieve the archived data.

Combining such software with a disk archive allows very aggressive migration policies, archiving data that has been inactive for months or even weeks. It gives primary storage the best utilization without compromising the user experience. When a user does access an archived file, it is recalled at that moment, and because the archive is disk-based the user usually notices no decrease in performance. Most surveys show that truly active data, meaning data accessed within a 90-day window, grows at only 3% to 5% a year, so a new archive can postpone primary storage purchases for years to come.

The last part of the archiving policy is the protection of the archive itself. Many users try to back up the disk archive as they would any other device. This is unnecessary: the archive does not need to be backed up.

To protect against local disk failure, disk archiving systems have advanced data protection schemes that are stronger than standard RAID. They also have built-in integrity checks for the data they hold. To protect against site failure, disk archiving solutions can replicate to another site over a WAN connection. This requires purchasing a secondary system, but the savings achieved by implementing a disk-based archive will more than offset the cost of this additional protection.
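The idea of an archive checking its own data can be illustrated with a checksum manifest. The sketch below is a generic technique, not the mechanism of any specific archiving product; the function names `sha256_of`, `build_manifest`, and `verify` are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_root):
    """Record a checksum for every file at the time it is archived."""
    root = Path(archive_root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(archive_root, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(archive_root)
    damaged = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(rel)
    return damaged
```

Running `verify` on a schedule, years after the data was written, is the kind of check that primary storage typically does not perform on its own.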

Even without a secondary system, archived data has already been captured multiple times by full backups. For example, if you migrate inactive data after 90 days and your backup policy is a weekly full backup, the archived data will be protected in approximately 12 full backups. A simple change to the policy, such as retaining one backup from each month for a longer period, means that the archived data is also available on tape.
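The "approximately 12" figure follows from simple division. A minimal check, assuming a full backup every 7 days and that only complete backup cycles count (the function name is illustrative):

```python
def full_backups_before_archive(archive_after_days, days_between_full_backups):
    """Number of complete full-backup cycles that fit before data is archived."""
    return archive_after_days // days_between_full_backups


weekly = full_backups_before_archive(90, 7)    # 12 weekly full backups
monthly = full_backups_before_archive(90, 30)  # 3 if full backups were monthly
print(weekly, monthly)
```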

  Impact of archiving

The fastest and most obvious benefit of a disk-based archiving strategy is that it reduces primary storage requirements for this year and the years that follow, potentially removing storage purchases from the IT budget. In many cases users can free up to 80% of their primary storage capacity, and by reallocating storage they can take some of it out of service and thus reduce energy consumption.

Finally, an effective archiving strategy can delay investment in backup upgrades by reducing the backup load by up to 80%. The upgrades that can be deferred include backup-to-disk architectures, backup bandwidth, and backup servers.

In 2009, IT budget trends and the efficiency of archiving make archiving an ideal choice. The fact that a single project can improve primary storage performance, shorten backup windows, and increase data security makes it worthwhile in today's economy.


The explosive growth of data volumes has made us pay more attention to storage. It has also made current talk of "data mining", "knowledge management", and similar topics more likely to resonate. Storage today no longer means simply "saving" data and leaving it in a corner. More importantly, we have to use that data to generate further value, strengthen business capabilities, and increase efficiency. At this point "storage" takes on additional meanings, such as backup, data archiving, data protection, and data mining. So which of these is the industry talking about most right now? "Data archiving" is among the first. Why? There are plenty of reasons.

Backup and data archiving: separate yet unified

How does the Storage Networking Industry Association (SNIA) define data archiving? According to SNIA's networked storage bilingual dictionary, an archive is a consistent copy of a data set, typically kept as a long-term, durable record of transactions or application state. Archives are normally used for auditing and analysis rather than for application recovery.

That definition is heavy on terminology and hard to digest, so it helps to compare archiving with backup, a term most people already understand. Backup and data archiving are both applications of data storage, but they serve different purposes.

Take backup first. Backup replicates data so that the copy can be restored when data is lost or a system disaster occurs. As a result, backups focus on changes and updates to business information, are kept for a short time, and are often overwritten. For example, a bank backs up its transactions every day.

With backup as the reference point, data archiving is easier to understand. Data archiving is an application for "massive data" and a planned migration of data. When data is no longer used, or is used only infrequently, archiving moves it elsewhere. This frees up primary storage and takes the data out of the daily backup window, which saves space and improves backup efficiency.

Put simply, the difference is like that between "Ctrl+C" and "Ctrl+X": backup is copy, data archiving is cut. Of course, this metaphor is only an aid to understanding; in practice there is a good deal more to it.
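The copy-versus-cut metaphor maps directly onto two standard library calls. The sketch below is only an illustration of the metaphor; the function names `back_up` and `archive` and the directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def back_up(src, backup_dir):
    """Backup is a copy: the original stays on primary storage."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    dest = Path(backup_dir) / Path(src).name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest


def archive(src, archive_dir):
    """Archiving is a cut: the file leaves primary storage."""
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    dest = Path(archive_dir) / Path(src).name
    shutil.move(str(src), str(dest))
    return dest
```

After `back_up` the data exists in two places; after `archive` it exists only in the archive, which is why primary storage and the backup window both shrink.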

Backup and data archiving are different but interconnected. Archived data still needs to be protected, both operate on storage devices, and both can be implemented on the same technology platform. This is why the "data management software" from today's mainstream storage vendors handles both backup and data archiving: separate yet combined. Backup and data archiving are now often considered together under the name BURA (Backup, Restore, Archive). What the two have in common is the use of replication to protect important data from corruption or loss. A common BURA solution is D2D2T, that is, backup from disk to disk and then to tape. This meets the backup requirement for speed and the archiving requirement for long retention, balancing both.

Just as vendors combine the two in their software, enterprises' needs for data archiving and backup are inseparable. For enterprises, backup and data archiving have different but complementary roles: backup is used for rapid replication and recovery to reduce the impact of failures, human error, or disasters; data archiving is used for effective management, retention, and long-term access and retrieval of data. Organizations can combine the two to optimize costs and improve the overall effectiveness of the storage infrastructure. An effective archiving solution makes backups more efficient, and archiving can in turn use the backup infrastructure to meet its data protection needs.

Soaring demand, driven by capacity and applications

With the explosive growth of data volumes, the demand for data archiving has increased significantly. "The market is driven by strong demand for data archiving and data protection and recovery software," said Michael Margossian, a storage software analyst at IDC, when IDC released its global storage software market revenue figures for the third quarter of 2007. "The need for backup software seems to have cooled, and demand for data-archiving software is rising."

For enterprises in today's competitive market, the data analysis capabilities needed to stay competitive cannot be separated from data archiving. For example, when we look up our monthly telephone charges, we can currently see only the last 6 months. What about charges from a year ago? The telecom operator has not deleted this data; it simply is not shown to the user. The older data has been archived, but the operator can view it at any time. When the operator needs to analyze charges before launching a new service, this "old" data is pulled from the archive. The CIO of one domestic manufacturing enterprise put it bluntly: "Making money from data is more valuable than making data cheaper." This also shows that data archiving does more than backup to help companies improve their competitiveness.

Data archiving can be seen as a further refinement of an enterprise's data once a backup program is in place. Why does it build on backup? Because without a backup, who would dare to "cut" the data? Backup is the prerequisite, and data archiving is the next step up. Data archiving also has prerequisites beyond backup. The first is capacity: the data volume is generally at the TB or even PB level. For businesses that do not have terabytes of data, it is better simply to add disk capacity than to adopt data archiving. Take Foshan Igor, a professional supplier of power transformers and transformer iron core components to the global market. Eujianwen, head of its IT department, told reporters in an interview: "In terms of capacity, Foshan Igor's mail database is 120GB, engineering data files are 70GB, the SQL databases for general application systems are 120GB, and the Oracle database and applications account for 170GB, for a total data volume of 480GB. Because the amount of data is small, we have no need for data archiving technology and use only backup."

That does not mean companies with no current need for data archiving are indifferent to it. In 2007, for example, SNIA completed a large-scale survey involving hundreds of people from countries around the world. Surprisingly, 80% said their information must be kept for more than 50 years, and 68% said their data must be kept for more than 100 years. With data kept that long, the growth in capacity is easy to imagine. Foshan Igor has seen this trend of exploding data as well, and "data archiving" features prominently in its IT plan for the next 3 years. Eujianwen told reporters: "Data archiving technology is now on Foshan Igor's application agenda for the next 3 years. It will be applied first to ERP, financial information, and mail."

ILM solves data archiving challenges

While the need for data archiving is rising, many challenges lie ahead. The two most important are long-term data retention and compliance. Data must be kept much longer than the lifetime of the storage systems (disk or tape) and applications that hold it. For long-term data archiving (over 15 years), the biggest challenge is logical migration. Logical migration is application-specific, which makes the main processes harder to automate. Full preservation means keeping the data both readable and interpretable.

In terms of compliance, as more and more business operations are recorded and stored digitally, more laws and regulations govern business and data, and the consequences of failing to comply are becoming more serious. In addition to complying with government regulations, organizations need to develop their own internal policies and procedures to mitigate risk and control IT. These layers of compliance add to the difficulty of archiving data.

Backups and archives on disk or tape currently need to be migrated every 3 to 5 years, both physically and logically. Physical migration moves information from one physical storage system to another, or from one media format to another, to maintain physical readability, accessibility, and integrity. Logical migration moves information from one logical format to another, for example from an older version of an application to a newer one, so that it remains readable and interpretable. So when the SNIA survey mentioned above concludes that most people want to keep data for 50 or even 100 years, data stored on tape runs into two problems: it becomes hard to read, and the media does not last that long. In short, we need to keep information much longer than the typical lifecycle of storage systems (disk or tape) and applications. Even before the retention period ends, the physical media begins to degrade and become unreadable.
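To get a feel for what these figures imply, the number of migrations over a retention period can be estimated with a short calculation. This is back-of-the-envelope arithmetic under the assumption that a migration happens at every interval boundary strictly before the retention period ends; the function name is illustrative.

```python
def migrations_needed(retention_years, migration_interval_years):
    """Count migrations that occur strictly before the retention period ends."""
    return (retention_years - 1) // migration_interval_years


for years in (50, 100):
    fewest = migrations_needed(years, 5)  # migrating every 5 years
    most = migrations_needed(years, 3)    # migrating every 3 years
    print(f"{years} years of retention: {fewest} to {most} migrations")
```

Under these assumptions, 50 years of retention means 9 to 16 migrations and 100 years means 19 to 33, each one both physical and logical.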

So is there a way to keep data for a long time, grow capacity, and still read the data quickly? Gary Zasman, chairman of SNIA's Long Term Archive and Compliance Storage Initiative (LTACSI) and global practice director at NetApp, recommends implementing a formal lifecycle management process for applications, operations, and data repositories, so that data is managed efficiently throughout its service life.

The idea of applying ILM (Information Lifecycle Management) to data archiving is gradually being embraced by users. It not only helps companies improve their overall management of data assets, but also lets them manage and make efficient use of massive amounts of data at the lowest cost. As applications mature and practical needs grow, technologies such as data encryption, identity authentication, and virtualization have gradually entered data archiving, improving its efficiency, enhancing data security, and greatly reducing operational complexity and cost.

Where data archiving technology is heading

A good data archiving system helps organizations reduce the cost of retaining historical data, access and use that data more quickly and efficiently, reduce the cost of the staff needed to protect and maintain information, and keep archived data secure. When it comes to reading the data back, however, data archiving is still far from perfect. We can pick up a book printed 100 years ago and read it easily, yet a backup tape from just a few years ago may be much harder to read. Even with the right hardware to read the tape (and with the tape itself still intact), we need to know the format in which the tape was written, and we need an application that understands the data.

In the early days of IT, disk arrays were not as developed as they are today, and the tape library was the primary backup technology. Now that hard drives are getting cheaper, tape is used less for backup, and some have even proposed replacing tape libraries entirely with disk arrays for data archiving. However, the technical characteristics of disk arrays mean that the data stored in them is kept online, or "hot". A disk storage system used for data archiving should therefore not be shut down, and powering it back up is a complex process. In an era that advocates "green computing", keeping systems running continuously does nothing to reduce energy consumption. Moreover, information kept for decades or even a century is not necessarily accessed often, which widens the cost-effectiveness gap between disk and tape. So for data archiving, tape libraries remain the best and an irreplaceable choice.

For faster and easier reading of data, the ideal solution for data archiving would be a VTL (Virtual Tape Library). VTLs have several important advantages. Like other D2D solutions, they are inherently more reliable than tape and do not suffer from media errors, mechanical failures, or the resulting downtime. Virtual tape drives and media do not wear out with regular use, and they do not need to be cleaned or maintained. Most importantly, an enterprise VTL can improve data archiving performance by an order of magnitude compared with physical tape libraries. However, although VTLs offer superior performance and convenient management, their cost remains a barrier that many cannot cross, leaving many users who need data archiving to look at VTL and sigh.
