Henry Newman, author of this article, is an industry consultant with 28 years of experience in high-performance computing services and storage.
Last fall, in New York's Central Park, I saw an Egyptian obelisk nearly 4,000 years old. Its inscriptions were remarkably well preserved and clearly legible.
As we marveled at this archaeological wonder, my wife blurted out: "A rock doesn't need to be backed up!"
Luckily, apart from the Rosetta Project, no one backs up data to stone. But my wife raised an important point: storing and preserving electronic data poses many technical questions that the ancients never had to consider. Never mind a few thousand years from now; what is it like to try to read backup tapes, archived DVDs, or old Word documents after just 10 years? Electronic data has problems of format, migration, and integrity that hard copies do not, although hard copies have preservation problems of their own, as any archaeologist or archivist can tell you.
In some ways, the simple methods the Egyptians used were far better than the way we record and save information today. Compare the well-preserved obelisk with the 5.25-inch floppy disks, 8-track tapes, and old videotapes that you thought were safely stored but can no longer read in those formats. Will floppy disks, tapes, and videos still be usable in 3,500 years?
After rock, humans wrote on animal skins and papyrus, which were easier to write on but did not last as long. Paper and the printing press were faster still, but paper deteriorates even faster. Do you see the pattern? The electronic records we use now can probably be kept for about 10 years. Recording and chronicling history is therefore becoming increasingly difficult: each generation of media must be migrated to the next ever more quickly, or we risk losing many important records.
As recently as 10 to 15 years ago, the medium of choice was paper, because digital storage was too expensive. Today we keep almost everything in digital form: family photos, music, movies, medical records, documents, e-mail and other personal communications. But the digital world we are building leaves important issues for the future, including formats, frameworks, interfaces, and data integrity, and these must be addressed in a standardized way if we are to preserve and pass on digital records. The preservation of history depends on all of them.
Metadata framework
First, we need to establish a standardized framework for file metadata, backup, and archive information.
We need a framework in which metadata can be transferred and preserved between different systems. Some file systems used at home offer several ways to attach metadata, but that metadata cannot be carried across operating systems. When you move files between Apple, Microsoft, and Linux systems, all you keep is the basic POSIX information, which is not enough to carry added metadata. What if there is a disaster? Can this information be carried over to a backup device? Transfer protocols such as FTP, NFS, and CIFS cannot carry metadata between different systems. On the Microsoft side, most external devices are formatted as FAT rather than NTFS, and FAT lacks some of the NTFS features that support metadata. In the enterprise, each vendor supplies its own proprietary framework or database for accessing the file system or managing storage. Those frameworks require specialized applications to display and process file metadata. This approach is inconvenient, and long-term preservation under it is usually expensive.
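One way to picture what a portable metadata framework would have to do is a "sidecar" file: a plain-text record that travels next to the data and survives FTP, NFS, CIFS, or a FAT-formatted drive because it is just another file. The sketch below is only an illustration under that assumption. The `.meta.json` suffix, the field names, and the functions `export_metadata`, `write_sidecar`, and `read_sidecar` are all hypothetical and do not correspond to any existing standard.

```python
import json
import os
import stat
from datetime import datetime, timezone


def export_metadata(path, extra=None):
    """Collect basic POSIX attributes plus free-form descriptive metadata."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        # Human-readable permission string, e.g. "-rw-r--r--"
        "mode": stat.filemode(st.st_mode),
        "modified_utc": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Anything the file system itself cannot hold (hypothetical fields)
        "extra": extra or {},
    }


def write_sidecar(path, extra=None):
    """Write the metadata as JSON next to the file and return the sidecar path."""
    sidecar = path + ".meta.json"
    with open(sidecar, "w", encoding="utf-8") as f:
        json.dump(export_metadata(path, extra), f, indent=2, sort_keys=True)
    return sidecar


def read_sidecar(path):
    """Load the metadata previously written for `path`."""
    with open(path + ".meta.json", encoding="utf-8") as f:
        return json.load(f)
```

Calling `write_sidecar("photo.jpg", {"caption": "Obelisk"})` would leave a `photo.jpg.meta.json` file beside the photo. A sidecar is easy to separate from its file, which is exactly why a real standard would be needed rather than an ad hoc convention like this one.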
Storage drives and interfaces
Not long ago we backed up our systems to 5.25-inch floppy drives, then to 3.5-inch floppies and CD-ROMs, and now mainly to DVD drives. This year we may see Blu-ray burners, and in a few years there may be something new. Do current Windows and Mac systems still support all of these devices?
Over the same period, enterprises have gone through the ER-90, Redwood, 9940A, 9940B, DLT, and many other technologies. The only technology that seems to have enjoyed long-term support in the enterprise is the 3480 and 3490 tape drives used on mainframes. The same seems to be true of the channels that connect these devices. What about SCSI-FW, FC-AL, and even FC-2? These interfaces have been retired, and even if the hardware is still available, do current operating systems still have drivers to support it? What if a driver has a bug that needs fixing? IBM addresses such problems for mainframes, but no one does so for the general open-systems enterprise environment, because it is both difficult and expensive.
Obviously, as technology progresses, you have to migrate your old data. A rock, of course, needs no such migration. All you need to know is the language it was carved in, and we have learned to read almost every form of written communication.
Data integrity
As with a poor translation, the integrity of modern data is hard to guarantee, mainly because of cost. Some file systems and storage frameworks, such as ZFS and Hadoop, can verify data integrity, but these solutions are out of reach for ordinary home users. A candidate technology such as flash may solve the problem, or it may bring problems of its own. Although disk drive density has increased dramatically over the past 15 years, hard error rates have barely changed. That means both enterprise and consumer drives can fail and eventually lose data, and users spend a lot of time rebuilding systems. You can add hardware to reduce the incidence of such failures, but that does not solve the underlying problem. With enough investment you can build a highly reliable archive, but even among enterprise users, not everyone can afford it.
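A little arithmetic shows why a flat hard error rate matters more as drives grow. The sketch below assumes a rate of one unrecoverable error per 10^14 bits read, a figure chosen purely for illustration and not taken from this article, and treats errors as independent. The function name `p_read_error` is hypothetical.

```python
import math


def p_read_error(capacity_bytes, bit_error_rate=1e-14):
    """Probability of at least one hard error when reading a whole drive.

    Assumes independent bit errors at the given (assumed) rate:
    P = 1 - (1 - rate) ** bits, evaluated in a numerically stable way.
    """
    bits = capacity_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


# Same assumed error rate, growing capacity: the risk per full read rises.
for terabytes in (0.1, 1, 4):
    p = p_read_error(terabytes * 1e12)
    print(f"{terabytes} TB: {p:.1%} chance of a hard error per full read")
```

Under these assumptions, reading a full 1 TB drive carries a roughly 8 percent chance of hitting at least one hard error, and the chance grows with every increase in capacity even though the drive is no less reliable per bit.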
Obviously, even today, rocks still have a certain advantage. When a device fails, reading the electronic data on it takes specialized expertise, and even with that expertise much of the data may be lost.
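Home users without ZFS can still approximate the basic idea of integrity verification by recording a checksum for every file and re-checking it later. The following is a minimal sketch of that idea, not a description of how ZFS or Hadoop work internally. The function names `sha256_of`, `build_manifest`, and `verify` are hypothetical, and the sketch only detects damage; it cannot repair it.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify(root, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    damaged = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            damaged.append(rel)
    return sorted(damaged)
```

The manifest would be built once when the archive is written and stored with it, then passed to `verify` after each migration to new media. Repair still depends on having a second good copy, which is where the cost the article describes comes in.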
Data format
Has anyone tried opening a Word document created in 1990 with Word 2007? We all know that every file format has a limited lifetime. Some formats, such as PDF, may last longer and others not as long, but no format lasts forever, and formats can change quickly. We have no framework for changing and converting formats. On Windows you can identify a file's type by its extension, but the extension can be misleading. On Mac OS each file carries its own metadata, which cannot be carried over to a Windows system, and the same is true of UNIX systems. Rocks, on the other hand, pose only the language translation problems we already face today.
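Because an extension can mislead, tools that need to know a file's real format usually look at its first few bytes instead. The sketch below illustrates that approach with a handful of well-known signatures. The table is deliberately tiny, and the function name `sniff` and the labels are hypothetical; a real tool would need a far larger and carefully maintained list.

```python
# Leading-byte signatures for a few common formats (illustrative subset only).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (also used by .docx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls)"),
]


def sniff(path):
    """Guess a file's format from its leading bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

A file named `report.doc` that actually begins with `%PDF-` would be reported as a PDF. Knowing the container is only the first step, though: it says nothing about which version of the format is inside or whether any current software can still read it.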
My wife does not work in the data storage industry, but she clearly understands that managing digital data is more complex than earlier forms of information management. The concepts, technologies, and standards of digital data management have not yet taken shape. I don't know whether anyone can solve these problems right now, but if the standards groups do not take them on, we will not be able to manage our data for the long term. It is only a matter of time before a lot of data starts to get lost. Thousands of years from now, what will people think of us? If we want to leave obelisks of our own for future generations, we had better start now.