An investigation into how archival documents are stored
Uncovering the mysteries of cold storage
Per Brasher
The more established members of the Internet community are pursuing longer-term storage solutions. A longer service period may mean an MTBF (mean time between failures) raised from 3,000,000 to 4,000,000 hours, or a seven-year warranty. Whatever the point of reference, archive-class hard disks raise some difficult problems. The starting point of this paper is to identify the most important factors in a cold storage system.
If you are in a hurry for the answer, there is a conclusion section at the end, along with a number of requests to vendors and users.
When I worked at Facebook, I coined a term: cold storage. I also, without regret, coined the term "cold flash", which will come up again later ...
The intent in coining "cold storage" at the time was to distinguish it from applications such as "archiving" and "backup". Both of those imply a large delay when data is retrieved, and that ingesting the data is a heavyweight operation. The defining features of cold storage are extremely low operating cost and extremely high data durability, while still being able to recover data with virtually no loss of speed, or to provide direct online access to it when needed.
1. Service period
There are many problems with how the service period is calculated in a hyperscale data center. A common rule of thumb treats 60% of the warranty period as the point at which the "bathtub effect" sets in; at that point the drive's value on the gray market is 50% of its initial value. Some companies use equipment to the end of the warranty period and then replace it wholesale. No company is willing to keep running equipment past the end of its warranty.
We all know that the equation deriving AFR (annualized failure rate) from MTBF (mean time between failures) is not absolutely accurate, and the methods customers use to deduce a real-world failure rate from a device's idealized lab data differ only slightly from one another. According to published data, the actual result can be 5 to 10 times the result of the equation. Let's look at some basic facts. The equation assumes, for example, that the failure rate is constant, which of course is never true. Even so, under that assumption, a drive rated at an MTBF of 2M hours works out to an AFR of 0.44%. Field evidence from room-temperature data centers shows that a mixed population of aging (but not yet end-of-life) drives has an AFR of 4.4%. Viewed that way, multiplying the AFR from the equation by 10 gives results of satisfactory accuracy. An 800K-hour-MTBF drive therefore has an AFR of 1.1% by the equation, and 11% after the adjustment above. That adjusted failure rate is intolerably high!
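As a sanity check, here is a minimal sketch of the constant-failure-rate conversion and the rough 10x field adjustment described above; the constant and function names are illustrative, not taken from any vendor datasheet:

import math

HOURS_PER_YEAR = 8760
FIELD_ADJUSTMENT = 10  # published field data suggests roughly 5-10x the datasheet figure

def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by a constant-failure-rate (exponential) model."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for mtbf in (2_000_000, 800_000):
    datasheet = afr_from_mtbf(mtbf)
    print(f"MTBF {mtbf / 1e6:.1f}M h: datasheet AFR {datasheet:.2%}, "
          f"field estimate ~{datasheet * FIELD_ADJUSTMENT:.1%}")
# MTBF 2.0M h: datasheet AFR 0.44%, field estimate ~4.4%
# MTBF 0.8M h: datasheet AFR 1.09%, field estimate ~10.9%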
Another hard-to-determine question is what actually affects the life of a hard drive. Clearly the physical characteristics, the number of bytes transferred, load/unload cycles, and start/stop cycles are tested and well-understood factors. However, the room-temperature data center adds a few new challenges, for example the drive's hourly change in temperature and relative humidity (ΔT and Δ%RH). The measurement method is to read a value 9 inches in front of the drive once per hour and report the delta. A room-temperature data center can therefore switch evaporative cooling fully on; this drives the RH and temperature up sharply, then holds them at the new level for the remainder of the hour, and so the system stays within the range stated in the specification. In other words, the specification does not constrain the slope of the change, only the per-hour endpoints. Missing that information, it is quite possible to violate the drive's operating envelope for long stretches without knowing it.
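To make the point concrete, here is a small illustrative sketch (the 20 C/h limit and the trace are hypothetical, not taken from any drive specification) showing how a check based only on hourly readings can miss a steep swing inside the hour:

# Only the once-per-hour readings are compared against the limit, so a rapid change
# that settles within the hour looks compliant.
MAX_DELTA_T_PER_HOUR = 20.0   # hypothetical spec limit, deg C per hour

def hourly_deltas(minute_samples):
    """Take minute-resolution readings, keep only the top-of-hour values."""
    hourly = minute_samples[::60]
    return [abs(b - a) for a, b in zip(hourly, hourly[1:])]

# Evaporative cooling kicks in at minute 10 and drops the intake temperature 15 C
# in 5 minutes, then holds: a steep ramp the hourly check never flags.
trace = [35.0] * 10 + [35.0 - 3 * i for i in range(1, 6)] + [20.0] * 105
print(hourly_deltas(trace))  # [15.0] -> within the hypothetical 20 C/h limit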
So, are changes in temperature and humidity an important factor after all? (Keep in mind that we are discussing archival storage.) A series of research papers has been published on this question, particularly for cold storage environments. It appears that components unable to maintain their own temperature stability (and thus resist moisture) inevitably run into trouble in the larger system. If the system allows the internal "weather" to change rapidly, the potential impact is enormous. To make the discussion concrete, there is evidence that in room-temperature data centers, empty or rarely used drives show a higher error rate than drives in regular use. So the direct answer to the question at the start of this paragraph is yes.
2. Uncorrectable bit error rate (UBER)
This first paragraph is my personal complaint. If a complaint about our industry as a whole does not interest you, please skip to the next paragraph and the discussion of the data. Uncorrectable errors occur on every type of hard drive. Yet a drive will burn an astonishing amount of time, up to 3,000 ms, retrying so that a failed read or write never has to be reported! What is going on here? This approach is problematic, and the problem lies with the designers of operating systems and drivers. For a long time, storage has not been given enough attention in IT. Now, as we move into the new era of "data", OS designers must wake up to the fact that the underlying digital system is actually an analog, fuzzy device, and handle lost blocks, unreadable partitions, and the like properly. Erasure codes work well at the macro level, and LEC (local erasure codes) also has potential at the secondary level, but neither can determine whether the data being read back is exactly what was stored. (Implementing the DIF, Data Integrity Field, defined by the T10 standard would be an effective, if weak, first step.)
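As a rough illustration of the idea, the sketch below attaches a small integrity tag to each block and verifies it on read. The real T10 DIF field is 8 bytes per block (guard, application, and reference tags) and uses a specific CRC-16, so the zlib.crc32 call here is only a stand-in for illustration:

import zlib

BLOCK_SIZE = 512

def make_tag(block: bytes, lba: int) -> tuple[int, int]:
    """Guard-like checksum over the data plus a reference to the expected LBA."""
    return zlib.crc32(block), lba

def verify(block: bytes, lba: int, tag: tuple[int, int]) -> bool:
    guard, ref = tag
    return zlib.crc32(block) == guard and lba == ref

data = b"\x00" * BLOCK_SIZE
tag = make_tag(data, lba=42)
corrupted = b"\x01" + data[1:]    # one flipped byte the drive might never report
print(verify(data, 42, tag))      # True
print(verify(corrupted, 42, tag)) # False: silent corruption caught at the host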
UBER affects operations and maintenance. Even setting aside the mislabeled errors that result from the operating system not fully understanding what the drive is actually doing, the drive's uncorrectable bit error rate has a huge impact. The incidence of "soft" errors is nominally within a reasonable range, say 1×10^-15 per bit. That figure means that for roughly every 125 TB transferred (10^15 bits), one error will surface during normal operation. At hyperscale, the result is trading storage capacity for network and CPU. To simplify, assume an erasure-coded system without local protection: for roughly every 125 TB transferred, at least one segment read fails. In a [10,14] erasure code, that means we must find 10 valid segments, recompute the 4 parity segments, write a new 14-segment stripe, and mark the original (remaining) segments for garbage collection.
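The 125 TB figure follows directly from the error rate; a minimal helper makes the arithmetic explicit:

def tb_between_errors(uber: float) -> float:
    bits_per_error = 1 / uber          # expected bits read per unrecoverable error
    return bits_per_error / 8 / 1e12   # bits -> bytes -> terabytes

print(tb_between_errors(1e-15))  # 125.0 TB, the figure used above
print(tb_between_errors(1e-14))  #  12.5 TB, if the spec is relaxed tenfold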
Therefore, if you issue what looks like one simple read, the operations the system actually performs are 10 reads, a parity recompute on the CPU, 14 writes, and 13 garbage-collection operations, that is, 37 I/O operations, plus CPU work and a large amount of network traffic. Relaxing the system's UBER requirement by an order of magnitude (from 10^-15 to 10^-14) therefore multiplies this cost across the whole system: the 37-fold I/O and network overhead is incurred roughly once every 12.5 TB transferred instead of once every 125 TB.
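Here is a sketch of that repair amplification for a [10,14] stripe, counting the operations listed above (the function and constant names are illustrative):

K_DATA, N_TOTAL = 10, 14

def repair_io_ops(k: int, n: int) -> int:
    reads = k        # fetch k valid segments
    writes = n       # write a fresh n-segment stripe
    gc = n - 1       # retire the old (remaining) segments
    return reads + writes + gc

print(repair_io_ops(K_DATA, N_TOTAL))  # 37 I/O operations for what began as one read
# Relaxing UBER from 1e-15 to 1e-14 makes this 37x penalty ten times more frequent.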
3. Usage Mode
The purpose of a cold storage system is to prevent data from being lost outright. "User ignorance" (for example, pushing a buggy new release into the production system and wiping out every online copy) is the kind of data loss we are preparing to guard against. To provide business continuity in any meaningful sense in such a case, the system must be decoupled from the production environment to some degree. The other key objective is to reduce cost as much as possible, both in equipment design and in operating cost. Because device design has already been squeezed to the extreme, under current constraints it may be more significant to cut expenses on the operational side.
Cold storage systems have delightfully creative names, such as Pelican, Subzero, and so on. All of these systems adopt some kind of balanced scheduling mechanism: Pelican balances heat and power consumption across groups of drives, while Subzero uses time-division scheduling. In both systems, any data set that needs to be recovered spans several of these balanced domains, so a staging area is needed to reassemble the data before handing it back to the business system. This was the first instance of the so-called "cold flash", a concept that has since been somewhat distorted across the industry, for which I apologize in advance.
Retention requirements for cold storage vary. Sometimes legal practice requires that case records be kept until the matter is closed; corporate data may need to be kept for seven years; social and medical records may need to be retained for as long as 99 years. But retention is rarely shorter than five years. This is why Blu-ray optical disc technology has been given a new lease on life and has become one of the competing technologies in the cold storage market.
Some hyperscale data centers have begun to use tape because of its low media and maintenance cost. However, tape is far from ideal when fast retrieval and recovery are required. The reality is that if tapes sit permanently in a library and are never ejected (so that rapid recovery remains possible), the amortized total cost after a year essentially erases tape's cost advantage! Tape also faces interesting challenges in the room-temperature data center: when the temperature rises, the tape stretches, and when the environment is humid, tapes stick together. Moreover, tape consumes so little energy that it cannot even manage its own temperature, so the data center must build a separate, climate-controlled warehouse just to hold the tapes, at the cost of energy efficiency ... which really isn't cool.
4. Comprehensive consideration of all factors in a large-scale deployment
Well, now for the summary. First, let's fix a few variables. Start from a standard data center with a 7 MW power feed, a footprint of about 20,000 square feet, and roughly 1,000 racks. To handle the I/O and the unavoidable erasure-code recovery load, a reasonable compute-to-storage split is about 4% compute with the rest for storage, which provides a total of roughly 260,000 hard drive slots. If we apply the real-world adjustment to the equation above, a drive with an MTBF of 2M hours has a failure rate of about 5%, while a drive with an MTBF of 800K hours has a failure rate of 11%. Multiplying by the total number of drives, that is 12,975 failures per year for the 2M-hour-MTBF drives and 28,545 failures per year for the 800K-hour-MTBF drives! As we all know, the bathtub effect pushes the real numbers higher: the first year is roughly three times this figure, and the last year of the warranty is similar. The warranty on the 800K-MTBF drive is three years, so over a single life cycle, at the constant failure rate above, 85,635 drives would have to be replaced; adjusting the model for the bathtub effect, that number becomes 199,815 (in other words, 77% of the total drive population). A frustrating situation indeed!
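The arithmetic can be reproduced in a few lines; the slot count and bathtub multipliers come from the figures above, and the variable names are illustrative:

SLOTS = 260_000

def annual_failures(afr: float) -> int:
    return round(SLOTS * afr)

per_year_2m = annual_failures(0.05)    # ~13,000 (12,975 is the figure used above)
per_year_800k = annual_failures(0.11)  # ~28,600 (28,545 is the figure used above)

# Three-year warranty for the 800K-MTBF drive, flat rate vs. bathtub-adjusted
flat_lifetime = 3 * 28_545              # 85,635 replacements
bathtub_lifetime = (3 + 1 + 3) * 28_545 # 199,815, about 77% of all slots
print(per_year_2m, per_year_800k, flat_lifetime, bathtub_lifetime,
      f"{bathtub_lifetime / SLOTS:.0%}")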
Now back to data integrity and lifetime. At 28,545 drive failures per year, with a layer of erasure code over those drives, a single lost segment can, as noted earlier, cause up to 37 times as much I/O on the network and disk subsystem; cumulatively, that adds up to millions of data-handling operations per year (reading the remaining segments over the network, computing new parity, writing a new stripe over the network, garbage-collecting the old stripe, and so on). With 1 TB drives, that is more than an exabyte of data movement every year. If the system uses higher-capacity drives, the total amount of data moved will soon exceed the maximum load the system can carry. Data safety undoubtedly becomes a major hidden risk.
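A sketch of that data-movement estimate, with the simplifying assumption that each failed 1 TB drive's full capacity passes through the 37x repair path:

FAILURES_PER_YEAR = 28_545
DRIVE_TB = 1
REPAIR_AMPLIFICATION = 37

moved_tb = FAILURES_PER_YEAR * DRIVE_TB * REPAIR_AMPLIFICATION
print(f"{moved_tb:,} TB ~ {moved_tb / 1_000_000:.2f} EB per year")
# 1,056,165 TB ~ 1.06 EB per year, and it scales linearly with drive capacity.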
Look at the problem from another angle. Yttibrium has been advising vendors of such systems and their users to aim at one design point: an MTBF of 4M hours and a drive service life of 7 years. We have also designed an erasure-code assist unit in hardware, so that the 37-fold repair traffic can be handled inside each storage node rather than on the network between nodes. With Yttibrium's design points, the failure rate in the hypothetical environment above adjusts to 2.1% (using the same 10x lab-to-field adjustment), which is about 5,450 drives out of the 260,000 total. This removes the need to migrate data at the end of a three- or five-year drive life cycle. Seven years also happens to match the SEC retention period, so the drive and the data can retire together, except for social or medical data. For data that must be kept beyond 7 years, migration can be done transparently in the background using the LEC algorithm (as long as the storage node can use the idle time between two drive replacements to complete the migration).
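Applying the same formula and 10x adjustment to the proposed 4M-hour design point gives roughly the figure quoted above (small differences come from rounding):

import math

SLOTS = 260_000
MTBF = 4_000_000
afr_field = (1 - math.exp(-8760 / MTBF)) * 10   # about 2.2%; rounded to 2.1% above
print(f"{afr_field:.1%} -> about {SLOTS * afr_field:,.0f} drives per year "
      "(about 5,450 at the 2.1% figure), with repairs staying inside each node")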
Several requests
Vendors: please give the industry a hard drive optimized for in-chassis LEC, with very high RV (rotational vibration) endurance, so that we can further reduce chassis cost; and a drive with a 4M-hour MTBF and a 7-year service life, priced so that the TCO (total cost of ownership) benefit is fully realized, with a surplus, by the third year.
Users: please publish more data about your drives, their reliability, and the operating environment. Learn from Backblaze and be more open about your exact needs and conditions of use. It is not easy to push hard drive manufacturers to extend product life and reduce error rates; your data will be a great help.