Discussion on the design philosophy of data-reliability-oriented storage systems

Source: Internet
Author: User

Designing a storage system has a relatively high barrier to entry. The biggest difference from a computing system is that a storage system carries the data: once it fails, not only can business continuity no longer be guaranteed but, more importantly, user data may be lost. A compute node failure, by contrast, only interrupts business continuity. This is the biggest difference in reliability requirements between the two.

More than 10 years ago, when I first started developing storage systems, I had no sense of how complex storage was. I thought it was nothing more than storing data on disk according to certain rules and implementing certain functions, such as data protection (RAID), data replication, snapshots, and file systems. The most complex part of a storage system seemed to be its variety of functions, so the design focus was on developing those software functions. In that period, the problems that bothered us most were these: after long test runs, the system would panic with small probability; after an abnormal power loss, data consistency problems appeared that should never have occurred; and once a problem appeared, it was hard to locate and had to be reproduced. In short, a large and complex storage software stack is difficult to stabilize, and this even affected the rollout of the product.

After the baptism of working at a major international storage company, I found that the storage software we had developed earlier was in some ways even more advanced and cooler than that of the first-tier vendors. In code style, software layering, and flexibility, it was no worse than their software. Yet the first-tier vendors' storage products are particularly stable and, above all, when problems occur they can be narrowed down quickly by several means. Many people assume that large companies simply have more mature and professional process construction, project management, and R&D management, and that this is what guarantees product quality. After long reflection, I realized that things are not that simple: good process, project management, and R&D management all rest on an excellent team. When designing a storage system, such a team needs to have data-reliability-oriented design thinking in its bones, rather than merely the ability to do function-oriented design. Data-reliability-oriented design requires architects to have a higher-level, more systematic global view than any single function or subsystem demands. For a storage system, data reliability is the bottom line. If that bottom line is violated anywhere in the design process, even a good design will carry hidden dangers that eventually damage the system and harm the customer.

This is what makes storage system design hard: achieving data reliability and then, on top of that bottom line, delivering ultra-high performance, high scalability, complete functionality, and a low price poses a great challenge to the designer.

To discuss the design principles of data-reliability-oriented storage systems in detail, I would like to give an example. Everyone is familiar with RAID, the data protection subsystem. When choosing a RAID, many people only consider whether to implement it in software or in hardware, and there are many discussions of soft RAID versus hard RAID. In deduplication systems for data backup, many designers focus on the deduplication algorithm, which is indeed a very important software feature, and for the back-end disk storage they usually just use a high-performance RAID6. The evaluation metrics for the back-end RAID are then high bandwidth or good IOPS. Many backup deduplication appliances on the market follow this design approach. From a data-reliability-oriented perspective, such a product design is flawed, and the result is not the best backup storage product. Backup is often the last line of defense for data reliability: if the backup system itself has a problem, where is the user's data to be recovered from? If three disks fail simultaneously in the backup system's RAID6 configuration, what happens to the user's data? These are the questions that data-reliability-oriented design must consider.
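The limit behind the three-disk question can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it uses a single XOR parity block (RAID5-style, tolerating one loss) instead of RAID6's two parity blocks, and the names `make_stripe` and `recover` are hypothetical. The point carries over unchanged: once more blocks are lost than the parity scheme covers, parity alone cannot bring the data back.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, parity_count=1):
    """Rebuild a missing block (marked None) if the loss is within tolerance.

    Raises ValueError when more blocks are missing than the parity covers.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > parity_count:
        raise ValueError(
            f"{len(missing)} blocks lost, only {parity_count} recoverable"
        )
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the single missing block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
one_lost = list(stripe)
one_lost[1] = None
print(recover(one_lost) == stripe)  # the single lost block is rebuilt
```

Losing a second block in this sketch raises `ValueError`, which corresponds to a third simultaneous disk failure in a RAID6 group: the array by itself has nothing left to rebuild from.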

EMC's renowned deduplication product Data Domain was designed without third-party RAID cards, or even third-party disk arrays. The reason is simple: when multiple disks fail, a RAID card or third-party disk array cannot provide good data recovery support, and a backup system's data reliability requirements are higher than those of the disk arrays or RAID cards used for applications and primary storage. Data Domain therefore made the technical decision to develop DD-RAID, a data protection subsystem for the backup domain. From a functional point of view, DD-RAID is not fundamentally different from ordinary RAID, but it has a higher level of data protection: even when multiple disks fail, data can still be recovered and salvaged using additional metadata. For product promotion, Data Domain grouped a series of such technologies under the name DIA, which introduces additional data verification and protection mechanisms at every level to ensure data reliability.
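One common form of "additional data verification" is to record a checksum next to every block at write time and verify it on every read. The sketch below is a generic illustration of that idea, not Data Domain's implementation; the class `ChecksummedStore` and its methods are hypothetical names, and SHA-256 is chosen only because it is available in the Python standard library.

```python
import hashlib


class ChecksummedStore:
    """Toy in-memory block store that keeps a SHA-256 digest beside each block."""

    def __init__(self):
        self._blocks = {}   # block id -> payload bytes
        self._digests = {}  # block id -> hex digest recorded at write time

    def write(self, block_id, payload):
        """Store the payload together with its digest."""
        self._blocks[block_id] = bytes(payload)
        self._digests[block_id] = hashlib.sha256(payload).hexdigest()

    def read(self, block_id):
        """Return the payload, refusing to hand back silently corrupted data."""
        payload = self._blocks[block_id]
        if hashlib.sha256(payload).hexdigest() != self._digests[block_id]:
            raise IOError(f"checksum mismatch on block {block_id}")
        return payload

    def scrub(self):
        """Return the ids of blocks whose contents no longer match their digest."""
        return [
            block_id
            for block_id, payload in self._blocks.items()
            if hashlib.sha256(payload).hexdigest() != self._digests[block_id]
        ]


store = ChecksummedStore()
store.write(1, b"backup segment")
store._blocks[1] = b"bit-rotted data"  # simulate silent corruption on disk
print(store.scrub())  # lists the corrupted block id
```

A periodic `scrub` lets the system find corruption before a restore needs the data, which matters most in a backup system where a bad block may sit unread for a long time.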


Of course, DD-RAID's performance is not perfect; in particular, its random write performance is mediocre. However, it offers the highest reliability and can recover data in extreme cases. This is a technology choice made under the guidance of data-reliability-oriented design, and it is exactly where storage system design is challenging: without a global view and a deep understanding of storage systems, applications, and technical details, it is difficult to make such judgments.

If we have data-reliability-oriented design thinking, choosing the software stack architecture of a storage system becomes much easier. Storage software is the core of the storage system, and the software stack architecture determines whether the system succeeds or fails. A good storage software architecture can evolve continuously over the long term, gain new functions, and still ensure data reliability; problems can be narrowed down quickly, and the company accumulates core value. A poor software architecture eventually holds the entire product hostage: the system never stabilizes, data reliability is threatened, resolving customer problems requires changes to the architecture, and the company cannot build up technology over the long term. In fact, data-reliability-oriented design thinking applies not only to storage software development but to every aspect of developing storage products: choice of software architecture, coding, algorithm selection, feature definition, test methods, test items, process definition, release methods, and even after-sales handling, technical support, and problem-solving methods. With this thinking in place, the approach at each stage of product development becomes very clear and the goals stay consistent.

We often talk about Silicon Valley's system design capability. Having this capability is not simply a matter of technical breakthroughs, the number of products, breakthroughs in implementing functions, or mastery of management processes; more important are the dimensions and the height of design thinking. The most basic requirement of a storage system is data reliability, and a team without data-reliability-oriented design capability will only lead its troops into a swamp. Thinking determines the way forward. Fighting for reliable storage: this is the way of storage.


(The way of storage)

This article is from the "The Way of Storage" blog; please keep this source attribution: http://alanwu.blog.51cto.com/3652632/1879634
