A survey of Ceph
This article outlines the basics of Ceph so that readers can form a first impression of it without going into technical details.
1. What is Ceph?
Ceph's official website, ceph.com, gives a concise definition of Ceph in the following sentence:
"Ceph is a unified, distributed storage system designed for excellent performance, reliability and scalability."
That is, Ceph aims to be both unified and distributed while delivering performance, reliability, and scalability. This statement captures the gist of Ceph and can serve as a basic starting point for understanding its design ideas and implementation mechanisms. In this definition, special attention should be paid to the two modifiers of "storage system", namely "unified" and "distributed".
In particular, "unified" means that Ceph can provide object storage, block storage, and file system storage from a single storage system, which simplifies deployment and operation while meeting different application requirements. "Distributed" means that Ceph has a truly decentralized structure and scalability with no theoretical upper limit on system size. In practice, Ceph can be deployed on thousands of servers. As of early March 2013, the largest Ceph system deployed in production was DreamHost's object storage cluster, which managed 3 PB of physical storage capacity.
2. Why pay attention to Ceph?
In fact, Ceph is not a newly emerged open source project. On the contrary, it has travelled a long road of more than seven years from its initial release to its gradual rise in popularity. The author sees roughly two reasons why Ceph deserves attention:
First of all, Ceph itself does have prominent advantages.
Ceph has many advantages worth mentioning, including unified storage, scalability, reliability, performance, and automated maintenance. In essence, these advantages all derive from its advanced core design idea, which the author sums up in one phrase: "No table lookup needed; just calculate." Based on this design idea, Ceph makes full use of the computing power of the storage devices themselves and eliminates the dependence on any single central node, achieving a truly decentralized structure. With this design idea and structure, Ceph achieves high reliability and scalability on the one hand, and on the other ensures relatively low latency and high aggregate bandwidth for client access. In the material that follows, the reader will see that almost all of Ceph's outstanding characteristics are realizations of this core design idea.
Second, Ceph is now highly valued in the OpenStack community.
OpenStack is currently the most popular open source cloud operating system. According to the author's observation, the strongest driving factor behind Ceph's rise over the last year or two has been the actual needs of the OpenStack community. Ceph has now become one of the most talked-about open source storage solutions in the OpenStack community, with actual applications covering block storage and object storage and beginning to extend to file systems. Details on this topic will be introduced in subsequent articles.
3. Emergence and development of Ceph
Generally speaking, open source projects come from three sources. The first is academic research projects that are open-sourced once enough papers have been published. The second is enterprise products that are open-sourced when circumstances align. The third is projects where some expert appears out of nowhere and a group of people follow along to build something open source together. There are many examples of each category, and projects of different origins have their own characteristics. In particular, projects in the first category are likely to be quite distinctive in principle and technology, and Ceph belongs to this group. In contrast, the design and implementation of projects in the second category are likely to be quite mature, and such projects often reach actual production deployment before or soon after being open-sourced. This background is likely to influence the subsequent development of an open source project.
In any case, the Ceph project originated from the work of its founder, Sage Weil, during his studies at the University of California, Santa Cruz. The project started in 2004. At the OSDI academic conference in 2006, Sage published a paper on Ceph and provided a download link to the Ceph project at the end of the paper. From then on, Ceph began to be widely known.
Ceph is developed in C++. This choice is understandable for a typical systems project that emphasizes performance.
As an open source project, Ceph is released under the LGPL license.
According to Inktank's official website, the participants in Ceph's ecosystem are shown in the following figure:
It is easy to see that the vendors and organizations on the list have a distinctly cloud-oriented flavor.
As Ceph grew in popularity, Sage Weil founded the company Inktank in 2011 to lead Ceph development and community maintenance. Currently, Ceph has a three-month release cycle.
4. About Sage Weil
Before moving on to the technical discussion, a little gossip about Sage Weil's background is well worth including, because he is that rare young talent who has worked in engineering, research, and entrepreneurship and made major contributions in all three.
Sage's engineering ability goes without saying, and his Ceph paper was published at OSDI, the world's leading academic conference in the field of computer operating systems. As for entrepreneurship, Sage is a co-founder of DreamHost, which dates back to 1997, when he had only just started college ... Interested readers can look up Sage's resume on LinkedIn: he took a job when he wanted to work, went to school when he wanted to study, started a company when he wanted to, and pursued a PhD when he wanted to, all seemingly at will. It is hard not to feel admiration.
The design idea of Ceph
A problem often encountered when analyzing open source projects is a lack of material. Experts who have time to write code usually have no time, or no inclination, to write documentation. What little documentation exists often serves only as a user manual, and even the occasional design document tends to be vague. In such cases one has to reverse-engineer the design ideas from the code, which not everyone can do.
Thankfully, Ceph is a typical open source project that originated in academic research. Although academic research was only a brief chapter in Sage's impressive career, it left behind several academic papers for reference. This gives us a rare opportunity to analyze an excellent open source systems project from a top-level perspective. The content of this article reflects what the author took away from reading these papers.
1. Ceph Target Application Scenario
To understand Ceph's design ideas, it is first necessary to understand the target application scenario Sage had in mind when designing Ceph, in other words, "What is this thing for?"
In fact, Ceph's original target scenario was a large-scale, distributed storage system. "Large-scale" and "distributed" here mean at least the ability to hold petabytes of data on a system consisting of thousands of storage nodes.
In today's era of big data slogans, petabytes are far from an exciting system design goal. It should be noted, however, that the Ceph project began in 2004. That was an era when mainstream commodity processors were single-core and common hard disks held only a few tens of GB. This is a far cry from today, when dual 6-core, 12-thread processors and 3 TB single disks are commonplace. To appreciate this design goal, therefore, we should consider the situation at that time. Of course, as mentioned earlier, Ceph's design has no theoretical upper limit, so the petabyte level is not a cap on the capacity of actual deployments.
In Sage's view, such a large-scale storage system cannot be looked at from a static perspective. The author summarizes its dynamic characteristics as the following three "changes":
Changes in the size of the storage system: for such a large-scale storage system, the final size usually cannot be predicted on the first day of construction, and the very notion of a final size may not exist. As the business develops and expands, the system has to carry more and more data. This means that the scale of the system naturally changes and keeps growing.
Changes in the devices of the storage system: in a system consisting of thousands of nodes, node failure and replacement are bound to be frequent. On the one hand, the system must be reliable enough that the business is not affected by such frequent hardware and low-level software problems. On the other hand, it should be as intelligent as possible to reduce the cost of maintenance operations.
Changes in the data of the storage system: in a large-scale storage system of the kind commonly used by Internet applications, the stored data is likely to change very frequently. New data is constantly being written, and existing data is updated, moved, or even deleted. This scenario needs to be considered in the design.
These three "changes" are the key features of Ceph's target application scenario, and Ceph's main characteristics were designed in response to them.
2. Expected technical characteristics for target scenarios
For the application scenario above, the technical characteristics Ceph was expected to have from the start of its design are:
High reliability. "High reliability" refers first of all to the data stored in the system: the system should do everything possible to ensure that data is not lost. Secondly, it covers the reliability of the write process: when a user writes data to the Ceph storage system, an unexpected event during the write must not cause data loss.
High automation. This includes automatic data replication, automatic re-balancing, automatic failure detection, and automatic failure recovery. Taken together, these automation features help ensure the high reliability of the system and also keep the difficulty of operating it relatively low even after the system has grown.
High scalability. "Scalable" is meant broadly here. It covers the scalability of system size and storage capacity, the linear growth of aggregate data access bandwidth as the number of nodes increases, and the ability to support multiple functions and applications on top of a rich, powerful low-level API.
3. Design ideas for expected technical characteristics
For the expected technical characteristics described in the previous section, Sage's design approach to Ceph can be summed up in the following two points:
Make full use of the storage devices' own computing power. In fact, using devices with computing capability (the simplest example being an ordinary server) as the storage nodes of a storage system was not a new idea at the time. However, Sage believed that existing systems basically used these nodes only as simple storage nodes. If the computing power on the nodes were fully exploited, the expected characteristics mentioned above could be achieved. This became the core idea of Ceph's system design.
Remove all central points. Once a central point exists in the system, it introduces a single point of failure and will inevitably become a performance bottleneck as the system scales. In addition, if the central point sits on the critical path of data access, it increases data access latency. These are obviously problems that should not arise in the system Sage envisioned. In most systems engineering practice, single points of failure and performance bottlenecks can be mitigated by adding backups for the central point, but Ceph ultimately solves the problem more thoroughly with an innovative approach.
4. Key technical innovations supporting the design ideas
No matter how novel and elegant a design idea is, it must ultimately be backed by technical strength. And this is where Ceph shines most.
Ceph's most central technical innovation is the phrase quoted earlier: "No table lookup needed; just calculate." In general, a large distributed storage system must solve two basic problems:
The first is "Where should I write the data?" When a user submits data to be written, the storage system must quickly decide on a storage location and allocate space for it. The speed of this decision affects write latency and, more importantly, the soundness of the decision affects how evenly the data is distributed. This in turn affects the lifetime of storage units, the reliability of data storage, data access speed, and other downstream issues.
The second is "Where did I write the data earlier?" For a storage system, handling data addressing efficiently and accurately is also one of the basic capabilities.
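To make the point about uniform distribution concrete, here is a minimal sketch that measures how evenly one placement rule spreads objects across nodes. The rule itself (hash the object ID, then take it modulo the node count), the function name `place`, and the object and node counts are all assumptions chosen for illustration; none of them comes from Ceph.

```python
import hashlib
from collections import Counter


def place(object_id, node_count):
    """Hypothetical placement rule: hash the object ID, then take it modulo the node count."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


NODE_COUNT = 10
OBJECT_COUNT = 10_000

# Count how many objects land on each node.
load = Counter(place(f"obj-{i}", NODE_COUNT) for i in range(OBJECT_COUNT))
ideal = OBJECT_COUNT / NODE_COUNT

for node in range(NODE_COUNT):
    print(f"node {node}: {load[node]} objects ({load[node] / ideal:.2f}x ideal)")

# A ratio close to 1.0 means the placement decision spreads data evenly.
print("most loaded / least loaded:", max(load.values()) / min(load.values()))
```

A rule that sent most objects to a few nodes would show up here as a large ratio, which is the kind of imbalance that wears out some storage units early and slows down access to them.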
To solve these two problems, traditional distributed storage systems commonly introduce a dedicated server node that maintains a data structure mapping data to storage locations. When a user writes or reads data, the client first contacts this server for a lookup, and only after the actual storage location has been decided or located does it contact the corresponding storage node for the subsequent operation. As a result, traditional solutions tend to suffer from single points of failure and performance bottlenecks on the one hand, and from longer operation latency on the other.
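The traditional scheme described above can be sketched roughly as follows. This is an illustrative toy, not the design of any particular system: the class `LookupServer`, the round-robin allocation, and the node names are all hypothetical.

```python
class LookupServer:
    """Hypothetical central node that owns the object-to-node mapping table."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.table = {}      # object ID -> storage node
        self.requests = 0    # every client operation passes through here
        self._next = 0

    def allocate(self, object_id):
        """Decide where a new object goes (round-robin) and record the decision."""
        self.requests += 1
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        self.table[object_id] = node
        return node

    def lookup(self, object_id):
        """Return the node recorded for an existing object."""
        self.requests += 1
        return self.table[object_id]


def write(server, stores, object_id, data):
    node = server.allocate(object_id)   # extra round trip before the write
    stores[node][object_id] = data


def read(server, stores, object_id):
    node = server.lookup(object_id)     # extra round trip before the read
    return stores[node][object_id]


nodes = ["node-a", "node-b", "node-c"]
server = LookupServer(nodes)
stores = {name: {} for name in nodes}   # in-memory stand-ins for storage nodes

for i in range(6):
    write(server, stores, f"obj-{i}", f"payload-{i}")
values = [read(server, stores, f"obj-{i}") for i in range(6)]

print(server.requests)  # 12: six writes and six reads all went through the server
```

Every operation touches `server` before it touches a storage node, so the server's availability and throughput bound the whole system, and each request pays for one extra hop.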
To address this problem, Ceph completely abandons lookup-table-based data addressing in favor of a computation-based method. In short, any client of a Ceph storage system needs only a small amount of periodically updated local metadata and a simple calculation to determine the storage location of a piece of data from its ID. With this approach, the problems of the traditional solutions disappear. Almost all of Ceph's outstanding features are built on this addressing method.
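The idea of computing a location instead of looking it up can be illustrated with a small sketch. It uses rendezvous (highest-random-weight) hashing as a stand-in; this is not Ceph's actual placement algorithm, which this article does not describe. The names `score`, `locate`, and `cluster_map`, and the node labels, are assumptions made for the example.

```python
import hashlib


def score(object_id, node):
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(object_id, cluster_map, replicas=2):
    """Compute which nodes hold object_id using only the local cluster map."""
    ranked = sorted(cluster_map, key=lambda node: score(object_id, node), reverse=True)
    return ranked[:replicas]


# The only shared state is a small list of nodes, refreshed occasionally.
cluster_map = ["osd-0", "osd-1", "osd-2", "osd-3", "osd-4"]

# Two clients holding the same map reach the same answer without asking any server.
client_a = locate("photo-42", cluster_map)
client_b = locate("photo-42", list(reversed(cluster_map)))
print(client_a == client_b)

# After a node is added, an object moves only if the new node outscores its old home.
bigger_map = cluster_map + ["osd-5"]
moved = sum(
    locate(f"obj-{i}", cluster_map, 1) != locate(f"obj-{i}", bigger_map, 1)
    for i in range(1000)
)
print(moved, "of 1000 objects changed location")
```

The same function answers both questions from the previous section: a writer calls `locate` to decide where data goes, and a reader calls it later to find the data again. No per-object table exists anywhere, so there is no central node to fail or to queue behind.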