OpenStack has three storage-related components. How well known each one is matches how early it appeared, so in order of familiarity they are:
Swift -- provides object storage, conceptually similar to the Amazon S3 service. Swift is highly scalable, redundant, and durable, and it is also compatible with the S3 API.
Glance -- provides storage and management of virtual machine images, with many features similar to the Amazon AMI catalog. (From the beginning, Glance's backend data has been stored in Swift.)
Cinder -- provides block storage, similar to Amazon's EBS block storage service. At present it is used only for attaching volumes to virtual machines.
(From the start, OpenStack was designed with Amazon as its imagined rival and the target to beat, so basically every key functional module has a corresponding project. Beyond the three components above, OpenStack's counterpart to the important EC2 service in AWS is Nova, and there are several ways to keep it compatible with the EC2 API.)
Of the three components, Glance mainly manages virtual machine images, so it is relatively simple. Swift has matured as an object store, and even CloudStack supports it. Cinder is a relatively new block storage component; its design is sound and it gives commercial storage products a way to integrate, so vendors are fairly enthusiastic about it.
Swift
There are many articles on the web about Swift's architecture and deployment, as well as the official documentation, so I will not repeat them here. (You can also refer to the slides from my earlier talk at the Shanghai stop of the OpenStack China tour.) From a development perspective there have been no major structural changes recently, so I would rather talk about the application areas where Swift fits best.
From the real cases I know of, Swift is used in four areas. (There are surely more; if you know of other real use cases, I would be glad to hear about them.)
1. Network Disk
Swift's symmetric distributed architecture and its multi-proxy, multi-node design make it inherently suitable for applications with many concurrent users. The most typical example is a network disk application similar to Dropbox. Dropbox passed 100 million users last year, and at that scale of access, good architectural design is the fundamental reason a service can keep up.
Swift's symmetric architecture puts data nodes logically at the same level, with both data and its associated metadata on each node. The core data structure for the metadata is a hash ring: with consistent hashing, only a small part of the data in the ring space has to be relocated when nodes are added or removed, which gives good fault tolerance and scalability. In addition, the data is stateless, and each piece of data is stored in full on disk. Together these points ensure that the storage itself scales well.
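The claim that consistent hashing relocates only a small part of the data can be illustrated with a toy ring. The sketch below is not Swift's actual ring implementation; the class name HashRing, the node names, the use of SHA-256, and the number of virtual nodes are all assumptions made for illustration. It places four nodes on a ring, adds a fifth, and measures how many keys change owner.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an illustrative choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, not Swift's ring)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed on the ring at several pseudo-random positions.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key):
        # The owner is the first ring position clockwise from the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"account/container/object-{i}" for i in range(10000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-e")
after = {k: ring.get_node(k) for k in keys}

moved_keys = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved_keys) / len(keys)
print(f"{moved_fraction:.1%} of keys moved after adding a fifth node")
```

With a naive "hash modulo number of nodes" placement, most keys would change owner when a node is added. Here only the keys that now fall to the new node move, and everything else stays where it was.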
On the application side, Swift speaks HTTP, which keeps the interaction between applications and storage simple. The application does not have to consider the details of the underlying infrastructure, and the application software needs no modification for the system as a whole to scale to a very large size.
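To make the "Swift speaks HTTP" point concrete, here is a minimal sketch of what an object upload looks like from the application's side. It only builds the request and does not send it. The storage URL, the token, and the helper name build_upload_request are hypothetical; the general shape (an HTTP PUT to a path of account, container, and object, authenticated with an X-Auth-Token header) follows the Swift object storage API.

```python
from urllib.parse import quote
from urllib.request import Request


def build_upload_request(storage_url, token, container, obj, data):
    """Build (but do not send) an HTTP PUT that would upload one object."""
    # The object path is percent-encoded; "/" is kept so that
    # pseudo-directories in the object name survive.
    url = f"{storage_url}/{quote(container)}/{quote(obj)}"
    return Request(
        url,
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/octet-stream",
        },
    )


req = build_upload_request(
    "https://swift.example.com/v1/AUTH_demo",  # hypothetical storage URL
    "example-token",                           # hypothetical auth token
    "photos",
    "2013/holiday 01.jpg",
    b"...image bytes...",
)
print(req.get_method(), req.full_url)
```

Nothing in the request refers to disks, servers, or replicas, which is why the application does not need to change as the cluster behind the URL grows.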
2. IaaS Public Cloud
Swift's linear scalability, high concurrency, and multi-tenant support also make it a good fit for IaaS. Large public clouds often have many virtual machines starting at the same time, so for the backend storage of virtual machine images the real challenge is concurrent read performance on large objects (over a gigabyte). Swift was used from the start in OpenStack as the backend storage for the image library, and years of practice at Rackspace, with deployments of thousands of machines, have proven it a mature choice.
In addition, if you want to offer higher-level SaaS services on top of IaaS, multi-tenancy is unavoidable. Swift's architecture is itself designed to support multiple tenants, which makes integration easier.
3. Backup Archive
Rackspace's main business is data backup and archiving, so Swift is well proven in this area, and they have also extended it into a new business, "hot archive". Because of the long-tail effect, the time window in which data may be called up keeps growing. Hot archiving ensures that archived application data can be retrieved within minutes, a significant improvement over the hours needed in traditional tape archiving.
4. Mobile Internet and CDN
Mobile internet services and mobile games generate a great deal of user data. Each user's data is not very large, but the number of users is, and this is an area that Swift handles well.
Combined with a CDN, Swift-based cloud storage can respond to mobile devices directly. No dedicated server is needed to answer these HTTP requests, and data transfers do not have to go through the file system on the mobile device; data is uploaded to the cloud directly over HTTP. If frequently accessed data is cached on the platform, then with some optimization it can be served to your users from different locations, which improves access speed. I recently saw an application in the Swift developer community that combines a video site with Swift, and in my humble opinion this is a direction worth watching.
Glance
Glance is relatively simple: it is a virtual machine image store. It provides image services, including storage, querying, and retrieval, to Nova on the front end (or to other virtualization management platforms with glance-client installed). The module itself does not store large amounts of data; it needs backend storage (Swift, S3, ...) attached to hold the actual image data.
Glance mainly consists of the following parts:
API Service: glance-api mainly receives the various API requests from Nova and puts them into RabbitMQ for backend processing.
Glance-registry: interacts with the MySQL database to store or retrieve image metadata. Note that, unlike the metadata mentioned in the Swift section, this metadata is not saved on Swift's own storage servers. It is the information about images that is stored in the MySQL database, and it belongs to Glance.
Image Store: the backend storage interface through which images are fetched. The default backend storage is Swift, but other backends such as Amazon S3 are also supported.
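The split between image metadata (kept by the registry) and image data (kept by a pluggable backend store) can be sketched as follows. This is not Glance's code: the class names ImageStore, InMemoryStore, and ImageService, their methods, and the metadata fields are all hypothetical, and a plain dictionary stands in for the MySQL database.

```python
from abc import ABC, abstractmethod


class ImageStore(ABC):
    """Hypothetical backend interface: it holds only the image bytes."""

    @abstractmethod
    def put(self, image_id: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, image_id: str) -> bytes:
        ...


class InMemoryStore(ImageStore):
    """Stand-in for a real backend such as Swift or S3."""

    def __init__(self):
        self._blobs = {}

    def put(self, image_id, data):
        self._blobs[image_id] = data

    def get(self, image_id):
        return self._blobs[image_id]


class ImageService:
    """Keeps metadata in a registry (a dict here) and bytes in the store."""

    def __init__(self, store: ImageStore):
        self.store = store
        self.registry = {}

    def upload(self, image_id, name, disk_format, data):
        # Bytes go to the backend; only descriptive fields go to the registry.
        self.store.put(image_id, data)
        self.registry[image_id] = {
            "name": name,
            "disk_format": disk_format,
            "size": len(data),
        }

    def show(self, image_id):
        return self.registry[image_id]

    def download(self, image_id):
        return self.store.get(image_id)


service = ImageService(InMemoryStore())
service.upload("img-1", "ubuntu-12.04", "qcow2", b"\x00" * 1024)
print(service.show("img-1"))
```

Swapping the backend means passing a different ImageStore implementation to ImageService; the registry and the callers are unaffected.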
From some angles Glance looks a bit like virtual storage. It also exposes an API and offers fairly complete image management, so in theory other cloud platforms could use it as well.
Glance is relatively simple and is confined to the inside of the cloud, so there is not much more to discuss. Let us look instead at the new block storage component, Cinder. For now, my basic view of Cinder is that the overall design is good, but many details and functions still need improvement, and it is still some distance from being a mature product.
Cinder
One of the major changes in OpenStack's F release (Folsom) is that the persistent block storage functionality (nova-volume) was split out of Nova into a new, independent component, Cinder. It integrates a variety of storage back ends and exposes block storage services to the outside through an API. Its core is volume management: it handles volumes, volume types, and volume snapshots.
Cinder contains the following three main components:
API Service: cinder-api is the primary service interface. It accepts and processes external API requests and puts them into RabbitMQ queues for backend execution. Cinder currently provides volume API V2.
Scheduler Service: processes tasks from the task queue and selects the appropriate Volume Service node to perform them according to a predetermined policy. The current version of Cinder provides only a simple scheduler, which creates each volume on the active node that has the fewest volumes.
Volume Service: runs on the storage nodes and manages storage space. It handles the read and write requests that maintain state in the Cinder database, interacts with other processes through the message queue, and acts directly on the block storage devices or software. Each storage node runs a Volume Service, and several such storage nodes together form a pool of storage resources.
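The simple scheduling policy described above, choosing the active node with the fewest volumes, can be sketched in a few lines. This is an illustration of the policy only, not Cinder's scheduler code; the VolumeNode type, its fields, and the function pick_node are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class VolumeNode:
    host: str
    active: bool
    volume_count: int


def pick_node(nodes: Iterable[VolumeNode]) -> Optional[VolumeNode]:
    """Return the active node with the fewest volumes, or None if none is active."""
    candidates = [n for n in nodes if n.active]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.volume_count)


nodes = [
    VolumeNode("storage-1", active=True, volume_count=12),
    VolumeNode("storage-2", active=False, volume_count=0),  # down, so skipped
    VolumeNode("storage-3", active=True, volume_count=7),
]
chosen = pick_node(nodes)
print(chosen.host if chosen else "no active node")  # prints storage-3
```

The policy looks only at the number of volumes, not at their sizes or at the free capacity of each node, which is one reason the scheduler is described as simple.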
Cinder supports different types and models of storage through drivers supplied by the respective vendors. Commercial storage devices currently supported include several from EMC and IBM. It also supports local storage through LVM and NAS storage through the NFS protocol, so NetApp NAS should work as well, and Huawei appears to be working on support too. A while ago I also saw IBM's GPFS distributed file system in Cinder's blueprints, so it should be added in a later version.
So far Cinder interacts mainly with Nova inside OpenStack, providing the volumes that virtual machines need to attach, but in theory it could also provide block storage on its own.
For deployment, you can put all three services on a single server or deploy them independently on different physical nodes.
Cinder is still not mature enough, and several obvious problems have not been well resolved. First, support for commercial storage is insufficient, and FC SAN is not supported. Second, the risk of a single point of failure has not been addressed. Third, the internal scheduling algorithm is too simple. In addition, because Cinder integrates many kinds of storage and adds a management layer on top, efficiency is bound to suffer and some performance is bound to be lost, though there is no way around this.
After more than two years of development, OpenStack has grown larger and larger. Storage alone now comes in three kinds: object storage, image storage, and block storage. This serves a wider range of needs and reflects the flexibility and speed of an open source project. In general, the choice of a storage system should be treated as a long-term decision if multiple applications may use it in the future. As an open system, OpenStack's most important benefit is avoiding lock-in to hardware and software vendors. You can choose a new hardware vendor at any time and combine the new hardware with existing hardware into a hybrid cluster under unified management, and you can also change your software or technical services provider without moving your applications. This is the advantage of open source itself!