The greatest attraction of open source is that it satisfies our desire to explore: we can understand a system in depth, and if we find any part of its design or implementation unreasonable or wrong, we can propose our own ideas, implement them, and personally improve something that many people care about, to the benefit of countless users. Today we are going to talk about an open source object storage system: OpenStack Swift.

1. Overview of Swift

Swift is an object storage system that provides a RESTful HTTP interface. It originated from Rackspace's Cloud Files, a service built to compete with AWS S3. Open-sourced in 2010, Swift is one of OpenStack's first two projects. In the domestic OpenStack community, however, Swift is seldom heard of. As discussed in the first article of this series, "File System vs. Object Storage: Selection and Trends", object storage with a RESTful HTTP interface mainly serves Internet applications, and the traditional-industry users that OpenStack vendors care about most are only now becoming able to apply this storage model. In fact, Swift has a number of successful deployments at Chinese Internet companies, including Sina, Meituan, iQiyi, and Phoenix. It is even more widely used abroad: as early as 2010, Swift gained its first commercial customer outside Rackspace, Korea Telecom, and familiar names such as Wikipedia and eBay are also Swift users. As Internet-style application architectures gain acceptance in traditional industries, object storage and Swift should attract more and more attention. Judging from data on the OpenStack Kilo release, the Swift community shows diverse activity and is developing healthily.
This article and the next two in this series will introduce Swift's architecture, give an example of a medium-scale deployment, walk through building a Swift cluster starting from the hardware configuration, summarize Swift's characteristics, and discuss the challenges and trends Swift faces.

2. Swift's data organization

Swift divides the entire storage space into three tiers: account, container, and object. The account itself is only a storage area and does not represent an "account" in the authentication system, although there is usually one account per tenant. This is why, as OpenStack users, we can only see containers and objects when using Swift, not the account; if a user switches to another tenant, he will see the containers and objects belonging to that other tenant, i.e. under another account.

3. The architecture of a Swift cluster

Compared to other OpenStack projects, Swift is more independent: users can deploy it on its own, integrate it with other OpenStack projects, or even combine it with Cloud Foundry and Docker. In general, a Swift cluster consists of two types of nodes: proxy nodes and storage nodes. A simple Swift cluster looks like this:

On the proxy node, the proxy server process receives and responds to the user's HTTP requests. The proxy server is stateless, so it can be scaled out easily. Since Swift's built-in authentication service, TempAuth, is intended only for testing, an external authentication service is usually required, or an additional authentication service is installed on the proxy node. If we already have an OpenStack environment, we can use its Keystone directly; if there is no need to integrate with the rest of OpenStack, a standalone Keystone can also be installed on the proxy node to provide authentication.
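The three-tier namespace described above maps directly onto the structure of Swift's RESTful URIs. The following sketch builds those URIs; the host name and the `AUTH_demo` account are made-up example values, not real endpoints:

```python
def swift_uri(account, container=None, obj=None,
              base="http://swift.example.com:8080"):
    """Build the RESTful URI for each tier of Swift's namespace."""
    path = "/v1/" + account
    if container:
        path += "/" + container
    if obj:
        path += "/" + obj
    return base + path

# Account level: a GET here lists the account's containers.
print(swift_uri("AUTH_demo"))
# Container level: a GET here lists the container's objects.
print(swift_uri("AUTH_demo", "photos"))
# Object level: GET/PUT here reads or writes the object data itself.
print(swift_uri("AUTH_demo", "photos", "cat.jpg"))
```

The proxy server parses exactly this URI structure to decide whether a request targets an account, a container, or an object.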
The storage node mainly runs three classes of storage service processes: account server, container server, and object server, responsible for storing account, container, and object data respectively; in some literature the storage node is therefore called an ACO node.

The proxy server uses a data structure known as the ring to determine which storage nodes a piece of data is stored on. The ring is an improved consistent-hashing implementation that maps each piece of data to multiple devices; how many devices depends on the number of replicas set when the ring was created. Since the release of Swift 2.0, users can use the storage policy feature to specify a different number of replicas for each container, or to use erasure coding; for details, refer to "OpenStack Swift Storage Policy" (http://www.ibm.com/developerworks/cn/cloud/library/1411_limy_openstackswift/).

In Swift, the information for each account and each container is stored in its own SQLite database. That is, although the storage space is logically divided into the three tiers of account, container, and object, in fact every account, every container, and every object corresponds to a file on the storage nodes. Swift's entire storage space is thus a flat namespace and can be seen as a key/value store. It is then not hard to understand why Swift has an all-peer architecture: there is no metadata server with management functions, accounts and containers are stored in essentially the same way as objects, and every storage service in the cluster has equal status. The proxy server determines whether a user is requesting an account, a container, or an object from the structure of the URI in the REST request.
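The core idea behind the ring can be illustrated with a simplified consistent-hash sketch. This is only a toy model of the concept: Swift's real ring additionally uses fixed-size partitions, zones, and device weights, and its placement logic is more involved than this:

```python
import hashlib
from bisect import bisect

class ToyRing:
    """Simplified consistent-hash ring mapping a name to N distinct devices."""

    def __init__(self, devices, replicas=3, vnodes=128):
        self.replicas = replicas
        # Each device gets many virtual points on the hash circle
        # so that data spreads evenly across devices.
        self.points = sorted(
            (self._hash("%s-%d" % (dev, v)), dev)
            for dev in devices for v in range(vnodes))
        self.keys = [h for h, _ in self.points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_nodes(self, name):
        """Walk clockwise from the name's hash, collecting distinct devices."""
        i = bisect(self.keys, self._hash(name)) % len(self.points)
        nodes = []
        while len(nodes) < self.replicas:
            dev = self.points[i % len(self.points)][1]
            if dev not in nodes:
                nodes.append(dev)
            i += 1
        return nodes

ring = ToyRing(["dev%d" % n for n in range(6)])
print(ring.get_nodes("AUTH_demo/photos/cat.jpg"))  # three distinct devices
```

The key property is that the mapping is deterministic, so any proxy server holding the same ring computes the same device list without consulting a central metadata server.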
In some literature, and in some people's conversation, the account and container information in Swift is called "metadata". I do not agree with this: I have not seen the official Swift documentation call these two types of information "metadata", and they are fundamentally different from the metadata of a file system. In Swift, metadata refers to an object's attributes, i.e. the description of the object, and Swift uses features of underlying mechanisms such as XFS to store an object's attributes/metadata together with the object's data.

If we want to extend Swift's functionality, for example to run virus detection or scans for pornographic images and other illegal content when a user uploads an object, we can do so by adding middleware on the proxy server. Middleware here is a concept from Python's WSGI framework: each HTTP request passes through the layers of middleware before being delivered to the core proxy server, and the response passes back out through the same layers before being returned to the user. In fact, Swift's invocation of Keystone is implemented through middleware.

4. Example of a medium-sized Swift deployment

The architecture of a Swift cluster allows us to easily extend the number of proxy nodes and storage nodes. An example of a medium-sized Swift deployment:

In this example, the proxy nodes use Lenovo System x3650 servers, and the storage nodes use Hyper-Cloud R6440-G9 servers (for a detailed case study and technical analysis of this server, see the article "High-Density Storage Servers (1): The Unscrupulous Hybrid"). Six fully equipped 4U cages hold 24 storage nodes, each node with 12 Seagate 3.5-inch 4 TB drives, providing about 1.2 PB of physical storage space; with a three-replica strategy, the usable storage capacity totals 384 TB.
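The middleware mechanism can be sketched as a plain WSGI wrapper. The class below is an illustrative stand-in for an upload scanner, not Swift's actual middleware API (real Swift middleware is registered through PasteDeploy filter factories); the "virus check" is a trivial byte match:

```python
import io

class ContentScanner:
    """Toy WSGI middleware: inspects PUT bodies before the next layer sees them."""

    def __init__(self, app):
        self.app = app  # the next middleware layer, ultimately the proxy server

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") == "PUT":
            body = environ["wsgi.input"].read()
            if b"EICAR" in body:  # stand-in for a real virus/content scan
                start_response("403 Forbidden",
                               [("Content-Type", "text/plain")])
                return [b"upload rejected\n"]
            # Replace the consumed stream so the next layer can re-read it.
            environ["wsgi.input"] = io.BytesIO(body)
        return self.app(environ, start_response)
```

Requests flow inward through each such wrapper to the proxy server, and responses flow back out through them in reverse order, which is exactly how Swift hooks Keystone authentication into the pipeline.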
Swift divides the storage nodes into zones and saves each replica to a different zone; in general, the number of zones should be greater than or equal to the number of replicas. Zones should be divided along physical fault boundaries: for example, we can divide zones by rack, or, as in the figure, put the four servers of one four-node cage into the same zone and put nodes in different cages into different zones. If the number of nodes is small, each server can be made its own zone. In experimental, development, and test environments, zones can even be divided by disk, for example by dividing the 12 disks in one server into different zones.

Next, we will start from the hardware configuration and see how to build a Swift cluster step by step.
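The numbers in the example deployment above are easy to verify. The short calculation below uses the figures from the text (24 storage nodes, 12 drives of 4 TB each, 3 replicas, one zone per cage) and checks the zones-versus-replicas rule of thumb:

```python
# Figures from the example deployment described above.
nodes, drives_per_node, drive_tb = 24, 12, 4
replicas, zones = 3, 6  # one zone per 4U cage

physical_tb = nodes * drives_per_node * drive_tb  # raw capacity
usable_tb = physical_tb // replicas               # after 3x replication

# Rule of thumb from the text: at least as many zones as replicas.
assert zones >= replicas

print(physical_tb, "TB physical (about 1.2 PB)")
print(usable_tb, "TB usable")
```

This reproduces the article's figures: roughly 1.2 PB of physical space yielding 384 TB of usable capacity under three-way replication.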