Swift Object Storage

Source: Internet
Author: User
Tags file copy md5 hash sqlite database ticket openstack swift

Introduction to Swift Object storage

OpenStack Object Storage (Swift) is one of the sub-projects of the OpenStack open source cloud computing project. It is an object store that provides strong scalability, redundancy, and durability, and it is intended for long-term storage of permanent, static data.
Swift was originally a highly available distributed object storage service developed by Rackspace. In 2010 it was contributed to the OpenStack open source community as one of its initial core sub-projects, providing virtual machine image storage for the Nova sub-project. Swift is built on inexpensive standard hardware and does not require RAID (redundant array of independent disks). It achieves high availability and scalability by introducing consistent hashing and data redundancy at the software level, sacrificing a degree of data consistency. It supports multi-tenancy and read and write operations on containers and objects, and is well suited to storing unstructured data in Internet application scenarios.

Fundamentals

1. Consistent hashing

Faced with a massive number of objects that must be stored on thousands of servers and disk devices, the first problem to solve is addressing: how to distribute objects across these devices. Swift is based on consistent hashing. Hashing distributes objects evenly over virtual nodes in a virtual space, and this greatly reduces the amount of data that has to be moved when nodes are added or removed. The size of the virtual space is usually a power of 2 so that efficient shift operations can be used. A dedicated data structure, the ring, then maps the virtual nodes to the actual physical storage devices, completing the addressing process.

The hash space increases in the counter-clockwise direction. It is 4 bytes long (32 bits), so the integer range is [0, 2^32 - 1]. Shifting the hash result right by m bits produces 2^(32-m) virtual nodes; for example, m = 29 produces 8 virtual nodes. An actual deployment requires careful calculation of an appropriate number of virtual nodes to balance storage space against workload.
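The shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key format and the helper name virtual_node are made up for the example, and MD5 is used because the ring description in this article names it. With m = 29, every key falls into one of 8 virtual nodes.

```python
import hashlib
import struct
from collections import Counter

HASH_BITS = 32  # the hash space is 4 bytes wide


def virtual_node(key: str, m: int) -> int:
    """Map a key to a virtual node by right-shifting a 32-bit hash by m bits."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 4 bytes of the digest as an unsigned big-endian integer.
    value = struct.unpack_from(">I", digest)[0]
    return value >> m


M = 29
NUM_VIRTUAL_NODES = 2 ** (HASH_BITS - M)  # 2^(32-29) = 8

# Hypothetical keys, only used to see how they spread over the virtual nodes.
counts = Counter(virtual_node(f"object-{i}", M) for i in range(8000))
for node in sorted(counts):
    print(node, counts[node])
```

Printing the counts shows how the hypothetical keys spread over the virtual nodes; a real deployment would use far more virtual nodes than 8.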

2. Data consistency model

According to Eric Brewer's CAP (consistency, availability, partition tolerance) theorem, a system cannot satisfy all three properties at once. Swift gives up strict consistency (of the kind needed for ACID transactions) and adopts an eventual consistency model in order to achieve high availability and unlimited horizontal scalability. To this end, Swift uses a quorum protocol (a quorum is the minimum number of members needed for a valid vote):

(1) Definitions: N is the total number of copies of the data; W is the number of copies that must acknowledge a write operation; R is the number of copies read by a read operation.
(2) Strong consistency: R + W > N. The sets of copies touched by reads and writes are guaranteed to intersect, so the latest version can always be read. If W = N and R = 1, every copy must be updated on each write, which suits strong consistency for workloads with many reads and few writes. If R = N and W = 1, only one copy is updated and the latest version is obtained by reading all copies, which suits strong consistency for workloads with many writes and few reads.
(3) Weak consistency: R + W <= N. The sets of copies touched by reads and writes may not intersect, so stale data may be read. This suits scenarios with low consistency requirements.

Swift targets scenarios where both reads and writes are frequent, so it uses a compromise strategy. A write must succeed on more than half of the copies, that is, W > N/2, and the copy sets of read and write operations must then share at least one copy, that is, R + W > N. The default Swift configuration is N = 3, W = 2 > N/2, and R = 1 or 2. Each object has 3 copies, which are stored on nodes in different zones as far as possible, and W = 2 means that at least 2 copies must be updated for a write to succeed. With R = 1, a read returns as soon as one copy has been read successfully, so an old version may be returned (weak consistency model). With R = 2, the request header X-Newest: true is added to the read request, the metadata of 2 copies is read at the same time, and the timestamps are compared to determine which version is the latest (strong consistency model). If the data is inconsistent, background service processes complete data synchronization within a certain time window through detection and the replication protocol, ensuring eventual consistency.
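To make the difference between R = 1 and R = 2 concrete, here is a minimal in-memory sketch. It is not Swift code: the Replica class and the write and read helpers are hypothetical, and the replicas are plain Python objects rather than storage nodes. It simulates N = 3 copies, a write that reaches only W = 2 of them, and reads that compare timestamps.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    timestamp: float = 0.0
    value: str = ""


def write(replicas, value, timestamp, reachable):
    """Update only the reachable replicas and return how many were updated."""
    for i in reachable:
        replicas[i] = Replica(timestamp, value)
    return len(reachable)


def read(replicas, indexes):
    """Return the value with the newest timestamp among the replicas read."""
    newest = max((replicas[i] for i in indexes), key=lambda r: r.timestamp)
    return newest.value


N, W = 3, 2
replicas = [Replica(1.0, "v1") for _ in range(N)]

# The write reaches replicas 0 and 1; replica 2 lags behind for now.
updated = write(replicas, "v2", 2.0, reachable=[0, 1])
write_ok = updated >= W

stale = read(replicas, [2])     # R = 1 and the read happens to hit the lagging copy
fresh = read(replicas, [1, 2])  # R = 2: any two copies include an updated one

print(write_ok, stale, fresh)
```

With R = 2 and W = 2 out of N = 3, every possible pair of replicas contains at least one updated copy, which is why comparing timestamps is enough to find the latest version.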

3. Data structure of the ring

The ring maps virtual nodes (partitions) to a set of physical storage devices and provides a certain degree of redundancy. Its data structure consists of the following information:
Device list: the information for each device includes a unique identifier (id), zone number (zone), weight (weight), IP address (ip), port (port), device name (device), and metadata (meta).
Partition-to-device mapping (the replica2part2dev_id array).
Partition shift (the part_shift integer, which is the m in Figure 1).
For example, an object is located as follows:

The object's hierarchy account/container/object is used as the key, and the MD5 hash algorithm produces a hash value. The first 4 bytes of the hash value are shifted right to obtain the partition index; the number of bits to shift is given by part_shift above. The partition index is then looked up in the partition-to-device mapping table (replica2part2dev_id) to find the numbers of all devices that hold that partition. These devices are deployed in different zones as far as possible. A zone is only an abstract concept: it can be a machine, a rack, or even a cluster in a building. To provide the highest level of redundancy, at least 5 zones are recommended. The weight parameter is a relative value that can be adjusted according to disk size; the larger the weight, the more space is allocated and the more partitions the device holds.
Swift defines separate rings for accounts, containers, and objects; the lookup process for accounts and containers is the same.
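The lookup steps above can be sketched as follows. This is a simplified illustration, not Swift's actual ring code: the device list, the mapping table contents, and the names get_partition and get_nodes are assumptions made for the example, and the key is hashed without any cluster-specific additions a real deployment may apply.

```python
import hashlib
import struct

PART_POWER = 3                # assumed: 2**3 = 8 partitions
PART_SHIFT = 32 - PART_POWER  # the m from the text, 29 here

# Hypothetical device list: five devices, one per zone.
DEVS = [
    {"id": i, "zone": i + 1, "weight": 100.0,
     "ip": f"10.0.0.{i + 1}", "port": 6000, "device": "sdb1", "meta": ""}
    for i in range(5)
]

# replica2part2dev_id[replica][partition] -> device id (hypothetical contents).
REPLICA2PART2DEV_ID = [
    [0, 1, 2, 3, 4, 0, 1, 2],
    [1, 2, 3, 4, 0, 1, 2, 3],
    [2, 3, 4, 0, 1, 2, 3, 4],
]


def get_partition(account: str, container: str, obj: str) -> int:
    """Hash account/container/object and shift the first 4 bytes right."""
    key = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return struct.unpack_from(">I", digest)[0] >> PART_SHIFT


def get_nodes(account: str, container: str, obj: str):
    """Return the partition index and the devices holding its replicas."""
    part = get_partition(account, container, obj)
    return part, [DEVS[row[part]] for row in REPLICA2PART2DEV_ID]


part, nodes = get_nodes("AUTH_demo", "photos", "cat.jpg")
print(part, [(n["id"], n["zone"]) for n in nodes])
```

In this toy table each partition's three replicas land on devices in three different zones, mirroring the placement goal described above.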

4. Data Model

Swift uses a hierarchical data model with three logical layers: account/container/object. There is no limit on the number of nodes at each layer, so each layer can be scaled arbitrarily. An account here is not the same concept as a personal user account; it can be understood as a tenant, it serves as the top-level isolation mechanism, and it can be shared by multiple personal accounts. A container represents a group of objects, similar to a folder or directory. A leaf node represents an object, which consists of metadata and content, as shown in Figure 4.
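As a small illustration of the three layers, the sketch below splits a request path of the form /v1/account/container/object into its parts. The SwiftPath type, the parse_path helper, and the account name AUTH_demo are hypothetical; only the /v1/<account> prefix comes from the API example in this article.

```python
from typing import NamedTuple, Optional


class SwiftPath(NamedTuple):
    account: str
    container: Optional[str]
    obj: Optional[str]


def parse_path(path: str) -> SwiftPath:
    """Split /v1/account[/container[/object]] into its three layers."""
    # At most 3 splits, so an object name may itself contain slashes.
    parts = path.lstrip("/").split("/", 3)
    if len(parts) < 2 or parts[0] != "v1" or not parts[1]:
        raise ValueError(f"not an account path: {path!r}")
    account = parts[1]
    container = parts[2] if len(parts) > 2 and parts[2] else None
    obj = parts[3] if len(parts) > 3 and parts[3] else None
    return SwiftPath(account, container, obj)


print(parse_path("/v1/AUTH_demo"))
print(parse_path("/v1/AUTH_demo/photos"))
print(parse_path("/v1/AUTH_demo/photos/2013/cat.jpg"))
```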

5. System Architecture

Swift uses a fully symmetric, resource-oriented, distributed system architecture. All components can be scaled out, which prevents a single point of failure from spreading and affecting the operation of the whole system. Communication uses non-blocking I/O, which improves system throughput and responsiveness.

Proxy Server: Exposes the object service API to the outside. It locates the service address from the ring information and forwards user requests to the corresponding account, container, or object service. Because it uses the stateless REST request protocol, it can be scaled horizontally to balance load.
Authentication Server: Verifies the identity of the accessing user and issues an access token that remains valid for a certain period of time; it also validates access tokens and caches them until they expire.
Cache Server: Caches object service tokens and account and container existence information, but not the object data itself. The cache service can use a Memcached cluster, and Swift uses a consistent hashing algorithm to assign cache addresses.
Account Server: Provides account metadata and statistics and maintains the list of containers in each account. The information for each account is stored in a SQLite database.
Container Server: Provides container metadata and statistics and maintains the list of objects in each container. The information for each container is also stored in a SQLite database.
Object Server: Provides object metadata and content. The content of each object is stored as a file in the file system, and the metadata is stored as file attributes; a file system that supports extended attributes, such as XFS, is recommended.
Replicator: Checks whether the local partition replica is consistent with the remote replicas by comparing hash files and high-water marks, and updates the remote replicas by pushing when an inconsistency is found. For example, the object replication service uses the remote file copy tool rsync to synchronize. Its other task is to ensure that objects marked for deletion are removed from the file system.
Updater: When an object cannot be updated immediately because of high load, the task is serialized and queued in the local file system so that it can be applied asynchronously once service is restored. For example, if the container server cannot update its object list in time after an object is created successfully, the container update is queued, and the updater scans the queue and applies the update after the system returns to normal.
Auditor: Checks the integrity of objects, containers, and accounts. If a bit-level error is found, the file is quarantined and another replica is copied over the locally damaged one; other types of errors are recorded in the log.
Account Reaper: Removes accounts that are marked for deletion, along with all the containers and objects they contain.

Features

1. Extremely high data durability

Data durability is different from system availability. It refers to the reliability of data: the likelihood that data stored in the system will one day be lost. For example, Amazon S3's data durability is eleven nines (99.999999999%), which means that if you store 10,000 files (1 followed by 4 zeros) in S3, you may lose 1 file after 10 million years (1 followed by 7 zeros).
We have calculated theoretically that Swift, in an environment with 5 zones and 5x10 storage nodes and 3 copies of the data, can reach a data durability SLA of ten nines.
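The arithmetic behind the "nines" can be checked with a short calculation. It assumes, as a simplification, that a durability of k nines means each object is lost with probability 10^-k per year and that losses are independent, so the expected number of lost objects is simply probability x objects x years. The function name expected_losses is made up for the example.

```python
from fractions import Fraction


def expected_losses(nines: int, objects: int, years: int) -> Fraction:
    """Expected number of lost objects, assuming an annual per-object loss
    probability of 10**-nines and independent losses."""
    annual_loss = Fraction(1, 10 ** nines)
    return annual_loss * objects * years


# Eleven nines, 10,000 files, 10 million years.
eleven_nines = expected_losses(11, 10_000, 10_000_000)

# Ten nines with the same number of files and years.
ten_nines = expected_losses(10, 10_000, 10_000_000)

print(eleven_nines, ten_nines)
```

Under these assumptions, eleven nines gives an expected loss of 1 file, matching the figure quoted above, and ten nines gives an expected loss of 10 files over the same period.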

2. Fully symmetric system architecture

"Symmetric" means that every node in Swift is fully equivalent, which greatly reduces system maintenance costs.

Unlimited scalability

(1) Data storage capacity can be expanded without limit. (2) Swift performance (QPS, throughput, and so on) improves linearly as nodes are added.
Because Swift is a fully symmetric architecture, expanding it only requires adding new machines; the system automatically completes the data migration and brings the storage nodes back to a balanced state.

3. No single point of failure

Regarding metadata: Swift's metadata storage is distributed completely evenly and randomly and, like object file storage, metadata is stored in multiple copies.

4. Simple, reliable

Simple design

Application Scenarios

The most typical application is as the storage engine for network disk (cloud drive) applications, such as Amazon S3 behind Dropbox. Within OpenStack, Swift can also be combined with the image service Glance to store image files. In addition, Swift's unlimited scalability makes it well suited to storing log files and serving as a data backup repository.

Architecture Overview

Swift has three main components: the proxy server, the storage servers, and the consistency servers. The architecture is shown in Schema 1; both the storage and consistency services run on the storage nodes. The Auth authentication service has been split out of Swift, which now uses the OpenStack authentication service Keystone in order to unify authentication management across OpenStack projects.

API interface

Swift provides an HTTP-based REST service interface through the proxy server for CRUD operations on accounts, containers, and objects. Before accessing the Swift service, you need to obtain an access token from the authentication service (Keystone) and then add the X-Auth-Token header to each request. The following example request lists the containers in an account:

    GET /v1/<account> HTTP/1.1
    Host: storage.swift.com
    X-Auth-Token: eaaafd18-0fed-4b3a-Bayib4-663C99EC1CBB

The response headers contain the status code 200, and the list of containers is in the response body:

    HTTP/1.1 200 OK
    Date: Thu, ... Jan ... GMT
    Server: Apache
    Content-Type: text/plain; charset=UTF-8
    Content-Length: ...

    images
    movies
    documents
    backups
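A request like the one above could be built with Python's standard library roughly as follows. This is a hedged sketch: the function names, the http scheme, the account name AUTH_demo, and the token value are placeholders chosen for illustration, and the request is only constructed here, not sent. Sending it against a real cluster would use urllib.request.urlopen with a valid token.

```python
import urllib.request


def build_list_containers_request(storage_url: str, account: str, token: str):
    """Build (but do not send) a GET request that lists an account's containers."""
    url = f"{storage_url}/v1/{account}"
    return urllib.request.Request(
        url, headers={"X-Auth-Token": token}, method="GET"
    )


def parse_container_list(body: bytes):
    """The plain-text response body holds one container name per line."""
    return [line for line in body.decode("utf-8").splitlines() if line]


# Placeholder values; a real token comes from the authentication service.
req = build_list_containers_request(
    "http://storage.swift.com", "AUTH_demo", "placeholder-token"
)
print(req.get_method(), req.full_url)

# Parsing a body shaped like the example response above.
containers = parse_container_list(b"images\nmovies\ndocuments\nbackups\n")
print(containers)
```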

Conclusion

OpenStack Swift, as a stable and highly available open source object store, has been put into commercial use by many companies, such as Sina's App Engine, which has launched a Swift-based object storage service, and Korea Telecom's Ucloud Storage service. There is reason to believe that, because of its complete openness, broad user base, and community of contributors, Swift may become an open standard for cloud storage, breaking Amazon S3's monopoly on the market and pushing cloud computing in a more open and interoperable direction.

Everything is the beginning.

After reading about Swift, we found that the cloud computing field is much bigger than we thought. All learning is just the beginning!

Redundancy control algorithm based on quorum voting

1. Introduction

The quorum mechanism is a voting algorithm commonly used in distributed systems to ensure data redundancy and eventual consistency. Its main mathematical idea comes from the pigeonhole principle.

In a distributed storage system with redundant data, multiple copies of each data object are stored on different machines. At any one time, however, the copies of a data object can be used either for reading or for writing, but not both.

The algorithm guarantees that multiple copies of the same data object are not read and written by two or more accessors at the same time.

The algorithm is derived from [Gifford, 1979]. Each copy of the data in the distributed system is assigned one vote. Each read or write operation must obtain a minimum number of read votes (Vr) or a minimum number of write votes (Vw), respectively. If the system has V votes in total (meaning a data object has V redundant copies), then the minimum read and write votes must satisfy:

Vr + Vw > V
Vw > V/2
The first rule guarantees that a data item is not read and written at the same time. When a write request arrives, it must obtain permission from Vw redundant copies. The remaining V - Vw copies are fewer than Vr, so no read request can be admitted. Similarly, once a read request has obtained permission from Vr redundant copies, a write request cannot obtain permission.

The second rule guarantees that modifications to the data are serialized: a redundant copy of a data item cannot be modified by two write requests at the same time.
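Both rules rest on the pigeonhole principle: if two sets of copies together hold more than V votes, they must share at least one copy. The brute-force check below illustrates this for small V by enumerating every possible pair of copy sets. The function name quorums_always_intersect is hypothetical, and the check is a demonstration, not part of any quorum implementation.

```python
from itertools import combinations


def quorums_always_intersect(v: int, size_a: int, size_b: int) -> bool:
    """True if every set of size_a copies shares a copy with every set of
    size_b copies, out of v copies in total."""
    copies = range(v)
    return all(
        set(a) & set(b)
        for a in combinations(copies, size_a)
        for b in combinations(copies, size_b)
    )


V = 5

# Rule 1 (Vr + Vw > V): every read set overlaps every write set.
print(quorums_always_intersect(V, 3, 3))  # Vr = 3, Vw = 3

# Rule 1 violated (Vr + Vw = V): a read can miss the latest write.
print(quorums_always_intersect(V, 2, 3))  # Vr = 2, Vw = 3

# Rule 2 (Vw > V/2): two write sets always overlap, so writes serialize.
print(quorums_always_intersect(V, 3, 3))  # Vw = 3 against Vw = 3

# Rule 2 violated (Vw <= V/2): two writes can proceed on disjoint copies.
print(quorums_always_intersect(V, 2, 2))  # Vw = 2 against Vw = 2
```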

2. Application

In a distributed system, redundant data is the means of guaranteeing reliability, so maintaining the consistency of redundant data is very important. In general, a write operation must update all redundant copies before it can be considered successful. For example, if a data item has redundant copies on 5 devices, then because it is not known which device a read will land on, a write operation may return only after all 5 devices have been updated.

For systems with frequent writes, this creates a severe bottleneck. With the quorum algorithm, a write operation can return as soon as 3 devices have been written; the rest are synchronized slowly within the system. A read operation then needs to read at least 3 devices to guarantee that at least one of them holds the latest data.

The minimum read and write votes of a quorum can be used as tunable parameters for the system's read and write performance. The larger the write quorum Vw, the smaller the read quorum Vr can be, so reads become cheaper and writes more expensive. Conversely, a smaller Vw makes writes cheaper at the cost of more expensive reads.
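The trade-off can be tabulated by enumerating every (Vr, Vw) pair that satisfies the two rules Vr + Vw > V and Vw > V/2. The sketch below does this for a given V and reports the smallest allowed read quorum for each write quorum; the function names are made up for the example.

```python
def valid_quorums(v: int):
    """All (vr, vw) pairs that satisfy vr + vw > v and vw > v/2."""
    return [
        (vr, vw)
        for vw in range(1, v + 1)
        for vr in range(1, v + 1)
        if vr + vw > v and 2 * vw > v
    ]


def cheapest_read_for_each_write(v: int):
    """For each allowed write quorum, the smallest read quorum that still works."""
    best = {}
    for vr, vw in valid_quorums(v):
        best[vw] = min(vr, best.get(vw, vr))
    return best


# Five copies, as in the example above: writing 3 requires reading 3,
# writing 4 allows reading 2, and writing all 5 allows reading just 1.
print(cheapest_read_for_each_write(5))

# Three copies, the Swift default mentioned earlier in this article.
print(cheapest_read_for_each_write(3))
```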

The article refers to the following two articles:
http://www.ibm.com/developerworks/cn/cloud/library/1310_zhanghua_openstackswift/

http://www.cnblogs.com/netfocus/p/3622184.html
