OpenStack Swift's Data Consistency and Consistent Hashing Principles (Reprint)




Recently, while job hunting, I was asked about some of Swift's underlying principles, so I decided to take notes to make them stick.



The following is reproduced from http://www.openstack.cn/?p=776






-- An Analysis of OpenStack Swift Open Source Cloud Storage Technology



The OpenStack Swift open source project provides an elastic, scalable, highly available distributed object storage service for storing large-scale unstructured data. This article gives an in-depth introduction to Swift's basic design principles, symmetric system architecture, and RESTful API.


Background and overview


Swift was originally a highly available distributed object storage service developed by Rackspace. In 2010 it was contributed to the OpenStack open source community as one of its initial core subprojects, providing virtual machine image storage for the Nova subproject. Swift is built on inexpensive standard hardware and does not require RAID (redundant array of independent disks). It achieves high availability and scalability by introducing consistent hashing and data redundancy at the software level, at the cost of some data consistency. It supports multi-tenancy and read and write operations on containers and objects, and it is well suited to storing unstructured data in Internet applications.



The project is written in Python and is available to developers under the Apache 2.0 license.


Basic principles


Consistent hashing


With a massive number of objects to store across thousands of servers and disk devices, the first problem to solve is addressing: how to map each object to a device. Swift uses consistent hashing. A hash computation distributes objects evenly over the virtual nodes of a virtual space, so that when nodes are added or removed the amount of data that has to move is greatly reduced. The size of the virtual space is usually a power of 2, which makes shift operations efficient. A dedicated data structure, the ring, then maps virtual nodes to physical storage devices, completing the addressing process.
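To make the claim about reduced data movement concrete, here is a small Python sketch that compares plain modulo placement with placement through fixed virtual nodes when a fifth device joins four existing ones. The function names and the toy rebalancing rule are assumptions made for this illustration; they are not Swift's ring builder.

```python
import hashlib


def hash32(key: str) -> int:
    """First 4 bytes of the MD5 digest as an unsigned 32-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def moved_with_modulo(keys, devices_before, devices_after):
    """Count keys whose device changes under plain hash % device_count."""
    return sum(
        1 for k in keys
        if hash32(k) % devices_before != hash32(k) % devices_after
    )


def moved_with_partitions(keys, part_power, devices_before, devices_after):
    """Count keys whose device changes when keys map to fixed virtual nodes.

    Toy rebalancing rule (an assumption, valid for adding exactly one device):
    the new device takes over every partition p with
    p % devices_after == devices_after - 1; all other partitions stay put.
    """
    shift = 32 - part_power
    before = {p: p % devices_before for p in range(2 ** part_power)}
    after = dict(before)
    new_device = devices_after - 1
    for p in after:  # only values change, so iterating the dict is safe
        if p % devices_after == new_device:
            after[p] = new_device
    return sum(
        1 for k in keys
        if before[hash32(k) >> shift] != after[hash32(k) >> shift]
    )


keys = [f"object-{i}" for i in range(10_000)]
modulo_moves = moved_with_modulo(keys, 4, 5)
partition_moves = moved_with_partitions(keys, 10, 4, 5)
print(f"modulo: {modulo_moves} moved, virtual nodes: {partition_moves} moved")
```

With modulo placement most keys change device, while with virtual nodes only the keys in the reassigned partitions (roughly one fifth here) have to move.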


Figure 1. Consistent hashing








As shown in Figure 1, the hash space, which increases in the counter-clockwise direction, is 4 bytes (32 bits) long, giving an integer range of [0, 2^32 - 1]. Shifting the hash result right by m bits yields 2^(32-m) virtual nodes; for example, m = 29 yields 8 virtual nodes. In a real deployment the number of virtual nodes has to be chosen carefully to balance storage space against workload.
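The arithmetic can be checked with a few lines of Python. This is only a worked example of the shift described above; the function names are made up for the illustration.

```python
HASH_BITS = 32  # the hash space is 4 bytes long


def virtual_node_count(m: int) -> int:
    """Number of virtual nodes left after shifting right by m bits."""
    return 2 ** (HASH_BITS - m)


def virtual_node(hash_value: int, m: int) -> int:
    """Virtual node index for a 32-bit hash value."""
    if not 0 <= hash_value < 2 ** HASH_BITS:
        raise ValueError("hash_value must fit in 32 bits")
    return hash_value >> m


# With m = 29 only the top 3 bits of the hash survive the shift.
print(virtual_node_count(29))            # 8
print(virtual_node(0, 29))               # 0
print(virtual_node(2 ** 29, 29))         # 1
print(virtual_node(2 ** 32 - 1, 29))     # 7
```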


Data consistency models (consistency model)


According to Eric Brewer's CAP (consistency, availability, partition tolerance) theorem, the three properties cannot all be satisfied at once. Swift gives up strict consistency (the level needed for ACID transactions) and adopts an eventual consistency model in order to achieve high availability and virtually unlimited horizontal scalability. To that end, Swift uses a quorum protocol (quorum meaning the minimum number of voters required):



(1) Definitions: N is the total number of replicas of the data; W is the number of replicas that must acknowledge a write; R is the number of replicas consulted by a read.



(2) Strong consistency: R + W > N guarantees that the replica sets touched by reads and writes intersect, so the latest version can always be read. If W = N and R = 1, every replica must be updated on each write, which suits workloads with many reads and few writes. If R = N and W = 1, only one replica is updated and the latest version is obtained by reading all replicas, which suits workloads with many writes and few reads.



(3) Weak consistency: R + W <= N. The replica sets of reads and writes may not intersect, so stale data may be read; this suits scenarios with low consistency requirements.
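The two rules reduce to a single comparison, as this minimal Python sketch shows. The function name and the sample values are illustrative only.

```python
def consistency_level(n: int, w: int, r: int) -> str:
    """Classify an (N, W, R) choice using the rule R + W > N."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must both lie between 1 and N")
    return "strong" if r + w > n else "weak"


# Read-heavy and write-heavy extremes, then two mixed settings for N = 3.
for n, w, r in [(3, 3, 1), (3, 1, 3), (3, 2, 2), (3, 2, 1), (3, 1, 1)]:
    print(f"N={n} W={w} R={r}: {consistency_level(n, w, r)}")
```

Note that N = 3, W = 2, R = 1 lands on the weak side because R + W equals N exactly, so a read may hit the one replica the write skipped.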



Swift targets scenarios where both reads and writes are frequent, so it takes a compromise strategy: a write must succeed on more than half of the replicas, W > N/2, and the replica sets of reads and writes must share at least one replica, R + W > N. Swift's default configuration is N = 3, W = 2 > N/2, and R = 1 or 2. Each object therefore has 3 replicas, stored as far as possible on nodes in different zones, and W = 2 means at least 2 replicas must be updated for a write to succeed. With R = 1, a read returns as soon as one replica has been read successfully, in which case an old version may be returned (weak consistency model). With R = 2, the read request carries the X-Newest: true header, the metadata of 2 replicas is read, and their timestamps are compared to determine the latest version (strong consistency model). If replicas are inconsistent, background service processes detect this and use the replication protocol to synchronize the data within a certain time window, ensuring eventual consistency, as shown in Figure 2:


Figure 2. Quorum Protocol Example
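A rough Python sketch of the read side follows, assuming three replicas of which one missed the latest write. The `ReplicaResponse` class, the node labels, and the timestamps are hypothetical; this models only the timestamp comparison, not how Swift's proxy server is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaResponse:
    node: str          # hypothetical node label
    timestamp: float   # time of the last write this replica has seen
    body: bytes


def write_quorum(n: int) -> int:
    """Smallest W that satisfies W > N/2."""
    return n // 2 + 1


def read_object(responses, r: int) -> ReplicaResponse:
    """Consult the first r responses and return the newest one by timestamp."""
    if not 1 <= r <= len(responses):
        raise ValueError("r must be between 1 and the number of responses")
    return max(responses[:r], key=lambda resp: resp.timestamp)


# N = 3, W = 2: the latest write reached node-b and node-c but not node-a.
responses = [
    ReplicaResponse("node-a", 100.0, b"v1"),
    ReplicaResponse("node-b", 200.0, b"v2"),
    ReplicaResponse("node-c", 200.0, b"v2"),
]

print(write_quorum(3))                  # 2
print(read_object(responses, 1).body)   # b'v1' (stale read, R = 1)
print(read_object(responses, 2).body)   # b'v2' (newest wins, R = 2)
```

Because R + W = 4 > 3 when R = 2, any two replicas include at least one that saw the latest write, so the timestamp comparison always finds it.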




Data structure of the ring


The ring is designed to map virtual nodes (partitions) onto a set of physical storage devices while providing a degree of redundancy. Its data structure consists of the following information:





    • The list of storage devices. Device information includes a unique identification number (id), zone number (zone), weight (weight), IP address (ip), port (port), device name (device), and metadata (meta).
    • The partition-to-device mapping (the replica2part2dev_id array).
    • The number of bits to shift when computing the partition number (the part_shift integer, which is the m in Figure 1).


Take looking up an object as an example of the computation:


Figure 3. Data structures of the ring


The object's hierarchical path, account/container/object, is used as the key, and the MD5 hash algorithm produces a hash value. The first 4 bytes of the hash value are shifted right to obtain the partition index, with the number of bits given by part_shift above. The partition index is then looked up in the partition-to-device mapping table (replica2part2dev_id) to find the device numbers of all the replicas of the object's partition. These devices are placed in different zones as far as possible. A zone is only an abstract concept: it can be a machine, a rack, or even a cluster in a separate building. To provide the highest level of redundancy, deploying at least 5 zones is recommended. The weight parameter is a relative value that can be adjusted to the size of the disk: the larger the weight, the more space is allocated and the more partitions can be placed on the device.



Swift defines separate rings for accounts, containers, and objects; the lookup process for accounts and containers is the same.
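The lookup can be sketched in Python as below. `TinyRing`, its sample devices, and the mapping table are invented for the illustration; the attribute names follow the ones used in the text. Swift's real ring code does more than this (for example, it mixes a cluster-wide hash path suffix into the key), so treat it as a simplified model.

```python
import hashlib
import struct


class TinyRing:
    """Simplified ring: device list, replica-to-partition table, part shift."""

    def __init__(self, devs, replica2part2dev_id, part_shift):
        self.devs = devs
        self.replica2part2dev_id = replica2part2dev_id
        self.part_shift = part_shift

    def get_part(self, account, container=None, obj=None):
        """Hash the hierarchical path and shift to get the partition index."""
        path = "/" + "/".join(p for p in (account, container, obj) if p)
        digest = hashlib.md5(path.encode("utf-8")).digest()
        # First 4 bytes of the digest as an unsigned big-endian integer.
        return struct.unpack_from(">I", digest)[0] >> self.part_shift

    def get_nodes(self, account, container=None, obj=None):
        """Return the partition and one device per replica row."""
        part = self.get_part(account, container, obj)
        return part, [self.devs[row[part]] for row in self.replica2part2dev_id]


# Hypothetical cluster: 5 devices in 5 zones, 4 partitions, 3 replicas.
devs = [
    {"id": i, "zone": i + 1, "weight": 100.0, "ip": f"10.0.0.{i + 1}",
     "port": 6000, "device": "sdb1", "meta": ""}
    for i in range(5)
]
replica2part2dev_id = [
    [0, 1, 2, 3],  # replica 0: device id for partitions 0..3
    [1, 2, 3, 4],  # replica 1
    [2, 3, 4, 0],  # replica 2
]
ring = TinyRing(devs, replica2part2dev_id, part_shift=30)

part, nodes = ring.get_nodes("AUTH_demo", "photos", "cat.jpg")
print(part, [(n["id"], n["zone"]) for n in nodes])
```

The same `get_nodes` call with only an account, or an account and a container, models the account and container rings.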


Data model


Swift uses a hierarchical data model with three logical layers: Account/Container/Object. There is no limit on the number of nodes in each layer, so each can grow arbitrarily. An account here is not the same thing as a personal user account; it can be thought of as a tenant, serves as the top-level isolation mechanism, and can be shared by multiple individual user accounts. A container represents a group of objects, similar to a folder or directory. A leaf node represents an object, which consists of metadata and content, as shown in Figure 4:


Figure 4. Swift Data Model





System architecture



Swift uses a fully symmetric, resource-oriented distributed architecture in which every component can be scaled out, which keeps a single point of failure from spreading and affecting the whole system. Communication uses non-blocking I/O, which improves throughput and responsiveness.


Figure 5. Swift System Architecture





Swift components include:


  • Proxy service (Proxy Server): exposes the object service API. It uses the ring to locate the service address and forwards each user request to the appropriate account, container, or object service. Because the REST request protocol is stateless, it can be scaled out to balance load.
  • Authentication service (Authentication Server): verifies the identity of the user and issues an object access token (token) that is valid for a certain period; it also validates access tokens and caches them until they expire.
  • Cache service (Cache Server): caches object service tokens and account and container existence information, but not object data itself. The cache service can be a Memcached cluster, and Swift uses a consistent hashing algorithm to allocate cache addresses.
  • Account service (Account Server): provides account metadata and statistics and maintains the list of containers; each account's information is stored in a SQLite database.
  • Container service (Container Server): provides container metadata and statistics and maintains the list of objects; each container's information is also stored in a SQLite database.
  • Object service (Object Server): provides object metadata and content. Each object's content is stored as a file in the file system and its metadata as file attributes; a file system that supports extended attributes, such as XFS, is recommended.
  • Replication service (Replicator): checks whether local partition replicas and remote replicas are consistent by comparing hash files and high-water marks, and when it finds a difference it updates the remote replica by pushing. For example, the object replication service uses the remote file copy tool rsync to synchronize. Its other task is to make sure that objects marked for deletion are removed from the file system.
  • Update service (Updater): when an update cannot be applied immediately because of high load, the task is serialized and queued in the local file system so that it can be applied asynchronously once service recovers. For example, if an object is created successfully but the container server cannot update its object list in time, the container update goes into the queue; the update service scans the queue after the system returns to normal and applies the update.
  • Audit service (Auditor): checks the integrity of objects, containers, and accounts. If it finds a bit-level error, the file is quarantined and another replica is copied over the locally corrupted one; other kinds of errors are recorded in the log.
  • Account reaper (Account Reaper): removes accounts that are marked for deletion, deleting all the containers and objects they contain.
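As one illustration of the components above, the replicator's consistency check can be pictured as a comparison of per-directory hashes followed by a push of whatever differs. The Python sketch below assumes each side exposes a mapping from a directory name to a content hash; the names and hash strings are hypothetical, and the actual transfer (rsync in the object replicator's case) is left out.

```python
def dirs_to_push(local_hashes: dict, remote_hashes: dict) -> list:
    """Return the local directories whose hash is missing or different remotely."""
    return sorted(
        name for name, digest in local_hashes.items()
        if remote_hashes.get(name) != digest
    )


# Hypothetical hash listings for one partition on two nodes.
local = {"a1f": "9e107d9d", "b2c": "e4d909c2", "c3d": "0cc175b9"}
remote = {"a1f": "9e107d9d", "b2c": "ffffffff"}

print(dirs_to_push(local, remote))  # ['b2c', 'c3d']
```

Only the directories that differ are pushed, so an already consistent replica costs no more than a hash comparison.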


API



Swift provides an HTTP-based REST interface through the Proxy Server for CRUD operations on accounts, containers, and objects. Before accessing Swift, a client obtains an access token from the authentication service and then adds the X-Auth-Token header to each request it sends. The following example requests the list of containers in an account:


GET /v1/<account> HTTP/1.1
Host: storage.swift.com
X-Auth-Token: eaaafd18-0fed-4b3a-81b4-663c99ec1cbb

The response header contains a status code of 200, and the container list is included in the response body:

HTTP/1.1 200 OK
Date: Thu, 07 Jan 2013 18:57:07 GMT
Server: Apache
Content-Type: text/plain; charset=UTF-8
Content-Length: 32

Images
Movies
Documents
Backups
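The same exchange can be prepared from Python with the standard library alone. In this sketch the helper names, the http scheme, and the account name AUTH_demo are assumptions; the host and token are the sample values from the request above. The request is only built, not sent.

```python
import urllib.request


def build_list_containers_request(storage_url: str, account: str, token: str):
    """Build (but do not send) the GET request that lists an account's containers."""
    url = f"{storage_url}/v1/{account}"
    return urllib.request.Request(
        url, headers={"X-Auth-Token": token}, method="GET"
    )


def parse_container_list(body: bytes) -> list:
    """Split a text/plain response body into container names, one per line."""
    return [line.strip() for line in body.decode("utf-8").splitlines() if line.strip()]


request = build_list_containers_request(
    "http://storage.swift.com",            # scheme is an assumption
    "AUTH_demo",                           # hypothetical account name
    "eaaafd18-0fed-4b3a-81b4-663c99ec1cbb",
)
print(request.get_method(), request.full_url)

# Parsing the sample response body shown above.
print(parse_container_list(b"Images\nMovies\nDocuments\nBackups \n"))
```

Passing `request` to `urllib.request.urlopen` would perform the call against a real endpoint.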


All operations supported by Swift are summarized in Table 1:


Table 1. Swift RESTful API summary

Resource  | URL                       | GET                             | PUT                               | POST                      | DELETE               | HEAD
Account   | /account/                 | Get the list of containers      | -                                 | -                         | -                    | Get account metadata
Container | /account/container        | Get the list of objects         | Create a container                | Update container metadata | Delete the container | Get container metadata
Object    | /account/container/object | Get object content and metadata | Create, update, or copy an object | Update object metadata    | Delete the object    | Get object metadata


The detailed API specification can be found in the Developer's Guide. Applications can use the Python bindings included in the Swift project itself; for other programming languages, refer to Rackspace's Swift-compatible Cloud Files API, which offers language bindings for Java, .NET, Ruby, and PHP.





Conclusion


OpenStack Swift, as a stable and highly available open source object store, is in commercial use at many companies: Sina's App Engine has launched and offers a Swift-based object storage service, as does Korea Telecom's ucloud storage service. There is reason to believe that, thanks to its complete openness, broad user base, and community of contributors, Swift may become an open standard for cloud storage, breaking Amazon S3's monopoly on the market and driving cloud computing in a more open and interoperable direction.


References
  • The OpenStack official website: the latest news on Swift's official releases.
  • The Swift official project management website: hosts all the code, bug tracking, blueprints, translations, and Q&A; difficult questions can be asked there.
  • OpenStack Object Storage Administrator's Guide: describes how to install and deploy Swift.
  • OpenStack Object Storage Developer's Guide: provides detailed API specifications and examples.
  • The start-up company SwiftStack provides a comprehensive introduction to the Swift architecture.
  • Guy Harrison's blog post "Consistency model for NoSQL databases": covers CAP theory and the NWR strategy.
  • Julien Danjou's blog post "Swift consistency analysis": explains the principles behind Swift's consistency.
  • Sina's "Swift Architecture and practice", shared on SlideShare: an overview of the Swift architecture.
  • The Cloud Files API specification document provided by Rackspace.
  • The Swift-based CDMI standard implementation on GitHub: shows how to develop interfaces that support other standards on top of Swift.





Article source: http://www.ibm.com/developerworks/cn/cloud/library/1310_zhanghua_openstackswift/




