OpenStack Swift Open Source Cloud Storage Technology Analysis


Swift was originally a highly available distributed object storage service developed by Rackspace and contributed to the OpenStack open source community in 2010. As one of OpenStack's initial core subprojects, it provides virtual machine image storage for the Nova subproject. Swift is built on relatively inexpensive standard hardware and does not need RAID (redundant array of independent disks). Instead, it achieves high availability and scalability by introducing consistent hashing and data redundancy at the software level, sacrificing a certain degree of data consistency. It supports multi-tenancy and read/write operations on containers and objects, and is well suited to storing unstructured data in Internet application environments.

The project is written in Python and released under the Apache 2.0 license, which allows developers to freely use the system.

Basic principles

Consistent hashing

With a massive number of objects to store across thousands of servers and hard disks, the first problem to solve is how to distribute objects among these devices. Swift uses consistent hashing: objects are hashed evenly into a virtual space divided into virtual nodes, so that when nodes are added or removed, the amount of data that has to be moved is greatly reduced. The size of the virtual space is usually a power of 2, which allows efficient bit-shift operations. A dedicated data structure, the ring, then maps the virtual nodes to actual physical storage devices, completing the addressing process.
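To illustrate why consistent hashing reduces data movement, here is a minimal sketch of the classic "ring of points" variant, in which each node owns many points on a 32-bit circle and a key belongs to the next point along the circle. This is an assumption for illustration only, not Swift's code (Swift itself uses a fixed set of partitions plus a ring table, described later), and the names `h32` and `HashRing` are hypothetical.

```python
import bisect
import hashlib


def h32(key: str) -> int:
    """Map a string to a 32-bit integer using the first 4 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class HashRing:
    """Hypothetical minimal consistent-hash ring: each node owns several points on the circle."""

    def __init__(self, nodes, points_per_node=100):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (h32(f"{node}#{i}"), node))

    def lookup(self, key):
        # The first point at or after the key's hash owns the key, wrapping around at 2^32.
        idx = bisect.bisect_left(self._ring, (h32(key),))
        return self._ring[idx % len(self._ring)][1]


keys = [f"obj-{i}" for i in range(10000)]
ring = HashRing([f"node{i}" for i in range(4)])
before = {k: ring.lookup(k) for k in keys}
ring.add("node4")  # grow from 4 to 5 nodes
moved_ring = sum(before[k] != ring.lookup(k) for k in keys)

# Naive "hash mod number-of-nodes" placement for comparison.
moved_mod = sum(h32(k) % 4 != h32(k) % 5 for k in keys)
print(f"consistent hashing moved {moved_ring / len(keys):.2%}, modulo moved {moved_mod / len(keys):.2%}")
```

With the ring, roughly one fifth of the keys move, and all of them move to the new node; with modulo placement, roughly four fifths of the keys change location.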

Figure 1. Consistent Hash

As shown in Figure 1, the hash space increases counter-clockwise and is 4 bytes (32 bits) wide, so hash values are integers in the range $[0, 2^{32}-1]$. Shifting a hash value right by $m$ bits produces $2^{32-m}$ virtual nodes; for example, $m = 29$ produces $2^{3} = 8$ virtual nodes. In an actual deployment, the number of virtual nodes must be chosen carefully to balance storage space and workload.
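The shift arithmetic can be checked directly. This small sketch (function names are hypothetical) shows that a right shift by $m$ keeps the top $32-m$ bits, so each virtual node covers a contiguous block of $2^m$ hash values.

```python
HASH_BITS = 32


def virtual_node(hash_value: int, m: int) -> int:
    """Drop the low m bits of a 32-bit hash; the remaining high bits index the virtual node."""
    assert 0 <= hash_value < 2 ** HASH_BITS
    return hash_value >> m


def virtual_node_count(m: int) -> int:
    return 1 << (HASH_BITS - m)  # 2^(32 - m)


def hash_range(node: int, m: int) -> tuple:
    """Inclusive (low, high) range of hash values that land on a given virtual node."""
    return node << m, ((node + 1) << m) - 1


m = 29
print(virtual_node_count(m))       # 8
print(virtual_node(2**32 - 1, m))  # 7, the last virtual node
print(hash_range(0, m))            # (0, 536870911)
```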

Data consistency model

According to Eric Brewer's CAP (consistency, availability, partition tolerance) theorem, a distributed system cannot satisfy all three properties at once. Swift therefore gives up strict consistency (ACID transaction-level guarantees) and adopts an eventual consistency model to achieve high availability and virtually unlimited horizontal scalability. To achieve this, Swift uses a quorum protocol (a quorum is the minimum number of votes required for a decision to be valid):

(1) Definitions: N is the total number of replicas of the data; W is the number of replicas that must acknowledge a write for it to succeed; R is the number of replicas consulted by a read.

(2) Strong consistency: R + W > N guarantees that the replica sets touched by reads and writes intersect, so the latest version can always be read. If W = N and R = 1, every replica must be updated on each write, which suits read-heavy, write-light workloads that need strong consistency. If R = N and W = 1, only one replica is updated per write, and the latest version is found by reading all replicas, which suits write-heavy, read-light workloads that need strong consistency.

(3) Weak consistency: R + W <= N. The replica sets of a read and a write may not intersect, so stale (dirty) data may be read.
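The intersection argument in (2) and (3) can be verified by brute force for small N. This sketch (the function name is hypothetical) enumerates every possible write set and read set and checks whether they always share a replica.

```python
from itertools import combinations


def always_intersect(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares at least one replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


for n, w, r in [(3, 3, 1), (3, 1, 3), (3, 2, 2), (3, 2, 1), (3, 1, 1)]:
    print(f"N={n} W={w} R={r}  R+W>N={r + w > n}  always intersect={always_intersect(n, w, r)}")
```

For every row the two columns agree: the sets are guaranteed to overlap exactly when R + W > N (by the pigeonhole principle), and otherwise a disjoint write set and read set exist.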

Swift targets scenarios where both reads and writes are frequent, so it takes a compromise approach: a write must succeed on more than half of the replicas (W > N/2), and the read and write replica sets should share at least one replica (R + W > N). Swift's default configuration is N = 3, W = 2 > N/2, and R = 1 or 2. Each object has 3 replicas, stored as far as possible on nodes in different zones, and W = 2 means a write succeeds only after at least 2 replicas have been updated. With R = 1, a read returns as soon as one replica responds, so an old version may be read (weak consistency). With R = 2, adding the `X-Newest: true` header to the read request makes Swift read the metadata of 2 replicas at the same time and compare their timestamps to determine which is the latest version (strong consistency). If the replicas become inconsistent, background service processes detect the difference and synchronize the data through the replication protocol within a certain time window, ensuring eventual consistency. This is shown in Figure 2:
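The N = 3, W = 2 behavior described above can be simulated in a few lines. This is a toy model, not Swift's proxy code: replicas are plain `(timestamp, value)` tuples, and the `read` helper is hypothetical. It shows that a single-replica read can return a stale version, that any two replicas always include the newest write, and that background synchronization eventually makes every replica current.

```python
from itertools import combinations

# Each replica holds (timestamp, value). Start with version 1 everywhere (N = 3).
replicas = [(1, "v1"), (1, "v1"), (1, "v1")]

# A W = 2 write: replicas 0 and 1 acknowledge and the write is reported as successful;
# replica 2 is lagging and will be fixed later by background replication.
W = 2
for i in range(W):
    replicas[i] = (2, "v2")


def read(replica_ids):
    """Read the given replicas and return the value with the newest timestamp."""
    return max(replicas[i] for i in replica_ids)[1]


r1_results = {read([i]) for i in range(3)}
r2_results = {read(pair) for pair in combinations(range(3), 2)}
print("R=1 may return:", sorted(r1_results))  # ['v1', 'v2']
print("R=2 may return:", sorted(r2_results))  # ['v2']

# Background replication eventually copies the newest version to every replica.
newest = max(replicas)
replicas = [newest] * len(replicas)
after_sync = {read([i]) for i in range(3)}
print("after sync, R=1 returns:", sorted(after_sync))  # ['v2']
```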

Figure 2. Quorum Protocol Example

Data structure of the ring

The ring maps each virtual node (partition) to a set of physical storage devices and provides a certain degree of redundancy. Its data structure consists of the following information:

(1) Storage device list: each device entry includes a unique identifier (id), a zone number (zone), a weight (weight), an IP address (ip), a port (port), a device name (device), and metadata (meta).

(2) Partition-to-device mapping (the replica2part2dev_id array).

(3) The number of bits to shift when computing the partition number (the part_shift integer, i.e. m in Figure 1).
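The three pieces of ring data listed above can be modeled as plain Python objects. This is a minimal sketch under stated assumptions: the class names, the toy device addresses, port, and device names are hypothetical, and the partition table is filled by a simple rotation chosen so that the 3 replicas of every partition land in 3 different zones. Swift's real ring builder uses weights and a more elaborate balancing algorithm.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Device:
    id: int
    zone: int
    weight: float
    ip: str
    port: int
    device: str
    meta: str = ""


@dataclass
class RingData:
    devs: list                 # storage device list, indexed by device id
    replica2part2dev_id: list  # one array per replica: partition number -> device id
    part_shift: int            # m in Figure 1

    def get_part_nodes(self, part):
        """Return the devices holding every replica of a partition."""
        return [self.devs[r2p2d[part]] for r2p2d in self.replica2part2dev_id]


# Hypothetical toy cluster: 6 devices spread over 3 zones (zone = id % 3 + 1).
devs = [Device(id=i, zone=i % 3 + 1, weight=100.0, ip=f"10.0.0.{i + 1}", port=6200, device="sdb1")
        for i in range(6)]
part_shift = 29
part_count = 1 << (32 - part_shift)  # 8 partitions
# Replica r of partition p goes to device (p + r) % 6, so consecutive devices (distinct zones) are used.
replica2part2dev_id = [array("H", [(p + r) % len(devs) for p in range(part_count)]) for r in range(3)]
ring = RingData(devs, replica2part2dev_id, part_shift)

for dev in ring.get_part_nodes(5):
    print(dev.id, dev.zone, dev.ip)
```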

For example, the process of locating an object is as follows:

Figure 3. The data mechanism of the ring

Using the object's hierarchical path account/container/object as the key, Swift computes an MD5 hash, takes the first 4 bytes of the hash value, and shifts them right by the number of bits given by part_shift to obtain the partition index. Using the partition index, it looks up the partition-to-device mapping table (replica2part2dev_id) to find the devices on which the object's partition resides. These devices are chosen to lie in different zones. A zone is an abstract concept: it can be a single machine, a rack, or even a cluster within a building, chosen to provide the highest level of redundancy; deploying at least 5 zones is recommended. The weight parameter is a relative value that can be adjusted according to disk size: the larger the weight, the more space can be allocated and the more partitions the device is assigned.
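The lookup steps above can be sketched as follows. This is a simplified, self-contained illustration, not Swift's source: it assumes the key is exactly `/account/container/object`, whereas real deployments also salt the path with a cluster-wide secret from configuration before hashing. The partition table `REPLICA2PART2DEV_ID`, the zone map, and the account/container/object names are hypothetical toy data.

```python
import hashlib
import struct


def get_part(account, container=None, obj=None, part_shift=29):
    """Hash /account[/container[/object]] with MD5 and keep the top (32 - part_shift) bits."""
    path = "/" + "/".join(p for p in (account, container, obj) if p is not None)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # First 4 bytes as a big-endian unsigned 32-bit integer, shifted right by part_shift.
    return struct.unpack_from(">I", digest)[0] >> part_shift


# Hypothetical toy table: 3 replicas x 8 partitions -> device ids (part_shift = 29).
REPLICA2PART2DEV_ID = [
    [0, 1, 2, 3, 4, 5, 0, 1],
    [1, 2, 3, 4, 5, 0, 1, 2],
    [2, 3, 4, 5, 0, 1, 2, 3],
]
DEV_ZONE = {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3}  # device id -> zone


def get_nodes(account, container, obj):
    """Return the object's partition and the device id holding each replica."""
    part = get_part(account, container, obj)
    return part, [r2p2d[part] for r2p2d in REPLICA2PART2DEV_ID]


part, dev_ids = get_nodes("AUTH_test", "photos", "cat.jpg")
print(part, dev_ids, [DEV_ZONE[d] for d in dev_ids])
```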

Swift defines separate rings for accounts, containers, and objects; accounts and containers are located using the same process as objects.
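Because each ring is independent, the same hashing step can be applied to the shorter paths `/account` and `/account/container` against their own rings. In this sketch the per-ring part_shift values are hypothetical (each ring can be sized separately), and as before the cluster-wide hash salt used by real deployments is omitted.

```python
import hashlib
import struct

# Hypothetical part_shift values: each ring type can be sized independently.
RING_PART_SHIFT = {"account": 30, "container": 29, "object": 26}


def get_part(ring_type, *path_parts):
    """Same hashing as for objects, applied to /account or /account/container on its own ring."""
    path = "/" + "/".join(path_parts)
    top4 = struct.unpack_from(">I", hashlib.md5(path.encode("utf-8")).digest())[0]
    return top4 >> RING_PART_SHIFT[ring_type]


print(get_part("account", "AUTH_test"))                      # partition on the account ring
print(get_part("container", "AUTH_test", "photos"))          # partition on the container ring
print(get_part("object", "AUTH_test", "photos", "cat.jpg"))  # partition on the object ring
```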
