Background and overview
Swift is a highly available, distributed object storage service originally developed by Rackspace. In 2010 it was contributed to the OpenStack open source community as one of its initial core subprojects, where it provides virtual machine image storage for the Nova subproject. Swift runs on relatively inexpensive standard hardware and does not require RAID (redundant array of independent disks). Instead, it achieves high availability and scalability at the software level through consistent hashing and data redundancy, sacrificing a degree of data consistency in exchange. It supports multi-tenancy and read and write operations on containers and objects, and is well suited to storing unstructured data in Internet application environments.
The project is written in Python and released under the Apache 2.0 license, so developers are free to use it and build systems on it.
Basic principles
Consistent hashing
With a massive number of objects to store across thousands of servers and disk devices, the first problem to solve is addressing: how to distribute objects across those devices. Swift uses consistent hashing. Objects are hashed evenly onto virtual nodes in a virtual space, which greatly reduces the amount of data that must be moved when nodes are added or removed. The size of the virtual space is usually a power of 2, which allows efficient bit-shift operations. A dedicated data structure, the ring, then maps each virtual node to an actual physical storage device, completing the addressing process.
Figure 1. Consistent Hash
As shown in Figure 1, the hash space increases in the counter-clockwise direction and is 4 bytes (32 bits) long, giving an integer range of [0, 2^32 - 1]. Shifting the hash result right by m bits produces 2^(32-m) virtual nodes; for example, m = 29 produces 2^3 = 8 virtual nodes. An actual deployment requires careful calculation to choose a number of virtual nodes that balances storage space against workload.
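The shift-based addressing described above can be sketched in a few lines of Python. This is a minimal illustration, not Swift's actual ring code: the use of MD5, the names `PART_POWER`, `partition_for`, and `device_for`, and the four-device `RING` table are all assumptions made for the example.

```python
import hashlib
import struct

PART_POWER = 3                 # 2**3 = 8 virtual nodes, as in the m = 29 example
PART_SHIFT = 32 - PART_POWER   # m = 29: number of bits shifted off the hash

def partition_for(name: str) -> int:
    """Map an object name to a virtual node in [0, 2**PART_POWER)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    # Interpret the first 4 bytes of the digest as an unsigned 32-bit integer.
    value = struct.unpack(">I", digest[:4])[0]
    # Keep only the top PART_POWER bits by shifting right m bits.
    return value >> PART_SHIFT

# Hypothetical ring: each virtual node is assigned to one of four devices.
RING = {part: f"device-{part % 4}" for part in range(2 ** PART_POWER)}

def device_for(name: str) -> str:
    """Resolve an object name to a physical device via its virtual node."""
    return RING[partition_for(name)]
```

Because objects are bound to virtual nodes rather than directly to devices, adding or removing a device only requires reassigning some entries in the ring table; the hash of each object never changes.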
Data consistency model
According to Eric Brewer's CAP theorem, a distributed system cannot satisfy all three of consistency, availability, and partition tolerance at once. Swift gives up strict consistency (the ACID transaction level) and adopts an eventual consistency model in order to achieve high availability and virtually unlimited horizontal scalability. To this end, Swift uses a quorum protocol (a quorum being the minimum number of votes required):
(1) Definitions: N is the total number of replicas of the data; W is the number of replicas that must acknowledge a write operation; R is the number of replicas consulted by a read operation.
(2) Strong consistency: R + W > N. The replica sets touched by reads and writes are guaranteed to intersect, so a read can always see the latest version. If W = N and R = 1, every replica must be updated on each write, which provides strong consistency for read-heavy, write-light workloads. If R = N and W = 1, only one replica is updated on a write and the latest version is found by reading all replicas, which provides strong consistency for write-heavy, read-light workloads.
(3) Weak consistency: R + W <= N. The replica sets touched by reads and writes may not intersect, so a read may return stale data. This is suitable for scenarios with low consistency requirements.
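The conditions in the list above reduce to simple arithmetic on N, W, and R. The following sketch is only an illustration of that arithmetic; the function names `is_strongly_consistent` and `min_overlap` are hypothetical and do not correspond to any Swift API.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Return True when read and write replica sets must intersect (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each lie between 1 and N")
    return r + w > n

def min_overlap(n: int, w: int, r: int) -> int:
    """Smallest possible number of replicas shared by a write set and a read set."""
    return max(0, r + w - n)

# With N = 3: W = 3, R = 1 and W = 1, R = 3 both satisfy R + W > N,
# while W = 2, R = 1 does not and may therefore return stale data.
```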
Swift targets scenarios in which both reads and writes are frequent, so it adopts a compromise strategy: a write must succeed on more than half of the replicas, that is, W > N/2, and the replica sets of reads and writes must then share at least one member, that is, R + W > N. Swift's default configuration is N = 3, W = 2 > N/2, and R = 1 or 2. Each object has 3 replicas, which are stored on nodes in different zones as far as possible, and W = 2 means that at least 2 replicas must be updated for a write to succeed. With R = 1, a read returns as soon as one replica is read successfully, in which case an old version may be returned (weak consistency model). With R = 2, the X-Newest=true parameter is added to the request header of the read operation so that the metadata of 2 replicas is read at the same time and their timestamps are compared to determine the latest version (strong consistency model). If the replicas are inconsistent, background service processes synchronize the data within a certain time window through detection and replication protocols, guaranteeing eventual consistency. As shown in Figure 2:
Figure 2. Quorum Protocol Example
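The difference between R = 1 and R = 2 under the default N = 3, W = 2 configuration can be illustrated with a small simulation. This is a sketch under stated assumptions, not Swift's read path: the `Replica` class, the `read_newest` function, and the node names are hypothetical, and the order of the list stands in for the order in which replicas happen to respond.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Replica:
    node: str         # hypothetical node name
    timestamp: float  # time of the last write applied to this replica
    data: bytes

def read_newest(replicas: List[Replica], r: int) -> Replica:
    """Consult the first r replicas and return the one with the latest timestamp."""
    if not 1 <= r <= len(replicas):
        raise ValueError("r must lie between 1 and the number of replicas")
    consulted = replicas[:r]
    return max(consulted, key=lambda rep: rep.timestamp)

# N = 3, W = 2: the latest write reached two replicas; the third is still stale.
fresh_a = Replica("node-a", 2.0, b"v2")
fresh_b = Replica("node-b", 2.0, b"v2")
stale = Replica("node-c", 1.0, b"v1")

# R = 1 may return the old version if the stale replica answers first.
maybe_old = read_newest([stale, fresh_a, fresh_b], r=1)
# R = 2 always includes at least one up-to-date replica, since R + W > N.
latest = read_newest([stale, fresh_a, fresh_b], r=2)
```

Any two of the three replicas must include at least one of the two that received the write, which is why comparing timestamps across two replicas is enough to find the latest version.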