Introduction to OpenStack Swift: Principles, Architecture, and APIs


Background and overview

Swift was originally a highly available distributed object storage service developed by Rackspace. In 2010 it was contributed to the OpenStack open source community as one of its initial core subprojects, providing virtual machine image storage for the Nova subproject. Swift is built on inexpensive standard hardware and does not require RAID (redundant array of independent disks). It achieves high availability and scalability by applying consistent hashing and data redundancy at the software level, sacrificing a degree of data consistency in exchange. It supports multi-tenancy and read and write operations on containers and objects, and it is well suited to storing unstructured data in Internet application scenarios.

The project is written in Python and is available to developers under the Apache 2.0 license.

Basic principles

Consistent hashing

With a massive number of objects to store across thousands of servers and disk devices, the first problem to solve is addressing: how to distribute objects to those devices. Swift is based on consistent hashing. A hash computation distributes objects evenly over the virtual nodes of a virtual space, which greatly reduces the amount of data that has to be moved when nodes are added or removed. The size of the virtual space is usually a power of two (2^n) so that efficient shift operations can be used. A dedicated data structure, the ring, then maps each virtual node to an actual physical storage device, completing the addressing process.

Figure 1. Consistent hashing

As shown in Figure 1, the hash space increases counterclockwise and is 4 bytes (32 bits) long, giving an integer range of [0, 2^32 - 1]. Right-shifting the hash result by m bits produces 2^(32-m) virtual nodes; for example, m = 29 produces 8 virtual nodes. An actual deployment requires careful calculation of the appropriate number of virtual nodes to balance storage space against workload.
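The shift arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Swift's own code: the function name `virtual_node`, the made-up keys, the use of MD5 here, and the big-endian reading of the first 4 bytes are all choices made for the example.

```python
import hashlib
import struct
from collections import Counter


def virtual_node(key: str, m: int) -> int:
    """Right-shift a 32-bit hash of `key` by m bits to get a virtual node index."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Assumption: treat the first 4 bytes as an unsigned big-endian integer.
    hash32 = struct.unpack(">I", digest[:4])[0]
    return hash32 >> m


m = 29
node_count = 2 ** (32 - m)  # 8 virtual nodes when m = 29

# Spread 8000 made-up keys and count how many land on each virtual node.
counts = Counter(virtual_node(f"object-{i}", m) for i in range(8000))
for node in range(node_count):
    print(node, counts[node])
```

Running the loop prints one count per virtual node, which gives a rough feel for how evenly the hash spreads keys across the space.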

Data consistency model

According to Eric Brewer's CAP (consistency, availability, partition tolerance) theorem, a system cannot satisfy all three properties at the same time. Swift gives up strict consistency (the level needed for ACID transactions) and adopts an eventual consistency model to achieve high availability and virtually unlimited horizontal scalability. To this end, Swift uses a quorum protocol (a quorum is the minimum number of voters required for a decision to be valid):

(1) Definitions: N is the total number of replicas of the data; W is the number of replicas that must acknowledge a write; R is the number of replicas consulted by a read.

(2) Strong consistency: R + W > N guarantees that the replica sets of read and write operations intersect, so the latest version can always be read. If W = N and R = 1, every replica must be updated on each write, which suits strongly consistent scenarios with many reads and few writes. If R = N and W = 1, only one replica is updated per write and the latest version is obtained by reading all replicas, which suits strongly consistent scenarios with many writes and few reads.

(3) Weak consistency: R + W <= N. The replica sets of read and write operations may not intersect, so stale data may be read. This suits scenarios with low consistency requirements.

Swift targets scenarios where both reads and writes are frequent, so it adopts a compromise: a write must succeed on more than half of the replicas (W > N/2), and the replica sets of reads and writes must share at least one member (R + W > N). Swift's default configuration is N = 3, W = 2 > N/2, and R = 1 or 2. Each object has 3 replicas, stored as far as possible on nodes in different zones, and W = 2 means at least 2 replicas must be updated for a write to succeed. With R = 1, a read returns as soon as one replica has been read successfully, in which case an old version may be returned (weak consistency model). With R = 2, which is requested by adding the X-Newest: true header to the read request, the metadata of 2 replicas is read at the same time and their timestamps are compared to determine the latest version (strong consistency model). If the data is inconsistent, background service processes synchronize it within a certain time window through the detection and replication protocol to ensure eventual consistency, as shown in Figure 2:
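The quorum conditions lend themselves to a small worked example. The sketch below is illustrative only: `is_strongly_consistent`, `ReplicaResponse`, and `newest` are hypothetical names invented here, and the timestamps are made-up values.

```python
from dataclasses import dataclass


def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > N."""
    return r + w > n


@dataclass
class ReplicaResponse:
    timestamp: float  # time at which this replica was last written
    body: bytes


def newest(responses):
    """Pick the response with the latest timestamp, as an R = 2 read would."""
    return max(responses, key=lambda resp: resp.timestamp)


n, w = 3, 2
print(is_strongly_consistent(n, w, r=1))  # False: 1 + 2 is not greater than 3
print(is_strongly_consistent(n, w, r=2))  # True: 2 + 2 > 3

stale = ReplicaResponse(timestamp=100.0, body=b"old")
fresh = ReplicaResponse(timestamp=200.0, body=b"new")
print(newest([stale, fresh]).body)  # b'new'
```

With the default N = 3 and W = 2, only R = 2 satisfies R + W > N, which matches the distinction the text draws between the two read modes.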

Figure 2. Quorum Protocol Example

Data structure of the ring

The ring maps virtual nodes (partitions) to a set of physical storage devices and provides a degree of redundancy. Its data structure consists of the following information:

    • The storage device list. Each device entry includes a unique identifier (id), zone number (zone), weight (weight), IP address (ip), port (port), device name (device), and metadata (meta).

    • The partition-to-device mapping (the replica2part2dev_id array)

    • The number of bits to shift when computing the partition number (the part_shift integer, which is the m in Figure 1)

The calculation used to locate an object, for example, proceeds as follows:

Figure 3. Data structures of the ring

The object's hierarchical path account/container/object is used as the key, and the MD5 algorithm produces its hash value. The first 4 bytes of the hash are right-shifted to obtain the partition index; the number of bits to shift is given by part_shift above. The partition index is then looked up in the partition-to-device mapping table (replica2part2dev_id) to find the device numbers of all replicas of the partition that holds the object. These devices are placed in different zones as far as possible. A zone is just an abstract concept: it can be a machine, a rack, or even a cluster in a building. To provide the highest level of redundancy, deploying at least 5 zones is recommended. The weight parameter is a relative value that can be adjusted according to the size of the disk: the larger the weight, the more space can be allocated and the more partitions can be placed on the device.

Swift defines separate rings for accounts, containers, and objects; looking up an account or a container follows the same process.
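The lookup described in this section can be sketched as follows. This is a toy model under stated assumptions rather than Swift's ring implementation: the device list, the 8-partition table, the helper name `get_nodes`, and the big-endian reading of the hash are all invented for illustration, and details of a real deployment (such as weights and ring building) are left out.

```python
import hashlib
import struct

PART_POWER = 3                # toy value: 2**3 = 8 partitions
PART_SHIFT = 32 - PART_POWER  # the shift count called m in Figure 1

# Toy device list; real entries also carry weight, port, and metadata.
devices = [
    {"id": 0, "zone": 1, "ip": "10.0.0.1", "device": "sdb"},
    {"id": 1, "zone": 2, "ip": "10.0.0.2", "device": "sdb"},
    {"id": 2, "zone": 3, "ip": "10.0.0.3", "device": "sdb"},
    {"id": 3, "zone": 4, "ip": "10.0.0.4", "device": "sdb"},
]

# replica2part2dev_id[replica][partition] -> device id (3 replicas here).
replica2part2dev_id = [
    [0, 1, 2, 3, 0, 1, 2, 3],
    [1, 2, 3, 0, 1, 2, 3, 0],
    [2, 3, 0, 1, 2, 3, 0, 1],
]


def get_nodes(account, container=None, obj=None):
    """Return (partition, devices) for an account, container, or object path."""
    path = "/" + "/".join(p for p in (account, container, obj) if p)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # First 4 bytes of the hash, shifted right by PART_SHIFT bits.
    part = struct.unpack(">I", digest[:4])[0] >> PART_SHIFT
    return part, [devices[row[part]] for row in replica2part2dev_id]


part, nodes = get_nodes("AUTH_demo", "photos", "cat.jpg")
print(part, [(node["id"], node["zone"]) for node in nodes])
```

In this toy table every partition's three replicas sit on devices in three different zones, mirroring the placement goal described above.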

Data model

Swift uses a hierarchical data model with three logical layers: Account/Container/Object. The number of nodes at each layer is unlimited and can grow arbitrarily. An account here is not the same thing as a personal user account; it is better understood as a tenant, which serves as the top-level isolation mechanism and can be shared by multiple individual accounts. A container represents a group of objects, much like a folder or directory. A leaf node represents an object, which consists of metadata and content, as shown in Figure 4:

Figure 4. Swift Data Model
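As a small illustration of the three-level hierarchy, the sketch below splits a path into its account, container, and object parts. The names `SwiftPath` and `parse_path` are hypothetical, and the choice to keep any further slashes inside the object name is an assumption of this example.

```python
from typing import NamedTuple, Optional


class SwiftPath(NamedTuple):
    account: str
    container: Optional[str] = None
    obj: Optional[str] = None


def parse_path(path: str) -> SwiftPath:
    """Split '/account[/container[/object]]' into its three levels."""
    # Split at most twice so the remainder stays in the object name.
    parts = path.lstrip("/").split("/", 2)
    if not parts[0]:
        raise ValueError("an account is required")
    return SwiftPath(*parts)


print(parse_path("/AUTH_demo"))
print(parse_path("/AUTH_demo/photos"))
print(parse_path("/AUTH_demo/photos/2013/cat.jpg"))
```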

System architecture

Swift uses a fully symmetric, resource-oriented distributed architecture in which every component can be scaled out, so that a single point of failure does not spread and affect the operation of the whole system. Communication uses non-blocking I/O, which improves system throughput and responsiveness.

Figure 5. Swift System Architecture

Swift components include:

  • Proxy Service (Proxy Server): Exposes the object service API externally. It locates the service address from the information in the ring and forwards each user request to the corresponding account, container, or object service. Because it uses the stateless REST protocol, it can be scaled out horizontally to balance load.

  • Authentication Service (Authentication Server): Verifies the identity of the accessing user and issues an object access token that is valid for a limited time. It also validates access tokens and caches them until they expire.

  • Cache Service (Cache Server): Caches object service tokens and account and container existence information, but not the data of the objects themselves. The cache service can be a Memcached cluster, and Swift uses a consistent hashing algorithm to distribute cache addresses.

  • Account Service (Account Server): Provides account metadata and statistics and maintains the list of containers in each account. The information for each account is stored in a SQLite database.

  • Container Service (Container Server): Provides container metadata and statistics and maintains the list of objects in each container. The information for each container is also stored in a SQLite database.

  • Object Service (Object Server): Provides object metadata and content. The content of each object is stored as a file in the file system and its metadata is stored as file attributes. An XFS file system, which supports extended attributes, is recommended.

  • Replication Service (Replicator): Checks whether the local partition replica is consistent with the remote replicas by comparing hash files and high-water marks. When an inconsistency is found, it updates the remote replica by push; for example, the object replication service uses the remote file copy tool rsync to synchronize. Another of its tasks is to ensure that objects marked for deletion are removed from the file system.

  • Update Service (Updater): When an update cannot be applied immediately because of high load, the task is serialized and queued in the local file system so that it can be applied asynchronously once service recovers. For example, after an object is created successfully, the container server may fail to update its object list in time. The container update then goes into the queue, and the update service scans the queue and processes the updates once the system returns to normal.

  • Audit Service (Auditor): Checks the integrity of objects, containers, and accounts. If a bit-level error is found, the file is quarantined and a good copy is replicated from another replica to overwrite the locally corrupted one. Other types of errors are recorded in the log.

  • Account Reaper: Removes accounts that are marked for deletion, deleting all the containers and objects they contain.
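The replicator's compare-then-push idea can be pictured with a deliberately simplified sketch. Everything here is an assumption made for illustration: the dictionary layout (a directory name mapped to file names), the hashing scheme, and the helper names `directory_hashes` and `directories_to_push` are not Swift's on-disk format or API, and the actual transfer step (rsync in the object replicator) is left out.

```python
import hashlib


def directory_hashes(partition: dict) -> dict:
    """Hash the sorted file names of each directory in a partition."""
    return {
        name: hashlib.md5("".join(sorted(files)).encode("utf-8")).hexdigest()
        for name, files in partition.items()
    }


def directories_to_push(local: dict, remote: dict) -> list:
    """List local directories whose hash differs from, or is missing on, the remote."""
    local_hashes = directory_hashes(local)
    remote_hashes = directory_hashes(remote)
    return sorted(
        name for name, digest in local_hashes.items()
        if remote_hashes.get(name) != digest
    )


# Made-up partition contents: the remote copy of "f01" holds an older file.
local = {"a83": ["1358190000.00000.data"], "f01": ["1358190100.00000.data"]}
remote = {"a83": ["1358190000.00000.data"], "f01": ["1358100000.00000.data"]}
print(directories_to_push(local, remote))  # ['f01']
```

Comparing compact hashes first means only the directories that actually differ need to be pushed to the remote replica.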

API

Swift provides an HTTP-based REST service interface through the Proxy Server for CRUD operations on accounts, containers, and objects. Before accessing the Swift service, you need to obtain an access token from the authentication service and then add it to each request in the X-Auth-Token header. The following is an example of a request that returns the list of containers in an account:

    GET /v1/<account> HTTP/1.1
    Host: storage.swift.com
    X-Auth-Token: EAAAFD18-0FED-4B3A-81B4-663C99EC1CBB

The response header contains the status code 200, and the list of containers is contained in the response body:

    HTTP/1.1 200 OK
    Date: Thu, Jan 2013 18:57:07 GMT
    Server: Apache
    Content-Type: text/plain; charset=UTF-8
    Content-Length: 32

    images
    movies
    documents
    backups
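The same exchange can be approximated from Python's standard library. The sketch below only builds the request and parses a sample body, so it runs without a Swift cluster. The account name `AUTH_demo` and the helper names `build_list_containers_request` and `parse_container_list` are assumptions for illustration; the host and token are the sample values from the example above.

```python
from urllib.request import Request

# Hypothetical values for illustration; substitute a real endpoint and token.
STORAGE_URL = "http://storage.swift.com/v1/AUTH_demo"
TOKEN = "EAAAFD18-0FED-4B3A-81B4-663C99EC1CBB"


def build_list_containers_request(storage_url: str, token: str) -> Request:
    """Build (but do not send) a GET request that lists an account's containers."""
    return Request(storage_url, headers={"X-Auth-Token": token}, method="GET")


def parse_container_list(body: bytes) -> list:
    """A text/plain listing carries one container name per line."""
    return body.decode("utf-8").splitlines()


request = build_list_containers_request(STORAGE_URL, TOKEN)
print(request.get_method(), request.full_url)
print(parse_container_list(b"images\nmovies\ndocuments\nbackups\n"))
```

Against a live service, the request object could be passed to `urllib.request.urlopen` and the response body handed to `parse_container_list`.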

All operations supported by Swift are summarized in Table 1:

Table 1. Swift RESTful API summary

Resource type | URL | GET | PUT | POST | DELETE | HEAD
Account | /account/ | Get container list | - | - | - | Get account metadata
Container | /account/container | Get object list | Create container | Update container metadata | Delete container | Get container metadata
Object | /account/container/object | Get object content and metadata | Create, update, or copy object | Update object metadata | Delete object | Get object metadata

The detailed API specification can be found in the developer's guide. Applications can be developed with the Python bindings already included in the Swift project itself. For other programming languages, refer to the Swift-compatible Rackspace Cloud Files API, which supports language bindings such as Java, .NET, Ruby, and PHP.

