Data distribution of the Aerospike-architecture series

Source: Internet
Author: User

Data distribution

The Aerospike database uses a shared-nothing architecture: every node in an Aerospike cluster is identical, all nodes are peers, and there is no single point of failure.

Using Aerospike's intelligent partitioning algorithm, data is distributed across every node in the cluster. This approach has been field-tested across many deployments, and the highly random hash function keeps the deviation in partition distribution within 1-2%.

To determine where a record lives, the record's key, of any length, is hashed with the RIPEMD-160 algorithm into a fixed-length 20-byte (160-bit) digest, and the first 12 bits of that digest form the partition ID, which determines which partition contains the record. The partitions in turn are distributed across the cluster nodes, so with N nodes in the cluster, each node stores approximately 1/N of the data.
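As a concrete illustration, here is a minimal Python sketch of this key-to-partition mapping. The 12-bit extraction shown (top bits of the first two digest bytes) is an assumption for illustration, not necessarily Aerospike's exact bit layout, and SHA-1 (also a 20-byte digest) stands in for RIPEMD-160 when the local OpenSSL build does not provide it:

```python
import hashlib

NUM_PARTITIONS = 4096  # 2**12: the first 12 bits of the digest select a partition

def partition_id(key: bytes) -> int:
    """Map a record key to a partition ID (illustrative sketch only)."""
    try:
        digest = hashlib.new("ripemd160", key).digest()  # 20-byte digest
    except ValueError:
        # RIPEMD-160 may be missing from the local OpenSSL build;
        # SHA-1 is a stand-in with the same 20-byte digest length.
        digest = hashlib.sha1(key).digest()
    # Take the first 12 bits of the 160-bit digest (assumed layout).
    return int.from_bytes(digest[:2], "big") >> 4

pid = partition_id(b"user:alice")
print(pid)  # deterministic: the same key always yields the same partition
assert 0 <= pid < NUM_PARTITIONS
```

Because the hash is deterministic, every client and every node computes the same partition for a given key without any coordination.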

Because data is distributed evenly and randomly across nodes, hotspots and bottlenecks do not occur, and no single node ends up processing noticeably more requests than the others.

For example, many surnames in the United States begin with R. If data were stored alphabetically, the server holding the names that start with R would see far more traffic than the servers holding names that start with X, Y, or Z. Random data allocation keeps the servers load-balanced.

For reliability, Aerospike replicates each partition on one or more nodes. One node acts as the master for reads and writes to the partition, and the other nodes store its replicas.

For example, in a 4-node Aerospike cluster, each node is the master for approximately 1/4 of the data and additionally holds replicas of another 1/4. The partitions a node masters are replicated across all the other nodes, so if one node becomes inaccessible, its replicas, spread across the other three nodes, take over its share of the data.



The replication factor is a configuration parameter and cannot exceed the number of cluster nodes. The more replicas you keep, the higher the reliability, but also the higher the write cost, since every write must propagate through all copies of the data. In practice, most deployments use a replication factor of 2 (one master copy and one replica).
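For reference, the replication factor is set per namespace in each node's aerospike.conf. The stanza below is an illustrative sketch (the namespace name and sizes are made up, and the available parameters vary by server version), not a complete or authoritative configuration:

```
namespace test {
    replication-factor 2    # one master copy plus one replica
    memory-size 4G          # illustrative sizing only
    default-ttl 30d
    storage-engine memory
}
```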

Synchronous replication guarantees immediate consistency and no data loss: a write transaction is propagated to all replicas before the transaction is committed and the result is returned to the client. In rare cases during cluster reconfiguration, when the Aerospike smart client briefly sends a request to an outdated node, the cluster transparently proxies the request to the correct node. Finally, when the cluster recovers from a partition, it resolves any conflicts that arose between replicas. Conflict resolution can be configured to be automatic, in which case the copy with the latest timestamp is taken as authoritative; alternatively, for resolution at a higher level, all copies of the data can be returned to the application.

How Aerospike creates partitions

A namespace is a collection of data that the Aerospike database stores in the same way. Each namespace is divided into 4,096 partitions, split evenly among the nodes in the cluster, which means that with N nodes in the cluster, each node stores approximately 1/N of the data.
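To make the even split concrete, the sketch below distributes all 4,096 partitions across four hypothetical nodes using rendezvous (highest-random-weight) hashing. This is not Aerospike's actual partition-map algorithm, just a stand-in that shows how a deterministic hash spreads partitions roughly 1/N per node with no coordination:

```python
import hashlib

NUM_PARTITIONS = 4096

def node_for_partition(pid: int, nodes: list) -> str:
    """Pick the master node for a partition (illustrative only).

    Rank nodes by a hash of the (partition id, node id) pair; the
    best-ranked node is the master.  Deterministic, so every peer
    computes the same map independently.
    """
    return max(nodes, key=lambda n: hashlib.sha256(f"{pid}:{n}".encode()).digest())

nodes = ["A", "B", "C", "D"]
counts = {n: 0 for n in nodes}
for pid in range(NUM_PARTITIONS):
    counts[node_for_partition(pid, nodes)] += 1

print(counts)  # each node masters roughly 4096 / 4 = 1024 partitions
```

Because the assignment is a pure function of partition and node IDs, adding or removing a node only moves the partitions that must move, which is the property automatic rebalancing relies on.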

A highly random hash function ensures that the partitions are evenly distributed; field testing across many deployments shows a data-distribution deviation of only 1-2%.


Furthermore, no manual sharding is required. Partitions are divided evenly among the cluster nodes, and the client discovers cluster changes and sends each request to the correct node. When nodes are added or removed, the cluster rebalances automatically. All nodes in the cluster are equal, so there is no single master whose failure can bring the entire database down.

When the database creates a record, the hash of the record's key assigns it to a partition. The hashing algorithm is deterministic: it always maps the record to the same partition, where the record resides for its entire life cycle. Partitions may move from one node to another, but partitions never split and never redistribute their records to other partitions.

Each node in the cluster has a configuration file, and the namespace configuration parameters must be consistent on every node.

How data is replicated/synchronized locally

An Aerospike cluster without replication

Consider the case of a 4-node cluster. To run the Aerospike database without replicated data, set the replication factor to 1 (replication factor = 1), meaning the database keeps only one copy of each record.

Because all 4,096 partitions are spread across the 4-node cluster, each node holds 1/4 of the data: 1,024 randomly assigned partitions. The cluster looks like this, with each node managing its own collection of partitions (for simplicity, only two nodes' partitions are shown):


Each node is the master for 1/4 of the data partitions; a node is the master for a partition when it is the primary source for reads and writes of that partition's data.

The client is location-aware: it knows where each partition lives, so a record can be fetched from its node in a single hop. Every read and write request is sent to the master node for the record's partition; when the smart client reads a record, it sends the request to that record's master node.
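A minimal sketch of such a location-aware client is shown below. The class and method names are hypothetical, not the real Aerospike client API, and the round-robin partition map is a toy stand-in for the map a real client learns from the cluster:

```python
import hashlib

NUM_PARTITIONS = 4096

def partition_of(key: bytes) -> int:
    # First 12 bits of a 20-byte digest (SHA-1 as a stand-in for RIPEMD-160).
    return int.from_bytes(hashlib.sha1(key).digest()[:2], "big") >> 4

class SmartClient:
    """Hypothetical location-aware client (illustrative, not Aerospike's API).

    The client caches a partition -> master-node map learned from the
    cluster, so every read or write needs only a single network hop.
    """
    def __init__(self, partition_map: dict):
        self.partition_map = partition_map

    def node_for(self, key: bytes) -> str:
        return self.partition_map[partition_of(key)]

# Toy 4-node map: partitions dealt round-robin to nodes A-D.
pmap = {pid: "ABCD"[pid % 4] for pid in range(NUM_PARTITIONS)}
client = SmartClient(pmap)
print(client.node_for(b"user:alice"))  # single hop straight to the master
```

The key point is that routing is a local table lookup: no coordinator node sits between the client and the data.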

An Aerospike cluster with replication

Now consider the case with data replication. In most deployments, two copies of the data are maintained: the master copy and a replica. In the Aerospike database, you specify this with a replication factor of 2 (replication factor = 2).

In this example, each node holds 1/4 of the master data (1,024 partitions) and replicas of another 1/4 of the data (1,024 partitions). It looks like this (for simplicity, details of only two nodes are shown):


Note that each node's master data is distributed to all the other nodes as replicas. For example, the replicas of one node's master partitions are spread across the other nodes, so when that node becomes unavailable, its data remains reachable through the replicas held by the remaining nodes.

As in the no-replication example above, the client sends each request to the node holding the master copy of the data.

As with no replication, the smart client sends each read and write request to the correct node. When a master node receives a write request, it saves the data and forwards the write to the replica node. Only once the replica node confirms a successful write and the master node completes its own write is the acknowledgment sent to the client and the write considered successful.
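The synchronous write path just described can be sketched as follows. The in-memory `Node` class and function names are hypothetical simplifications, not Aerospike internals: the master commits locally, forwards to every replica, and acknowledges the client only after all replicas confirm:

```python
class Node:
    """Toy in-memory storage node (illustrative only)."""
    def __init__(self, name: str):
        self.name = name
        self.store = {}

    def apply(self, key, value) -> bool:
        self.store[key] = value
        return True  # confirm the write succeeded

def replicated_write(master: Node, replicas: list, key, value) -> bool:
    master.apply(key, value)                        # 1. master commits locally
    acks = [r.apply(key, value) for r in replicas]  # 2. forward to all replicas
    return all(acks)                                # 3. ack client only if all confirm

master, replica = Node("A"), Node("B")
ok = replicated_write(master, [replica], "user:alice", {"age": 30})
print(ok, replica.store["user:alice"])  # True {'age': 30}
```

Waiting for every replica before acknowledging is what gives the immediate consistency described earlier, at the cost of one extra network round trip per replica.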

The replication factor cannot exceed the number of nodes in the cluster. More replicas mean higher reliability, but also higher write cost, since each write request must traverse all replicas. In practice, most deployments use a replication factor of 2.

Automatic data rebalancing

No manual sharding is required.

The Aerospike data-rebalancing algorithm ensures that request volume remains evenly distributed across all nodes, and it stays robust even if a node fails during the rebalance. The system is designed for continuous availability, so data rebalancing never interrupts cluster behavior. The transaction algorithm is integrated with the data distribution system: a single consensus vote coordinates each cluster change, and while clients are discovering the new cluster configuration, an internal redirection mechanism routes requests correctly during the brief transition. This keeps transactions simple in a scalable shared-nothing architecture while preserving ACID properties.

Aerospike offers configuration options that specify how much of the available operational capacity is used for administrative tasks, such as inter-node rebalancing, versus running client transactions. When transaction throughput can be sacrificed, the cluster heals faster; when transaction volume and speed must be maintained, rebalancing proceeds more slowly.

In cases where cluster capacity becomes insufficient, the cluster can be configured either to reduce the replication factor in order to retain all data, or to evict the data marked as expendable. If the cluster can no longer accept new data, it operates in read-only mode until new capacity becomes available, at which point the nodes automatically resume accepting application writes.

No operator intervention is required; the cluster heals itself within the required time. In one customer deployment, a node in an 8-node cluster failed and interrupted its entire network loop, yet no human intervention was needed. Even with a data center down at peak time, transactions remained accurate and ACID-compliant, and when the fault was fixed a few hours later, the operator needed no special steps to restore the cluster.

Capacity planning and system monitoring give you the ability to handle unforeseen failures with no loss of service. You can provision hardware capacity and configure replication/synchronization policies so that database recovery has no impact on users.

Handling traffic saturation

The details of how network hardware handles peak traffic loads are beyond the scope of this document. The Aerospike database provides monitoring tools to assess bottlenecks; if the network is the bottleneck, the database will not run at full capacity and requests will slow down.

Handling capacity overflows

We have many recommendations for capacity planning, managing storage, and monitoring the cluster to ensure that storage does not overflow. Should storage overflow anyway, Aerospike triggers its stop-writes limit, after which no new records are accepted, while modifications to existing data and reads continue to be handled normally.
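The stop-writes behavior described above can be sketched as follows. This is an illustrative model of the behavior, not Aerospike's internal implementation, and the class, method names, and 90% threshold are assumptions for the example:

```python
class Namespace:
    """Toy namespace with a stop-writes threshold (illustrative only)."""
    def __init__(self, capacity: int, stop_writes_pct: int = 90):
        self.capacity = capacity
        self.stop_writes_pct = stop_writes_pct
        self.records = {}

    def _stopped(self) -> bool:
        return len(self.records) * 100 >= self.capacity * self.stop_writes_pct

    def put(self, key, value) -> bool:
        if key not in self.records and self._stopped():
            return False           # refuse brand-new records past the limit
        self.records[key] = value  # updates to existing records still succeed
        return True

    def get(self, key):
        return self.records.get(key)  # reads always succeed

ns = Namespace(capacity=10)
for i in range(9):
    ns.put(f"k{i}", i)                 # fills to the 90% stop-writes limit
print(ns.put("k-new", 1))              # False: new record refused
print(ns.put("k0", 99), ns.get("k0"))  # True 99: update and read still work
```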

In other words, even when capacity is exceeded, the database does not stop processing query operations; it continues to serve as much of the user request volume as it can.


original link: >
Translator: Beijing It man son
 

