Data Distribution in the Aerospike Architecture Series
Data Distribution
The Aerospike database is built on a shared-nothing architecture: every node in an Aerospike cluster is identical. All nodes are peers, and there is no single point of failure.
Aerospike's smart partitioning algorithm distributes data across the nodes in the cluster. We have tested this method extensively in the field: the hash function keeps the partition-distribution error within 1-2%.
To determine where a record lives, the RIPEMD-160 algorithm hashes the record's key (of any length) into a fixed-length, 20-byte (160-bit) digest. Twelve bits of this digest form the partition ID, which identifies the partition that holds the record. The partitions, in turn, are distributed across the cluster's nodes, so in a cluster of N nodes each node stores roughly 1/N of the data.
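As a rough sketch of that mapping, the Python below hashes a key with RIPEMD-160 and takes 12 bits of the 20-byte digest to form a partition ID. Which bits Aerospike actually uses is an assumption here, and hashlib support for "ripemd160" depends on the local OpenSSL build:

```python
import hashlib

N_PARTITIONS = 4096  # 2^12 partitions per namespace

def partition_id(key: bytes) -> int:
    # RIPEMD-160 produces a fixed-length 20-byte (160-bit) digest.
    # Note: hashlib's "ripemd160" relies on OpenSSL; some builds ship it
    # only in the legacy provider.
    digest = hashlib.new("ripemd160", key).digest()
    # Take 12 bits of the digest to select one of the 4096 partitions.
    # (First two bytes, little-endian, masked to 12 bits: an illustrative
    # choice, not necessarily Aerospike's exact bit layout.)
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

print(partition_id(b"user:1001"))  # always the same value for the same key
```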
Because data is distributed evenly and randomly across nodes, hot spots and bottlenecks do not arise: no single node ends up handling more requests than the others.
For example, many American surnames begin with "R". If data were stored in alphabetical order, servers holding names starting with "R" would see far more traffic than servers holding names starting with "X", "Y", or "Z". Random data allocation keeps the server load balanced.
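To see the even spread in action, this small simulation hashes 100,000 synthetic keys and reports each node's deviation from a perfectly even share. The round-robin partition-to-node assignment is a stand-in for the real partition map:

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096
NODES = ["node-1", "node-2", "node-3", "node-4"]

def partition_id(key: bytes) -> int:
    digest = hashlib.new("ripemd160", key).digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

# Count how many of 100,000 synthetic keys land on each node.
keys_per_node = Counter(
    NODES[partition_id(f"user-{i}".encode()) % len(NODES)]
    for i in range(100_000)
)
expected = 100_000 / len(NODES)
for node in NODES:
    count = keys_per_node[node]
    print(f"{node}: {count} keys ({(count - expected) / expected:+.2%} from even)")
```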
For reliability, Aerospike replicates each partition on one or more nodes. One node acts as the data master for reads and writes to a partition, and other nodes hold its replicas.
For example, in a four-node Aerospike cluster, each node is the data master for roughly 1/4 of the data and also holds replicas of another 1/4. Each node's master partitions have their replicas spread across all the other nodes, so if node #1 becomes inaccessible, its workload spreads out over the remaining three nodes.
The replication factor is a configuration parameter and cannot exceed the number of nodes in the cluster. More replicas mean higher reliability, but every write must then propagate to all copies of the data, making writes more expensive. In practice, most deployments use a replication factor of 2 (one master copy and one replica).
Synchronous replication provides real-time consistency with no data loss: a write transaction is propagated to all replicas before the write is committed and the result is returned to the client. If the Aerospike smart client sends a request to a transiently out-of-date node during cluster reconfiguration, the cluster transparently redirects the request to the correct node. Finally, when the cluster recovers from a partition, it resolves any conflicts between the copies. Resolution can be configured to be automatic, in which case the copy with the newest timestamp is taken as authoritative; alternatively, all copies of the data can be returned to the application so it can resolve the conflict at a higher level.
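As a minimal sketch of the two resolution modes described above (all names here are illustrative, not an Aerospike API):

```python
from dataclasses import dataclass

@dataclass
class Copy:
    node: str
    value: bytes
    timestamp: int  # last-update time, e.g. milliseconds since the epoch

def resolve_automatically(copies: list[Copy]) -> Copy:
    # Automatic mode: the copy with the newest timestamp is authoritative.
    return max(copies, key=lambda c: c.timestamp)

def resolve_in_application(copies: list[Copy]) -> list[Copy]:
    # Alternative mode: hand every copy back to the application to resolve.
    return copies

conflicting = [
    Copy("node-1", b"v1", timestamp=1_700_000_000_000),
    Copy("node-2", b"v2", timestamp=1_700_000_000_500),
]
print(resolve_automatically(conflicting).node)  # node-2: newest timestamp wins
```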
How Aerospike Creates Partitions
A namespace is a collection of data that is stored the same way in the Aerospike database. Each namespace is divided into 4096 partitions that are distributed evenly across the nodes in the cluster, which means that in a cluster of n nodes, each node stores 1/n of the data.
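A toy partition map makes the 1/n property concrete. The round-robin assignment below is only a stand-in for Aerospike's real map construction, but it preserves that property:

```python
from collections import Counter

N_PARTITIONS = 4096

def build_partition_map(nodes: list[str]) -> dict[int, str]:
    # Assign each of the 4096 partitions to a node (toy round-robin rule).
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}

pmap = build_partition_map(["A", "B", "C", "D"])
print(Counter(pmap.values()))  # each node owns 1024 partitions: 1/4 of 4096
```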
A random hash function ensures that partitions are distributed evenly. We have tested this method extensively in the field, and the data-distribution error stays within 1-2%.
As a result, manual sharding is unnecessary: partitions are divided evenly among the cluster's nodes, the client detects cluster changes and sends each request to the correct node, and the cluster rebalances automatically whenever a node is added or removed. All nodes in the cluster are peers; there is no single master whose failure would take the entire database down.
When the database creates a record, it hashes the record's key to assign the record to a partition. The hash is deterministic: it always maps the same record to the same partition, and the record stays in that partition for its entire lifetime. A partition may move from one node to another, but partitions are never split, and records are never redistributed to other partitions.
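The sketch below, reusing the illustrative hashing from earlier plus a toy ownership rule, shows the distinction: the record's partition ID never changes, while the node that owns the partition can change as cluster membership changes:

```python
import hashlib

N_PARTITIONS = 4096

def partition_id(key: bytes) -> int:
    digest = hashlib.new("ripemd160", key).digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

def owner(pid: int, nodes: list[str]) -> str:
    return nodes[pid % len(nodes)]  # toy ownership rule, as before

pid = partition_id(b"order:42")
assert pid == partition_id(b"order:42")  # deterministic: same key, same partition
print(owner(pid, ["A", "B", "C"]))       # owner before a node joins...
print(owner(pid, ["A", "B", "C", "D"]))  # ...may differ afterwards, but the
                                         # record's partition ID stays the same
```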
Each node in the cluster has a configuration file, and the namespace configuration parameters must be identical on every node.
How Data is Replicated/Synchronized Locally
An Aerospike Cluster with No Replication
Consider a four-node cluster. In the Aerospike database, to store data without replication you set the replication factor to 1 (replication factor = 1), meaning the database keeps only one copy of each record.
With all 4096 partitions spread over a four-node cluster, each node holds 1/4 of the data: 1024 randomly assigned partitions. The cluster looks like this, with each node managing its own set of partitions (for simplicity, only two nodes' partitions are shown):
Each node is the data master for its 1/4 of the partitions: a node is the data master for a piece of data if it is the primary source for reads and writes of that data.
The client is aware of where data lives: it knows the location of every partition, so any record can be fetched from its node in a single hop. Every read and write request is sent to the data master for processing; when a smart client reads a record, it sends the request directly to the record's data master.
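Conceptually, the smart client holds a partition map and performs a local lookup before sending the request, so no extra network hop is needed. The model below is illustrative, not the real client API:

```python
import hashlib

N_PARTITIONS = 4096

def partition_id(key: bytes) -> int:
    digest = hashlib.new("ripemd160", key).digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

class SmartClient:
    """Toy partition-aware client; not the real Aerospike client API."""

    def __init__(self, partition_map: dict[int, str]):
        self.partition_map = partition_map  # partition ID -> data master address

    def route(self, key: bytes) -> str:
        pid = partition_id(key)
        return self.partition_map[pid]  # a local lookup: no extra network hop

client = SmartClient({pid: f"node-{pid % 4 + 1}" for pid in range(N_PARTITIONS)})
print(client.route(b"user:1001"))  # the node to contact directly for this record
```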
An Aerospike Cluster with Replication
Now consider the same cluster with data replication. The most common case is to keep two copies of the data: the master copy and one replica. In the Aerospike database, this means specifying a replication factor of 2 (replication factor = 2).
In this example, each node holds 1/4 of the master data (1024 partitions) and 1/4 of the replica data (another 1024 partitions). It looks like this (for simplicity, only two nodes' details are shown):
Note that each node's master data is distributed across all the other nodes as replicas. For example, the replicas of node #1's master partitions are spread over the other nodes, so when node #1 becomes unavailable, its workload spreads out across those nodes.
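The placement can be sketched as follows: for each partition, pick a master plus a distinct replica node. With four nodes and a replication factor of 2, every node ends up master for 1024 partitions and replica for another 1024. The rotation rule below is a toy stand-in; Aerospike's actual algorithm spreads each node's replicas across all the other nodes rather than just the next one:

```python
from collections import Counter

N_PARTITIONS = 4096
NODES = ["node-1", "node-2", "node-3", "node-4"]

def build_replica_map(nodes: list[str], rf: int = 2) -> dict[int, list[str]]:
    # For each partition, list rf distinct nodes: the first is the master,
    # the rest are replicas. The replica always lives on a different node
    # from the master.
    return {
        pid: [nodes[(pid + i) % len(nodes)] for i in range(rf)]
        for pid in range(N_PARTITIONS)
    }

rmap = build_replica_map(NODES)
print(Counter(owners[0] for owners in rmap.values()))  # 1024 master partitions per node
print(Counter(owners[1] for owners in rmap.values()))  # 1024 replica partitions per node
```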
As in the non-replicated example, the client sends each request to the node holding the master data.
As in the non-replicated case, the smart client routes both read and write requests to the correct node. When a node receives a write request, it commits the data locally and forwards the write to the replica node. Once the replica node confirms a successful write and the master node has completed its own write, the master confirms to the client that the write succeeded.
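A minimal sketch of this synchronous write path, with hypothetical Node objects standing in for cluster members:

```python
class Node:
    """A hypothetical stand-in for a cluster node."""

    def __init__(self, name: str):
        self.name = name
        self.store: dict[bytes, bytes] = {}

    def apply(self, key: bytes, value: bytes) -> bool:
        self.store[key] = value
        return True  # acknowledge the write

def replicated_write(key: bytes, value: bytes,
                     master: Node, replicas: list[Node]) -> bool:
    # The master commits locally and forwards the write to each replica;
    # success is reported to the client only after every copy acknowledges.
    ok = master.apply(key, value)
    return ok and all(replica.apply(key, value) for replica in replicas)

master, replica = Node("node-1"), Node("node-3")
assert replicated_write(b"user:7", b"payload", master, [replica])
```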
The replication factor cannot exceed the number of nodes in the cluster. More replicas mean higher reliability, but they also require every write to traverse all copies, raising the cost of writes. In practice, most deployments use a replication factor of 2.
Automatic Data Rebalancing
Rebalancing is fully unattended.
The Aerospike data-rebalancing algorithm ensures that query volume stays evenly distributed across all nodes, and it remains robust even when nodes fail in the middle of a rebalance. The system is designed for continuous availability, so rebalancing data does not disrupt cluster behavior. Because the transaction algorithms are integrated with the data-distribution system, a single consensus vote is enough to coordinate a cluster change, and there is only a brief window during which the in-cluster redirection algorithm is used while clients discover the new configuration. In this way, the shared-nothing design scales while keeping the transactional environment simple and preserving ACID properties.
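The sketch below illustrates one property a good rebalancing scheme provides: when a node joins, only a bounded fraction of partitions needs to migrate, so reads and writes can keep flowing to the current owners in the meantime. It uses rendezvous hashing as a stand-in placement rule; Aerospike's actual algorithm differs:

```python
import hashlib

N_PARTITIONS = 4096

def owner(pid: int, nodes: list[str]) -> str:
    # Rendezvous (highest-random-weight) hashing: a membership change under
    # this rule relocates only about 1/N of the partitions.
    def score(node: str) -> int:
        h = hashlib.sha256(f"{pid}:{node}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return max(nodes, key=score)

before = [owner(pid, ["A", "B", "C", "D"]) for pid in range(N_PARTITIONS)]
after = [owner(pid, ["A", "B", "C", "D", "E"]) for pid in range(N_PARTITIONS)]
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved} of {N_PARTITIONS} partitions migrate when node E joins")  # ~820
```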
Aerospike exposes configuration options that control how much processing capacity goes to management tasks, for example how much is spent rebalancing between nodes versus running client transactions. If faster recovery is preferred, transactions slow down; if transaction volume and speed must be maintained, the cluster rebalances more slowly.
When the cluster cannot satisfy the configured replication factor, it can be configured either to lower the effective replication factor so that all data is kept, or to evict data that has been marked as expendable. If the cluster cannot accept more data, it operates in read-only mode until new capacity becomes available, at which point the nodes automatically resume accepting application writes.
The cluster heals itself without operator intervention, even under demanding conditions. In one customer deployment, a node was pulled out of an eight-node cluster, and the cluster recovered with no manual intervention at all. Even when a data center goes down at peak hours, transactions remain ACID-correct, and once the fault is fixed hours later, the operator needs no special steps to restore the cluster.
Capacity planning and system monitoring give you the means to handle unforeseen failures without loss of service. You can size hardware capacity and set replication/synchronization policies so that database recovery does not affect users.
Handling Traffic Saturation
A discussion of how network hardware handles peak traffic loads is beyond the scope of this document. The Aerospike database provides monitoring tools for evaluating bottlenecks; if the network is the bottleneck, the database will not run at full capacity and requests will slow down.
Handling Capacity Overflows
We offer extensive guidance on capacity planning, and storage should be managed and monitored so that it never overflows. If storage does overflow, however, Aerospike enforces a stop-writes limit: no new records are accepted, but updates to existing records and reads continue to be processed normally.
In other words, even when capacity is exceeded, the database does not stop serving queries; it continues to process as many user requests as it can.
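A toy model of this stop-writes behavior (the names and threshold value are illustrative, not Aerospike's configuration): once usage crosses the limit, new records are rejected while updates and reads continue to be served:

```python
class Namespace:
    """Toy model of the stop-writes limit described above."""

    def __init__(self, capacity_bytes: int, stop_writes_pct: float = 0.90):
        self.capacity = capacity_bytes
        self.stop_writes_pct = stop_writes_pct  # illustrative threshold
        self.used = 0
        self.records: dict[str, bytes] = {}

    def _stopped(self) -> bool:
        return self.used >= self.capacity * self.stop_writes_pct

    def put(self, key: str, value: bytes) -> bool:
        if key not in self.records and self._stopped():
            return False  # past the stop-writes limit: new records are rejected
        self.used += len(value) - len(self.records.get(key, b""))
        self.records[key] = value  # updates to existing records still succeed
        return True

    def get(self, key: str) -> bytes | None:
        return self.records.get(key)  # reads are always served
```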