Aerospike Architecture Series: Hybrid Storage

Source: Internet
Author: User
Tags aerospike

Hybrid Storage (Hybrid Storage)

The hybrid memory system holds the indexes and data on each node and handles interaction with physical storage. It also includes modules for automatic data removal and defragmentation.

Aerospike can store data in DRAM, on traditional disks, and on SSDs, and each namespace can be configured separately. This flexibility lets application developers keep a small, frequently accessed namespace in memory and put a large namespace on relatively inexpensive SSDs.

Data storage on SSDs has been optimized, including bypassing the file system to take advantage of low-level SSD read/write patterns.

Philosophy (concept)

Unlike Large Data Types, all data of a record is stored together. The storage limit for each row is 1 MB by default.

Storage is copy-on-write, and free space is reclaimed by the defragmentation process.

A fixed amount of storage is configured for each namespace. Every node must have the same namespaces, and each namespace must be the same size on every node.

Storage can be configured as pure memory without persistence, memory with persistence, or flash memory (SSD).

Persistent storage (disk) must be a flash device, a high-performance block storage device (cloud), or a file on any storage device.

Data in DRAM (Data in memory)

Data in memory without persistence has the advantage of high throughput. Even high-performance modern flash storage is still slower than memory, and memory prices are falling rapidly.

Data is allocated with the JEMalloc allocator. JEMalloc can allocate from different pools, so long-term allocations, such as those for the storage layer, can be allocated separately. We found that JEMalloc performs exceptionally well, with low fragmentation.

High reliability can be achieved by keeping multiple copies in DRAM. Because Aerospike automatically redistributes and replicates data when a cluster node fails or a node is added, a high level of "k-safety" is obtained. A node's data is automatically restored from a replica.

Because Aerospike distributes data randomly, the amount of data that becomes unavailable when several nodes fail is quite small. For example, if two nodes fail in a 10-node cluster with two copies of the data, about 2% (1/50) of the data is unavailable.
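The 2% figure can be checked with a short calculation. The sketch below assumes that the copies of each piece of data sit on distinct nodes chosen uniformly at random, which is a simplification of how Aerospike actually places partitions; the function name is illustrative.

```python
from math import comb

def unavailable_fraction(nodes: int, copies: int, failed: int) -> float:
    """Fraction of data whose copies all sit on failed nodes.

    Assumes each item's copies are placed on `copies` distinct nodes
    chosen uniformly at random (a simplifying assumption).
    """
    if failed < copies:
        return 0.0  # at least one copy always survives
    # Placements entirely on failed nodes / all possible placements.
    return comb(failed, copies) / comb(nodes, copies)

fraction = unavailable_fraction(nodes=10, copies=2, failed=2)
print(f"{fraction:.3f}")  # 0.022
```

Under this assumption the result is 1/45, which is close to the 1/50 quoted in the text.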

When a persistence layer is configured, reads are served from the copy in memory. Writes go through the data path described in the next section.


Data on SSD/Flash (Data on SSD)

When a write arrives, a latch is taken on the row to avoid conflicting writes to the same record. In some cluster states, data must be read from other nodes and conflicts resolved.

After the write is validated, the in-memory representation of the record is updated on the master node, and the data to be written is added to the write buffer. When the write buffer is full, it is queued to disk. Depending on the write buffer size (which is the same as the maximum row size) and the write throughput, there is some risk of uncommitted data.

If there are replicas, the replicas and their indexes are updated as well. Once all in-memory copies have been updated, the result is returned to the client.

The system can also be configured to return results before all writes are completed, which lowers latency at the cost of relaxed consistency.
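The write path described in this section can be sketched as a toy model. This is not Aerospike's implementation: the class, its attributes, and the byte-based buffer size are all hypothetical, and a Python lock stands in for the row latch.

```python
import threading

class ToyNode:
    """Toy model of the write path; not Aerospike's implementation."""

    def __init__(self, buffer_size=1024, replicas=()):
        self.buffer_size = buffer_size   # assumed write-buffer size in bytes
        self.replicas = list(replicas)   # other ToyNode objects holding copies
        self.memory = {}                 # in-memory view of each record
        self.latches = {}                # one lock per record key
        self.write_buffer = []           # records waiting to go to disk
        self.buffered_bytes = 0
        self.disk = []                   # flushed buffers, in write order

    def _apply(self, key, value):
        self.memory[key] = value
        self.write_buffer.append((key, value))
        self.buffered_bytes += len(value)
        if self.buffered_bytes >= self.buffer_size:
            # Buffer is full: queue it for the device and start a new one.
            self.disk.append(self.write_buffer)
            self.write_buffer, self.buffered_bytes = [], 0

    def write(self, key, value):
        latch = self.latches.setdefault(key, threading.Lock())
        with latch:                        # block conflicting writes to this record
            self._apply(key, value)        # update the master copy
            for replica in self.replicas:  # then every replica
                replica._apply(key, value)
        return "ok"                        # reply once in-memory copies are updated

replica = ToyNode(buffer_size=8)
master = ToyNode(buffer_size=8, replicas=[replica])
master.write("k1", b"abcd")   # 4 bytes buffered, nothing on disk yet
master.write("k2", b"efgh")   # 8 bytes buffered, so the buffer is flushed
```

Between the two writes, `k1` exists only in memory and in the write buffer, which is the window in which uncommitted data could be lost.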

Storing data

Aerospike data types include integers, strings, binary objects (blobs), language-native serialized types, lists, maps, and LDTs.

Except in the more efficient "single bin mode", each bin (Aerospike's term for a column) has a bin name, which is stored in a string table. The name of each column is stored in, and removed from, this table. A namespace can store 32 K unique bin names.
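A string table of this kind can be pictured as a mapping from names to small integer ids. The following is a minimal sketch, not Aerospike's data structure: the class and method names are hypothetical, and "32 K" is read as 32,768 entries, which is an assumption.

```python
class BinNameTable:
    """Toy string table mapping bin names to small integer ids."""

    def __init__(self, limit=32 * 1024):  # "32 K" read as 32,768 (assumption)
        self.limit = limit
        self.ids = {}     # name -> id
        self.names = []   # id -> name

    def intern(self, name):
        if name in self.ids:
            return self.ids[name]  # known name: reuse its id
        if len(self.names) >= self.limit:
            raise OverflowError("bin name limit reached for this namespace")
        self.ids[name] = len(self.names)
        self.names.append(name)
        return self.ids[name]

table = BinNameTable(limit=2)    # tiny limit to make the cap visible
first = table.intern("age")
again = table.intern("age")      # same name, same id, no new entry
second = table.intern("city")
```

Each record then only needs to carry the id rather than the full name, and a third distinct name in this example would hit the limit.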

If you need more bin names, you can use a map. With a map, you can store any number of key-value pairs and access the data efficiently through UDFs.

If you store a complex language type, such as a Java class, the Aerospike client uses the language's native serialization system, and the data is stored as a language-specific blob type. This lets clients in the same language read the data with clear code, but the default serialization in most languages is fairly basic.

Integers are stored in 8 bytes, which limits the range of integer values in the current version. The Aerospike network protocol allows variable-length integers.
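The effect of a fixed 8-byte integer can be illustrated with Python's `struct` module. The sketch assumes a signed, big-endian layout; neither detail is stated in the text, and the helper names are illustrative.

```python
import struct

def encode_int(value):
    """Pack an integer into 8 bytes (signed, big-endian: both are assumptions)."""
    return struct.pack(">q", value)

def decode_int(data):
    return struct.unpack(">q", data)[0]

largest = 2**63 - 1
encoded = encode_int(largest)
print(len(encoded))                     # 8
print(decode_int(encoded) == largest)   # True

try:
    encode_int(2**63)   # one past the signed 64-bit range
    fits = True
except (struct.error, OverflowError):
    fits = False
print(fits)                             # False
```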

Strings are stored as UTF-8, which is more compact than other Unicode encodings for most strings. To allow cross-language compatibility, the client libraries convert native Unicode characters to UTF-8.
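A quick size comparison shows why UTF-8 is the more compact choice for typical strings. The sketch assumes that "unicode" here refers to a 16-bit encoding such as UTF-16, as used for native strings in some languages.

```python
text = "aerospike"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no byte-order mark

print(len(utf8), len(utf16))       # 9 18
roundtrip = utf8.decode("utf-8")   # what a client in another language would read back
```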

The most efficient way to store data is as a binary object (blob). Its size is limited only by the record size limit. Many deployments use their own serialization and may compress objects before storing them directly. This means that the data cannot be easily accessed through UDFs.
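As an illustration of application-side serialization with compression, the sketch below uses JSON and zlib. Both are arbitrary choices for the example rather than anything Aerospike prescribes, and the helper names are hypothetical.

```python
import json
import zlib

MAX_RECORD_BYTES = 1024 * 1024  # the 1 MB default row limit mentioned earlier

def to_blob(obj):
    """Serialize to compact JSON, then compress."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)

def from_blob(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))

profile = {"user": "u1", "visits": list(range(1000))}
blob = to_blob(profile)

if len(blob) > MAX_RECORD_BYTES:
    raise ValueError("blob exceeds the record size limit")

restored = from_blob(blob)
```

The server only ever sees the opaque bytes in `blob`, which is why a UDF cannot easily inspect individual fields.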

Complex types are stored natively as msgpack. Complex objects are serialized on the client and sent over the wire protocol. For simple get/put operations, the wire format does not need to be serialized or converted again and is written directly to storage.

Flash optimizations (Flash optimization)


The defragmenter tracks the number of active records in each block on disk and reclaims blocks that fall below a minimum level of use. It continually scans blocks and looks for those with a certain amount of free space.
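The idea of reclaiming sparsely used blocks can be sketched as follows. This is a toy model under stated assumptions: blocks hold a fixed number of record slots, the 50% threshold is arbitrary, and all names are hypothetical.

```python
class Block:
    def __init__(self, capacity):
        self.capacity = capacity   # record slots per block (toy unit)
        self.live = set()          # keys of records still current in this block

def defragment(blocks, min_use):
    """Reclaim blocks whose live fraction is below `min_use`.

    Live records from reclaimed blocks are rewritten into fresh blocks,
    so every reclaimed block becomes entirely free.
    """
    if not blocks:
        return []
    kept, survivors = [], []
    for block in blocks:
        if len(block.live) / block.capacity < min_use:
            survivors.extend(sorted(block.live))   # copy out live records
        else:
            kept.append(block)
    capacity = blocks[0].capacity
    for start in range(0, len(survivors), capacity):
        fresh = Block(capacity)
        fresh.live.update(survivors[start:start + capacity])
        kept.append(fresh)
    return kept

a, b, c = Block(4), Block(4), Block(4)
a.live = {"r1", "r2", "r3", "r4"}   # fully used
b.live = {"r5"}                     # 25% used
c.live = {"r6"}                     # 25% used
after = defragment([a, b, c], min_use=0.5)
print(len(after))  # 2
```

Two sparse blocks collapse into one, freeing a block for new writes.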

Eviction based on storage (storage-based recycling)

The evictor is responsible for removing expired records and reclaiming memory when the system reaches a preset high-water mark. When a namespace is configured, the administrator specifies its maximum memory usage. Normally, the evictor looks for expired data and frees the associated memory and disk space. The evictor also tracks memory usage per namespace. If memory use reaches the preset high-water mark, the evictor removes older records even if they have not yet expired. By letting the evictor remove old data when memory reaches this limit, Aerospike can be used as an efficient LRU (least recently used) cache. Note that a record's age is measured from its last modification time, and the application can change a record's retention period at any time. The application can also specify that a record is never automatically evicted.
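The two behaviours described above, expiring records and evicting the oldest ones above the high-water mark, can be sketched as a single pass. The record layout, the function name, and the abstract memory units are assumptions made for the example.

```python
def run_evictor(records, now, memory_used, high_water_mark):
    """Toy evictor pass.

    `records` maps key -> (last_modified, expires_at, size). An
    expires_at of None means the record never expires and is never evicted.
    Returns the removed keys and the memory in use afterwards.
    """
    removed = []
    # 1. Drop records whose expiry time has passed.
    for key, (_, expires_at, size) in list(records.items()):
        if expires_at is not None and expires_at <= now:
            del records[key]
            memory_used -= size
            removed.append(key)
    # 2. Above the high-water mark: evict the least recently modified first.
    if memory_used > high_water_mark:
        evictable = sorted(
            (k for k, v in records.items() if v[1] is not None),
            key=lambda k: records[k][0],
        )
        for key in evictable:
            if memory_used <= high_water_mark:
                break
            memory_used -= records[key][2]
            del records[key]
            removed.append(key)
    return removed, memory_used

records = {
    "old":     (100, 900, 40),    # modified long ago, not yet expired
    "expired": (200, 450, 30),    # expiry time already passed
    "pinned":  (50, None, 20),    # never expires
    "new":     (400, 900, 40),
}
removed, used = run_evictor(records, now=500, memory_used=130, high_water_mark=70)
print(removed, used)  # ['expired', 'old'] 60
```

The record marked as never expiring survives even though it has the oldest modification time.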

Large records (Sub-Record Storage Mechanism)

To support large objects, Aerospike provides a new underlying storage model called sub-records. Sub-records are similar to regular records; the main difference is that they cannot be accessed directly. A sub-record is linked to a parent record and accessed through it. The sub-record shares the partition ID and internal record lock with the parent record, so it moves with the parent record during migration and is protected by the same isolation mechanism.

Aerospike LDTs are built on this storage mechanism. An LDT bin (in a record with an LDT bin type) is not stored contiguously with its record but is split across multiple sub-records (each between 2 K and 1 M in size). A sub-record belongs to one bin and can hold multiple items (for example, an 8 K sub-record can hold 100 strings of 80 bytes). The sub-records are linked to one another, and the links allow efficient updates and lookups from the parent record.
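The sizing example can be made concrete with a small sketch that packs items into fixed-size sub-records linked from a parent. It ignores per-item and per-sub-record overhead, and the dictionary layout and function name are hypothetical.

```python
def split_into_subrecords(items, subrecord_bytes):
    """Group items into sub-records of at most `subrecord_bytes` each.

    Ignores storage overhead (an assumption). Returns a parent whose
    "children" list links to the sub-records in order.
    """
    children, current, used = [], [], 0
    for item in items:
        if len(item) > subrecord_bytes:
            raise ValueError("item larger than a sub-record")
        if used + len(item) > subrecord_bytes:
            children.append(current)   # current sub-record is full
            current, used = [], 0
        current.append(item)
        used += len(item)
    if current:
        children.append(current)
    return {"children": children}

items = [b"x" * 80 for _ in range(250)]           # 250 strings of 80 bytes
parent = split_into_subrecords(items, 8 * 1024)   # 8 K sub-records
sizes = [len(child) for child in parent["children"]]
print(sizes)  # [102, 102, 46]
```

Without overhead an 8 K sub-record fits 102 such strings, which is consistent with the figure of about 100 given in the text.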

As a result, LDT objects rely on Aerospike's robust replication, rebalancing, and migration algorithms to ensure immediate consistency and high availability. LDT objects are processed on the database server and accessed through client APIs.

The sub-record mechanism has the following benefits:

  • Random reads on SSD incur no extra overhead, so there is no need to worry about how data is arranged on storage, as there is in traditional database implementations.
  • When you perform a specific LDT operation (for example, inserting 100 bytes), the cost is only that of updating the LDT entry, with no additional interaction cost between the client and the server.

Large records are stored differently, which allows their data to exceed the limit of a single record. See Large Data Type Architecture.


<Http://www.aerospike.com/docs/architecture/storage.html>
Translated by: Beijing IT masters
