Apache Accumulo User Manual

1. Introduction

Apache Accumulo is a highly scalable structured store based on Google's BigTable. Accumulo is written in Java and operates over the Hadoop Distributed File System (HDFS), which is part of the popular Apache Hadoop project. Accumulo supports efficient storage and retrieval of structured data, including queries for ranges, and provides support for using Accumulo tables as input and output for MapReduce jobs.

Accumulo provides automatic load balancing and partitioning, data compression, and fine-grained security labels.

2. Accumulo Design

2.1. Data Model

Accumulo provides a richer data model than a simple key-value store, but is not a fully relational database. Data is represented as key-value pairs, where the key and value are composed of the following elements:

Key
    Row ID
    Column
        Family
        Qualifier
        Visibility
    Timestamp
Value

All elements of the key and value are represented as byte arrays except for the timestamp, which is a long. Accumulo sorts keys by element and lexicographically in ascending order. Timestamps are sorted in descending order, so that later versions of the same key appear first in a sequential scan. A table consists of a set of sorted key-value pairs.
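
As an illustration of how these elements map onto the Java client API, the following sketch builds a single key-value pair with every key element filled in (a minimal sketch, assuming the Accumulo 1.x client library; the row, column names, visibility label, and value are placeholders):

    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.ColumnVisibility;

    public class DataModelExample {
        public static void main(String[] args) {
            // A Mutation collects changes to a single row; each put() supplies
            // the remaining key elements along with the value.
            Mutation m = new Mutation("row-0001");                 // Row ID
            m.put("person",                                        // Column Family
                  "name",                                          // Column Qualifier
                  new ColumnVisibility("public"),                  // Visibility
                  System.currentTimeMillis(),                      // Timestamp
                  new Value("Alice".getBytes()));                  // Value
        }
    }

Keys returned from a scan expose the same elements through accessors such as getRow(), getColumnFamily(), getColumnQualifier(), getColumnVisibility(), and getTimestamp().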

2.2. Architecture

Accumulo is a distributed data storage and retrieval system, and as such consists of several architectural components, some of which run on many individual servers. Much of the work Accumulo does involves maintaining certain properties of the data, such as organization, availability, and integrity, across many commodity-class machines.

2.3. Components

An instance of Accumulo includes many TabletServers, one garbage collector process, one master server, and many clients.

2.3.1. Tablet Server

The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a write-ahead log, sorting new key-value pairs in memory, periodically flushing sorted key-value pairs to new files in HDFS, and responding to reads from clients, forming a merge-sorted view of all keys and values from all the files it has created and the sorted in-memory store.

TabletServers also perform recovery of tablets that were previously on a failed server, reapplying any writes found in the write-ahead log to the tablet.

2.3.2. Garbage Collector

Accumulo processes share files stored in HDFS. Periodically, the garbage collector identifies files that are no longer needed by any process and deletes them.

2.3.3. Master

The Accumulo master is responsible for detecting and responding to TabletServer failure. It tries to balance the load across TabletServers by assigning tablets carefully and instructing TabletServers to unload tablets when necessary. The master ensures all tablets are assigned to exactly one TabletServer, and handles table creation, alteration, and deletion requests from clients. The master also coordinates startup, graceful shutdown, and the recovery of write-ahead logs when a TabletServer fails.
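
Table creation, alteration, and deletion requests from clients are issued through the client API and handled by the master. A minimal sketch (assuming the Accumulo 1.x TableOperations interface; the Connector, table name, and property value are placeholders):

    import org.apache.accumulo.core.client.Connector;

    public class TableAdminExample {
        // Assumes 'conn' is an existing Connector (see the client sketch in section 2.3.4).
        static void manageTable(Connector conn) throws Exception {
            conn.tableOperations().create("example_table");              // table creation
            conn.tableOperations().setProperty("example_table",
                "table.split.threshold", "256M");                        // table alteration
            conn.tableOperations().delete("example_table");              // table deletion
        }
    }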

Multiple masters may be run. They elect one as the active master, and the others become backups in case the active master fails.

2.3.4. Client

Accumulo includes a client library that is linked to every application. The client library contains logic for locating the servers managing a particular tablet, and for communicating with TabletServers to write and retrieve key-value pairs.
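
A minimal sketch of using the client library, assuming the Accumulo 1.x Java API; the instance name, ZooKeeper hosts, credentials, table name, and rows below are placeholders:

    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class ClientExample {
        public static void main(String[] args) throws Exception {
            // Locate the instance through ZooKeeper and authenticate.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost1:2181")
                .getConnector("user", new PasswordToken("password"));

            // Write: the client library locates the TabletServer hosting the row.
            BatchWriter writer = conn.createBatchWriter("example_table", new BatchWriterConfig());
            Mutation m = new Mutation("row-0001");
            m.put("person", "name", new Value("Alice".getBytes()));
            writer.addMutation(m);
            writer.close();

            // Read: scan a range of rows with the user's authorizations.
            Scanner scanner = conn.createScanner("example_table", Authorizations.EMPTY);
            scanner.setRange(new Range("row-0001", "row-9999"));
            for (Entry<Key, Value> entry : scanner) {
                System.out.println(entry.getKey() + " -> " + entry.getValue());
            }
        }
    }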

2.4. Data Management

Accumulo stores data in tables, which are partitioned into tablets. Tablets are partitioned on row boundaries, so that all of the columns and values for a particular row are found together within the same tablet. The master assigns each tablet to one TabletServer at a time. This enables row-level transactions to take place without using distributed locking or some other complicated synchronization mechanism. As clients insert and query data, and as machines are added to and removed from the cluster, the master migrates tablets to ensure they remain available and that the ingest and query load is balanced across the cluster.
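
Because a row lives entirely within one tablet, and a tablet is served by a single TabletServer at a time, all of the column updates in a single Mutation are applied to the row atomically. A minimal sketch (assuming the Accumulo 1.x API; the Connector, table name, and columns are placeholders):

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;

    public class RowUpdateExample {
        // Assumes 'conn' is an existing Connector (see the client sketch in section 2.3.4).
        static void atomicRowUpdate(Connector conn) throws Exception {
            BatchWriter writer = conn.createBatchWriter("accounts", new BatchWriterConfig());
            // Both columns belong to the same row, so they are committed together
            // by the single TabletServer hosting that row's tablet.
            Mutation m = new Mutation("account-42");
            m.put("balance", "current", new Value("100".getBytes()));
            m.put("audit", "lastUpdated", new Value(Long.toString(System.currentTimeMillis()).getBytes()));
            writer.addMutation(m);
            writer.close();
        }
    }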

 

2.5. Tablet Service

When a write arrives at a TabletServer, it is written to a write-ahead log and then inserted into a sorted data structure in memory called a MemTable. When the MemTable reaches a certain size, the TabletServer writes out the sorted key-value pairs to a file in HDFS called an Indexed Sequential Access Method (ISAM) file. This process is called a minor compaction. A new MemTable is then created, and the fact of the compaction is recorded in the write-ahead log.
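
A flush of in-memory entries to files in HDFS can also be requested explicitly through the client API. A minimal sketch (assuming the Accumulo 1.x TableOperations interface; the Connector and table name are placeholders):

    import org.apache.accumulo.core.client.Connector;

    public class FlushExample {
        // Assumes 'conn' is an existing Connector (see the client sketch in section 2.3.4).
        static void flushTable(Connector conn) throws Exception {
            // Flush the MemTable contents of "example_table" over the full row range
            // (null start and end rows) and wait for the flush to complete.
            conn.tableOperations().flush("example_table", null, null, true);
        }
    }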

When a request to read data arrives at a TabletServer, the TabletServer does a binary search across the MemTable as well as the in-memory indexes associated with each ISAM file to find the relevant values. If a client is performing a scan, several key-value pairs are returned to the client in order, by merge-sorting the MemTable and the set of ISAM files as they are read.

2.6. Compaction

In order to manage the number of files per tablet, the TabletServer periodically performs major compactions of files within a tablet, in which some set of ISAM files is combined into one file. The previous files will eventually be removed by the garbage collector. This also provides an opportunity to permanently eliminate deleted key-value pairs: when the new file is created, key-value pairs suppressed by a delete entry are omitted.
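
A major compaction can also be triggered on demand through the client API. A minimal sketch (assuming the Accumulo 1.x TableOperations interface; the Connector and table name are placeholders):

    import org.apache.accumulo.core.client.Connector;

    public class CompactExample {
        // Assumes 'conn' is an existing Connector (see the client sketch in section 2.3.4).
        static void compactTable(Connector conn) throws Exception {
            // Compact every tablet of "example_table" (null start and end rows),
            // flushing in-memory entries first and waiting for completion.
            conn.tableOperations().compact("example_table", null, null, true, true);
        }
    }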

2.7. Split

When a table is created, it has one tablet. As the table grows, its initial tablet eventually splits into two tablets, and it is likely that one of these tablets will migrate to another TabletServer. As the table continues to grow, its tablets will continue to split and be migrated. The decision to automatically split a tablet is based on the size of the tablet's files; the size threshold at which a tablet splits is configurable per table. In addition to automatic splitting, a user can manually add split points to a table to create new tablets. Manually splitting a new table can parallelize reads and writes, giving better initial performance without waiting for automatic splitting.
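
Manually adding split points, as described above, goes through the client API as well. A minimal sketch of pre-splitting a new table (assuming the Accumulo 1.x TableOperations interface; the Connector, table name, and split points are placeholders):

    import java.util.SortedSet;
    import java.util.TreeSet;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class SplitExample {
        // Assumes 'conn' is an existing Connector (see the client sketch in section 2.3.4).
        static void preSplit(Connector conn) throws Exception {
            // Pre-split the table on these row prefixes so that reads and writes
            // are spread across several TabletServers from the start.
            SortedSet<Text> splits = new TreeSet<Text>();
            splits.add(new Text("g"));
            splits.add(new Text("n"));
            splits.add(new Text("t"));
            conn.tableOperations().addSplits("example_table", splits);
        }
    }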

As data is deleted from a table, tablets may shrink. Over time, this can lead to empty or small tablets. To deal with this, merging of tablets was introduced in Accumulo 1.4. This is discussed in more detail later.
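
Merging a range of small tablets back together is exposed through the same TableOperations interface, as introduced in Accumulo 1.4. A minimal sketch (the Connector, table name, and row range are placeholders):

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class MergeExample {
        // Assumes 'conn' is an existing Connector (see the client sketch in section 2.3.4).
        static void mergeTablets(Connector conn) throws Exception {
            // Merge all tablets of "example_table" whose rows fall between "g" and "n".
            conn.tableOperations().merge("example_table", new Text("g"), new Text("n"));
        }
    }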

2.8. Fault Tolerance

If a TabletServer fails, the master detects the failure and automatically reassigns the tablets from the failed server to other servers. Any key-value pairs that were in memory at the time of the failure are automatically reapplied from the write-ahead log to prevent any loss of data.

The master coordinates copying the write-ahead logs to HDFS so that the logs are available to all TabletServers for recovery. To make recovery efficient, the updates within a log are grouped by tablet, so that TabletServers can quickly apply the mutations from the sorted logs that are destined for the tablets they have now been assigned.

If a TabletServer fails, this will be noted on the master's monitor page, accessible via http://master-address:50095/monitor.

(Figure: failure_handling.png, illustrating handling of a TabletServer failure)

