Basic knowledge:
Goals: 1. Use efficient row and column key design to organize data storage, and use a smooth data persistence strategy to relieve cluster pressure
2. Ensure data consistency with ZooKeeper (leader election)
Technologies to improve performance: data compression, indexing, and materialized views
ZooKeeper monitors the HRegionServers and stores the actual address of the root region and the physical address of the HMaster, relieving distributed application developers of the burden of building coordination services from scratch
The HMaster manages load balancing across HRegionServers
Logs are stored in Hadoop's SequenceFile format
HBase primarily manages the actual data files (HFiles) and the log files (write-ahead logs)
The HRegionServer forwards each request to the appropriate HRegion for processing
Merging and splitting of underlying storage files:
1. Memory stores (MemStores) are continuously flushed to disk; when the number of files on disk reaches a threshold, a compaction thread merges them into larger files
2. When a large file reaches the split threshold, region splitting is triggered in the HRegionServer
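The flush -> compaction -> split flow above can be sketched as a small state machine. This is a minimal illustration, not HBase's actual implementation; the class name and both thresholds (COMPACTION_FILE_COUNT, SPLIT_SIZE_BYTES) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the flush -> compaction -> split flow. Thresholds are
// illustrative assumptions, not HBase's real defaults.
class StoreFileManager {
    static final int COMPACTION_FILE_COUNT = 3;              // merge when this many files accumulate
    static final long SPLIT_SIZE_BYTES = 10L * 1024 * 1024;  // request a split above this size

    final List<Long> storeFileSizes = new ArrayList<>();
    boolean splitRequested = false;

    // Called when a MemStore flush writes a new file to disk.
    void onFlush(long flushedBytes) {
        storeFileSizes.add(flushedBytes);
        if (storeFileSizes.size() >= COMPACTION_FILE_COUNT) {
            compact();
        }
    }

    // Merge all small files into one large file, then check the split threshold.
    void compact() {
        long total = storeFileSizes.stream().mapToLong(Long::longValue).sum();
        storeFileSizes.clear();
        storeFileSizes.add(total);
        if (total >= SPLIT_SIZE_BYTES) {
            splitRequested = true; // the region split would be triggered here
        }
    }
}
```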
Architecture:
Data is uploaded by the IoT sensor device group to the real-time/historical database server farm, where it is compressed and cached, then written to the HBase server cluster
The ZooKeeper server group provides device registration management for the IoT sensor devices, process monitoring for the real-time/historical database servers, and service coordination for the HBase cluster
Persistence policy:
In plain terms, raw data first passes through a real-time cache for preprocessing, where unqualified or out-of-order data is discarded; the remaining data is cached and compressed as historical data and finally written into HBase
The data cache pool is divided into two blocks of equal size; a lossy-compression thread pool writes to a fixed position in one block, so that while one block is receiving data the other is being written to HBase. Writes are triggered either on a timer or when a threshold is reached
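The two-block cache pool is essentially a double buffer that swaps on flush. A minimal sketch, assuming a string record type, a threshold of 4 records, and a `Sink` interface standing in for the HBase writer (all three are illustrative, not from the source):

```java
import java.util.ArrayList;
import java.util.List;

// Double-buffer sketch: one block receives records while the other
// is drained to HBase. Capacity and Sink are illustrative assumptions.
class DoubleBufferPool {
    interface Sink { void write(List<String> batch); } // stands in for the HBase writer

    static final int FLUSH_THRESHOLD = 4;  // assumed block capacity

    private List<String> receiving = new ArrayList<>();
    private List<String> draining = new ArrayList<>();
    private final Sink sink;

    DoubleBufferPool(Sink sink) { this.sink = sink; }

    // Threshold-triggered path: swap blocks when the receiving block fills up.
    synchronized void append(String record) {
        receiving.add(record);
        if (receiving.size() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    // Also callable from a timer thread, covering the timed-flush path.
    synchronized void flush() {
        if (receiving.isEmpty()) return;
        List<String> tmp = draining;   // swap the two blocks
        draining = receiving;
        receiving = tmp;
        receiving.clear();
        sink.write(draining);          // drain the full block to HBase
    }
}
```

The swap keeps ingestion and persistence from blocking each other: producers only ever touch the receiving block, and the writer only ever touches the draining block.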
Thinking:
1. Flush and compaction optimization
2. Improvements to the split mechanism
The database uses a narrow-table schema; the row key is a composite key of the form [tag_name][data_timestamp]
Data consolidation is done offline: a MapReduce job merges rows with the same tag_name and time range into a single row; the drawback is the extra cost and resource consumption
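The composite [tag_name][data_timestamp] key can be sketched with fixed-width fields, so HBase's lexicographic row ordering matches time order within each tag. The 16-character tag width and 13-digit millisecond timestamp are illustrative assumptions, not values from the source:

```java
// Sketch of the composite row key [tag_name][data_timestamp].
// Field widths are illustrative assumptions.
class RowKeys {
    static final int TAG_WIDTH = 16;  // assumed fixed tag field width

    static String encode(String tagName, long timestampMillis) {
        if (tagName.length() > TAG_WIDTH)
            throw new IllegalArgumentException("tag too long: " + tagName);
        // pad the tag to a fixed width, then append a zero-padded timestamp
        return String.format("%-" + TAG_WIDTH + "s%013d", tagName, timestampMillis);
    }

    // Query-middleware side: split a row key back into (tag, timestamp).
    static String decodeTag(String rowKey) {
        return rowKey.substring(0, TAG_WIDTH).trim();
    }

    static long decodeTimestamp(String rowKey) {
        return Long.parseLong(rowKey.substring(TAG_WIDTH));
    }
}
```

Zero-padding the timestamp matters: without it, "999" would sort after "1000" lexicographically and scans over a time range would return rows out of order.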
ZooKeeper: 1. Device registration (registers a data node and stores the physical node location information needed to access every acquisition point)
2. When the application layer opens a query transaction, it first decomposes the transaction, obtains the location information, and routes the query request to the correct physical node
3. When a device index changes, the change is submitted and the node completes its own maintenance
Data statistics and analysis module:
for example, the single (first-order) moving-average method
Predict the volume of data that will be written in the next cycle and compare it with the current write volume; if the trend is downward or the forecast file size is small, defer the split request
Query middleware:
1. Interprets the row keys for data items
2. Parses query requests, then merges and packages the results
It also caches the index information needed by applications with heavy real-time query demands
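The middleware's query decomposition can be sketched as translating a (tag, start, end) request into a lexicographic [startKey, stopKey) scan range over the composite [tag_name][data_timestamp] row keys. The fixed field widths below are illustrative assumptions:

```java
// Sketch of query decomposition: one (tag, time range) request becomes
// one half-open scan range per tag. Field widths are assumptions.
class QueryPlanner {
    static final int TAG_WIDTH = 16;   // assumed fixed tag field width

    static String rowKey(String tag, long ts) {
        return String.format("%-" + TAG_WIDTH + "s%013d", tag, ts);
    }

    // Half-open scan range covering [startTs, endTs) for one tag. A multi-tag
    // query would produce one such range per tag, route each to its physical
    // node, and merge the per-node results afterwards.
    static String[] scanRange(String tag, long startTs, long endTs) {
        return new String[] { rowKey(tag, startTs), rowKey(tag, endTs) };
    }
}
```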
HBase-based time series database (improved)