HBase-based time series database (improved)


Basic knowledge:

Goals: 1. Use efficient row keys and column keys to organize data storage, and use a smooth data persistence strategy to relieve pressure on the cluster

2. Ensure data consistency with ZooKeeper (leader election)
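ZooKeeper's usual leader-election recipe works as follows. Each candidate creates an ephemeral, sequential znode under a shared parent. The candidate whose node has the lowest sequence number is the leader, and when that node disappears (its session ends) the next-lowest takes over. The sketch below simulates only that rule in memory and assumes no real ZooKeeper connection. `ElectionSimulator` and its methods are hypothetical names.

```python
class ElectionSimulator:
    """Hypothetical in-memory stand-in for a ZooKeeper election parent znode."""

    def __init__(self):
        self._next_seq = 0
        self._nodes = {}  # znode name -> candidate id

    def join(self, candidate):
        # Mimic an EPHEMERAL_SEQUENTIAL create: ZooKeeper appends a
        # zero-padded 10-digit counter, so names sort in creation order.
        name = f"n_{self._next_seq:010d}"
        self._next_seq += 1
        self._nodes[name] = candidate
        return name

    def leave(self, name):
        # An ephemeral znode vanishes when its owner's session ends.
        self._nodes.pop(name, None)

    def leader(self):
        # The candidate owning the lowest sequence number is the leader.
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]
```

In the real recipe, each non-leader watches only its immediate predecessor's znode. This avoids waking every candidate when the leader fails.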

Techniques to improve performance: data compression, indexing, and materialized views
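The text does not say which compression algorithm is used. One common lossy choice for sensor historians is deadband compression: a sample is kept only when its value moves more than a tolerance away from the last kept value. The sketch below assumes that technique. `deadband_compress` and `tolerance` are hypothetical names.

```python
def deadband_compress(samples, tolerance):
    """Lossy deadband compression of (timestamp, value) pairs (illustrative).

    A sample is kept only if its value differs from the last *kept* value by
    more than `tolerance`. The first and last samples are always kept so the
    start and end of the series are preserved.
    """
    if not samples:
        return []
    kept = [samples[0]]
    for ts, value in samples[1:-1]:
        if abs(value - kept[-1][1]) > tolerance:
            kept.append((ts, value))
    if len(samples) > 1:
        kept.append(samples[-1])
    return kept
```

Larger tolerances discard more points. The tolerance therefore trades storage volume against reconstruction error.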

ZooKeeper monitors the HRegionServers and stores the actual address of the root region and the physical address of the HMaster. This spares distributed application developers from building coordination services from scratch.

The HMaster manages load balancing across the HRegionServers.
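To illustrate what balancing means here, the sketch below greedily moves regions from the most-loaded HRegionServer to the least-loaded one until region counts differ by at most one. This is a simplified assumption for illustration only. HBase's own balancer weighs more factors than region count, and `balance` is a hypothetical helper.

```python
def balance(assignments):
    """Greedy region-count balancing (illustrative, not HBase's algorithm).

    `assignments` maps server name -> list of region names. Returns
    (moves, new_assignments), where each move is (region, src, dst).
    """
    plan = {server: list(regions) for server, regions in assignments.items()}
    moves = []
    while plan:
        busiest = max(plan, key=lambda s: len(plan[s]))
        idlest = min(plan, key=lambda s: len(plan[s]))
        if len(plan[busiest]) - len(plan[idlest]) <= 1:
            break  # already balanced to within one region
        region = plan[busiest].pop()
        plan[idlest].append(region)
        moves.append((region, busiest, idlest))
    return moves, plan
```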

Logs are stored in Hadoop's SequenceFile format.

HBase primarily handles two kinds of files: actual data files and log files.

An HRegionServer passes each request on to the HRegion that handles it.

Merging and splitting of underlying storage files:

1. MemStore contents are continually flushed to disk as new files. Once the number of files reaches a set threshold, compaction threads merge them into larger files.

2. When a file's size reaches the split threshold, the HRegionServer splits the region.
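The two steps above can be sketched as a small simulation of one region's store files. It is simplified under stated assumptions. Every flush adds one file. A compaction merges *all* files once their count reaches a threshold, whereas real HBase minor compactions select a subset. A merged file at or above the split size splits the region in half. `simulate_store` and both thresholds are hypothetical values chosen for illustration.

```python
def simulate_store(flush_sizes, compaction_threshold=3, split_threshold=256):
    """Replay MemStore flushes for one region and log compactions and splits.

    Sizes are in MB; both thresholds are hypothetical example values.
    Returns a list of event strings.
    """
    files, events = [], []
    for size in flush_sizes:
        files.append(size)                       # each flush writes one store file
        events.append(f"flush {size}")
        if len(files) >= compaction_threshold:   # too many files: merge them
            files = [sum(files)]
            events.append(f"compact -> {files[0]}")
            if files[0] >= split_threshold:      # merged file too large: split
                half = files[0] / 2
                events.append(f"split -> {half} + {half}")
                files = [half]                   # keep following one daughter region
    return events
```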

Architecture:

The IoT sensor device group uploads data to the real-time historical database server group, where it is compressed and cached before being written to the HBase server cluster.

The ZooKeeper server group handles device registration management for the IoT sensor device group, monitors the processes of the real-time historical database server group, and provides coordination services to the HBase server cluster.

Persistence policy:

In short, raw data first goes through real-time cache preprocessing, which discards unqualified or time-disordered data. It is then cached and compressed as historical data, and finally stored in HBase.
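A minimal sketch of the preprocessing filter follows. It assumes "unqualified" means a missing value or one outside a plausible range, and "time-disordered" means a timestamp not newer than the last accepted one. `preprocess` and `valid_range` are hypothetical.

```python
def preprocess(samples, last_ts=None, valid_range=(-1e6, 1e6)):
    """Real-time cache preprocessing sketch (hypothetical validity rules).

    Drops samples whose value is missing or outside `valid_range`, and
    samples whose timestamp is not newer than the last accepted one.
    Returns (accepted_samples, last_accepted_timestamp).
    """
    low, high = valid_range
    accepted = []
    for ts, value in samples:
        if value is None or not (low <= value <= high):
            continue  # unqualified reading
        if last_ts is not None and ts <= last_ts:
            continue  # time-disordered (late or duplicate) reading
        accepted.append((ts, value))
        last_ts = ts
    return accepted, last_ts
```

Returning `last_ts` lets the caller carry ordering state across successive batches from the same device.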

The data cache pool is divided into two blocks of the same size, and a lossy-compression thread pool writes into fixed positions of one block. While one block receives data, the other is written to HBase. Writes are triggered either on a timer (timed flush) or when the block fills to a threshold (threshold flush).
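The double-buffer scheme with both flush triggers can be sketched as below. This is single-threaded for clarity; a real implementation would hand the full block to a background writer thread. `DoubleBuffer`, `capacity`, `interval`, and the `writer` callback (a stand-in for the HBase put path) are all hypothetical.

```python
import time


class DoubleBuffer:
    """Two equal-size blocks: one receives samples while the other is written out.

    Hypothetical sketch: `writer` stands in for the HBase write path.
    """

    def __init__(self, capacity, interval, writer, clock=time.monotonic):
        self.capacity = capacity      # samples per block (threshold flush)
        self.interval = interval      # seconds between timed flushes
        self.writer = writer
        self.clock = clock
        self.blocks = [[], []]
        self.active = 0               # index of the block currently receiving data
        self.last_flush = clock()

    def append(self, sample):
        self.blocks[self.active].append(sample)
        if len(self.blocks[self.active]) >= self.capacity:
            self.flush()              # threshold flush

    def tick(self):
        # Called periodically; performs the timed flush if anything is buffered.
        if self.clock() - self.last_flush >= self.interval and self.blocks[self.active]:
            self.flush()

    def flush(self):
        full = self.active
        self.active = 1 - self.active         # swap: the other block starts receiving
        self.writer(list(self.blocks[full]))  # write a copy of the full block
        self.blocks[full].clear()             # block is free for the next swap
        self.last_flush = self.clock()
```

Injecting `clock` makes the timed flush testable without sleeping.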

Proposed improvements:

1. Optimize flush and compaction

2. Improve the split mechanism

The database uses a narrow table design with a composite row key of the form [tag_name][data_timestamp].
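One way to encode this key is sketched below. The key is assumed to consist of a fixed-width tag name followed by an 8-byte big-endian timestamp. HBase sorts row keys as raw bytes, so this encoding keeps each tag's rows contiguous and in time order. It also lets a time-window query become a single start/stop scan. `TAG_WIDTH` and the helper names are hypothetical.

```python
import struct

TAG_WIDTH = 16  # hypothetical fixed width for tag_name, in bytes


def make_row_key(tag_name, timestamp_ms):
    """Build a [tag_name][data_timestamp] row key.

    The tag is UTF-8 encoded and NUL-padded to a fixed width; the timestamp
    is an unsigned 8-byte big-endian integer, so byte order equals time order.
    """
    tag = tag_name.encode("utf-8")
    if len(tag) > TAG_WIDTH:
        raise ValueError("tag_name too long for the fixed-width key")
    return tag.ljust(TAG_WIDTH, b"\x00") + struct.pack(">Q", timestamp_ms)


def parse_row_key(row_key):
    """Inverse of make_row_key."""
    tag = row_key[:TAG_WIDTH].rstrip(b"\x00").decode("utf-8")
    (timestamp_ms,) = struct.unpack(">Q", row_key[TAG_WIDTH:TAG_WIDTH + 8])
    return tag, timestamp_ms


def scan_range(tag_name, start_ms, end_ms):
    """Start (inclusive) and stop (exclusive) keys for one tag's time window."""
    return make_row_key(tag_name, start_ms), make_row_key(tag_name, end_ms)
```

Fixed-width padding ensures that a tag which is a prefix of another (for example `temp` and `temp_01`) cannot interleave with it in key order.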

Data is consolidated offline: a MapReduce job merges rows whose row keys share the same tag_name and fall within the same time range into a single row. The drawback is the added cost and resource consumption.
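The following plain-Python sketch mirrors the shape of that job. It is not Hadoop code. The map step keys each narrow-table cell by (tag_name, time bucket). The reduce step merges each group into one wide row whose columns are offsets within the bucket. The one-hour `BUCKET_MS` and the function names are hypothetical.

```python
from collections import defaultdict

BUCKET_MS = 3_600_000  # hypothetical merge window: one hour


def map_phase(cells):
    """Emit ((tag_name, bucket_start), (timestamp, value)) per narrow-table cell."""
    for tag_name, timestamp_ms, value in cells:
        bucket_start = timestamp_ms - timestamp_ms % BUCKET_MS
        yield (tag_name, bucket_start), (timestamp_ms, value)


def reduce_phase(pairs):
    """Group by key and merge each group into one wide row {offset: value}."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    merged = {}
    for (tag_name, bucket_start), points in groups.items():
        # Column qualifier = offset inside the bucket, so the merged row key
        # only needs the bucket start time.
        merged[(tag_name, bucket_start)] = {
            ts - bucket_start: value for ts, value in sorted(points)
        }
    return merged
```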

ZooKeeper: 1. Device confirmation: a data node is registered that stores, for every acquisition point, the location of the physical node that holds its data

2. When the application layer starts a query transaction, the query is first decomposed, the location information is obtained, and each query request is routed to the correct physical node

3. When a device's index information changes, the device submits the change itself, so each device maintains its own entry
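The three ZooKeeper duties above can be modeled with an in-memory stand-in for the znode tree. It assumes each acquisition point maps to a path such as `/devices/<tag_name>` whose data is the physical node's address. The paths and `DeviceRegistry` are hypothetical. A real deployment would perform the same steps through a ZooKeeper client's create, set, and get calls.

```python
class DeviceRegistry:
    """Hypothetical in-memory stand-in for the device-registration znodes."""

    def __init__(self):
        self.znodes = {}  # path -> physical node address

    def register(self, tag_name, node_address):
        # 1. Device confirmation: create the data node for this acquisition point.
        self.znodes[f"/devices/{tag_name}"] = node_address

    def update(self, tag_name, node_address):
        # 3. The device submits its own index change.
        path = f"/devices/{tag_name}"
        if path not in self.znodes:
            raise KeyError(path)
        self.znodes[path] = node_address

    def route(self, tag_names):
        # 2. Decompose a multi-tag query into per-node sub-requests.
        plan = {}
        for tag_name in tag_names:
            node = self.znodes[f"/devices/{tag_name}"]
            plan.setdefault(node, []).append(tag_name)
        return plan
```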

Data statistics and analysis module:

It uses a forecasting method such as the single moving average.

It predicts the amount of data to be written in the next cycle and compares it with the current write volume. If the trend is downward or the forecast file size is small, the split request is deferred.
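A sketch of the deferral rule follows. It uses a single moving average as the next-cycle forecast. Because a plain moving average lags the data, comparing it directly with the latest value inverts the trend signal. This sketch therefore assumes "downtrend" means the forecast is lower than the forecast made one cycle earlier. The window, the small-forecast threshold, and the function names are hypothetical.

```python
def sma(values, window):
    """Single moving average of the last `window` values."""
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    return sum(values[-window:]) / window


def should_defer_split(volumes, window=3, small_forecast=64.0):
    """Decide whether to postpone a pending region split.

    `volumes` are per-cycle write volumes in MB, oldest first, and must hold
    at least window + 1 values. Defers if the moving-average forecast is
    falling or if the forecast volume is small anyway.
    """
    forecast = sma(volumes, window)         # forecast for the next cycle
    previous = sma(volumes[:-1], window)    # forecast made one cycle earlier
    downtrend = forecast < previous
    return downtrend or forecast < small_forecast
```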

Query middleware:

1. Interprets the row keys of data items

2. Parses query requests, then merges and encapsulates the results

It also caches the index information needed by applications with heavy real-time query demand.
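The middleware's merge and caching roles can be sketched as follows. Node locations are looked up through a small LRU cache, and each tag's scan is fanned out to its node. The time-sorted per-tag streams are then merged with `heapq.merge`. `IndexCache`, `run_query`, and the `loader`/`fetch` callbacks are hypothetical. In the design above, `loader` would read the ZooKeeper index and `fetch` would scan HBase.

```python
import heapq
from collections import OrderedDict


class IndexCache:
    """Small LRU cache for tag -> node-location entries (hypothetical)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader              # fallback lookup on a cache miss
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, tag_name):
        if tag_name in self.entries:
            self.entries.move_to_end(tag_name)   # mark as most recently used
            return self.entries[tag_name]
        self.misses += 1
        location = self.loader(tag_name)
        self.entries[tag_name] = location
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict least recently used
        return location


def run_query(tags, start_ms, end_ms, index, fetch):
    """Fan a multi-tag range query out to nodes and merge results by time.

    `fetch(location, tag, start_ms, end_ms)` must return (timestamp, value)
    pairs sorted by time. Output: (timestamp, tag, value) tuples in time order.
    """
    streams = []
    for tag in tags:
        location = index.get(tag)
        rows = fetch(location, tag, start_ms, end_ms)
        streams.append([(ts, tag, value) for ts, value in rows])
    return list(heapq.merge(*streams))
```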
