Hummer TimeSeries DB Technical Architecture Introduction
The Hummer TimeSeries DB architecture has the following main features:
- A. The Hummer TimeSeries DB system adopts a layered, modular design. Layers and modules are loosely coupled; the layered structure lets each functional layer evolve independently and keeps the code easy to maintain. In addition, you can choose which layers to deploy based on scenario requirements.
- B. Interconnecting with excellent open-source software not only avoids reinventing the wheel (for example, using Impala as the parallel query layer and Zabbix as the system monitoring layer), it also helps us integrate into the mainstream technology ecosystem.
The Hummer TimeSeries DB system consists of four logical layers (subsystems):
- Data storage layer (HummerStore): a key-value-based distributed storage system responsible for distributed data storage, queries, and metadata management. Its main components are ZooKeeper, Master, and Node.
- SQL parallel query layer (Impala): the SQL query system is responsible for SQL parsing and executing query plans. Here we use a fork of Impala, an open-source SQL query and analysis system, adapted for our purpose (the default HDFS storage is replaced with HummerDB, and time-series query and analysis are optimized). Its main components are statestored, catalogd, and impalad.
- Offline computing layer (MR/MR2): the offline computing system uses the Hadoop MapReduce computing framework; we implement the data access interfaces required by MR/MR2 (InputFormat/OutputFormat for HummerStore). Its main components are JobTracker, TaskTracker / ResourceManager, NodeManager.
- Supervision layer (Manager): the supervision layer monitors the full lifecycle of machines and service instances and reports the necessary performance counters. The Zabbix monitoring system is used for machine and service-instance monitoring, and a Console UI is provided for administrators. Its main components are the Zabbix agent/server and Console Manager.
Main module functions and technologies. Data storage layer: the Master is responsible for the following tasks:
- Metadata maintenance: metadata includes table information [time-series table / object table, unique key / non-unique key, column data types] and partition information [machine location, data path]. Metadata is stored in MySQL.
- Replica consistency maintenance: the Master maintains replica group relationships (such as electing the primary replica, and replicas joining or leaving the group). The concrete operations are performed together with the Nodes. The consistency algorithm is similar to the ZAB protocol used by ZooKeeper.
- Load balancing and fault recovery: the Master schedules data migration, expansion, and contraction.
- Operations and maintenance interface: management commands (such as table creation, table deletion, partition migration, and load balancing) are received and processed uniformly by the Master.
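To make the responsibilities above concrete, here is a minimal, hypothetical sketch of the metadata a Master might keep. All class and field names are illustrative assumptions; the real schema lives in MySQL and is not documented in this article.

```python
from dataclasses import dataclass

# Illustrative only: the actual MySQL-backed schema is not shown here.

@dataclass
class TableInfo:
    name: str
    table_kind: str    # "timeseries" or "object" table
    key_unique: bool   # unique key vs. non-unique key
    columns: dict      # column name -> column data type

@dataclass
class PartitionInfo:
    table: str
    shard_id: int
    machines: list     # machine locations of the replica group
    data_path: str     # data path on those machines

class Catalog:
    """Tiny in-memory stand-in for the MySQL-backed metadata store."""
    def __init__(self):
        self.tables = {}
        self.partitions = []

    def create_table(self, info):
        if info.name in self.tables:
            raise ValueError("table already exists")
        self.tables[info.name] = info

    def add_partition(self, part):
        self.partitions.append(part)

    def locate(self, table):
        # Used to answer "where are this table's shards?" queries.
        return [p for p in self.partitions if p.table == table]
```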
Master architecture:
- Master HA: multiple Master instances run in active-standby mode (one active, the rest standby). Because the Master is a stateless service (all state is stored in MySQL), switchover can happen in real time when the active Master fails. Failover relies on ZooKeeper: when ZooKeeper detects that the active Master has failed, it selects one of the standby Masters and promotes it to active so that external service continues.
- MySQL HA: circular binlog replication is used among multiple MySQL instances. When the primary MySQL instance fails, the active Master described above is responsible for promoting one of the replica MySQL instances to primary and switching access to the new primary.
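The active/standby failover described above can be simulated in a few lines. This is a hedged sketch: a real deployment detects failure through ZooKeeper ephemeral nodes and session timeouts, which a plain Python set stands in for here.

```python
class MasterCluster:
    """Toy model of one-active / rest-standby Master failover."""
    def __init__(self, masters):
        self.masters = list(masters)  # priority order: first alive is active
        self.alive = set(masters)

    @property
    def active(self):
        # Because the Master is stateless (state lives in MySQL), any
        # surviving standby can take over immediately; here we simply
        # promote the first instance that is still alive.
        for m in self.masters:
            if m in self.alive:
                return m
        return None

    def fail(self, master):
        # In production, ZooKeeper notices the lost session and triggers
        # the promotion; here we just drop the instance from the alive set.
        self.alive.discard(master)
```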
Nodes are responsible for:
- Multi-replica write synchronization: the Node receives data written to its shards and synchronizes the data across replicas according to each shard's replica group relationships.
- Data reading and scanning: the Node also serves random lookups by exact key and batch scans by key range.
- Persistent data storage: the persistence engine is a specially modified LevelDB: the WAL is removed (the Node already has its own WAL mechanism), a time-series-oriented sort order is implemented, interval deletion is performed during compaction, the LZ4 compression algorithm is introduced, compaction threads are split across disks, and the SST file format is modified to optimize statistics.
- Shard election: cooperates with the Master to elect a primary replica for each shard.
- Data migration: cooperates with the Master's scheduling commands to migrate shard data online.
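The "time series-oriented sort order" mentioned above hinges on key layout. The sketch below shows one plausible encoding (an assumption, not the system's actual format): appending a big-endian timestamp to the series key makes LevelDB's default lexicographic byte comparison coincide with (series, time) order, so a time-interval scan reads one contiguous key range.

```python
import struct

def encode_key(series_id: str, ts_ms: int) -> bytes:
    # b"\x00" separates the series id from the 8-byte big-endian timestamp;
    # big-endian means byte comparison equals numeric timestamp comparison.
    return series_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_ms)

# Keys sort by time within a series, regardless of insertion order.
keys = [encode_key("cpu.host1", t) for t in (3000, 1000, 2000)]
assert sorted(keys) == [encode_key("cpu.host1", t) for t in (1000, 2000, 3000)]
```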
Node architecture:
- Nodes adopt a share-nothing architecture: no state is shared between nodes, which yields high scalability and high performance.
- HA: each Node manages multiple shards, and the HA and consistency of the shards are ensured by the ZAB protocol.
ZooKeeper is responsible for:
- Monitoring whether Node instances are online: if a Node instance fails, the Master is notified and a primary election is performed for the affected shards.
- Monitoring whether Master instances are online: if the active Master instance fails, a standby Master is awakened and promoted to active.
- Recording a small amount of state (such as migration execution plans and execution-progress snapshots).
- Maintaining logical addresses: the mapping between the Master's logical address and its physical address is recorded in ZooKeeper, and the mapping is updated during Master switchover. In this way, clients can find the actual physical address from the logical address.
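The logical-to-physical mapping can be pictured as a single registry entry that only the newly promoted Master rewrites, while clients keep resolving the same logical name. The znode path and function names below are illustrative assumptions, not taken from the real system.

```python
# Stand-in for a znode such as /hummer/master/active (path is hypothetical).
registry = {}

def publish(logical, physical):
    registry[logical] = physical  # written by the newly active Master

def resolve(logical):
    return registry[logical]      # read by clients before connecting

publish("master", "10.0.0.1:9000")
assert resolve("master") == "10.0.0.1:9000"
publish("master", "10.0.0.2:9000")  # Master switchover updates the mapping
assert resolve("master") == "10.0.0.2:9000"
```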
SQL query layer:
- Statestored monitors the status of service instances (similar to ZooKeeper).
- Catalogd synchronizes metadata (table information).
- Impalad is responsible for parallel SQL queries.
Our main modifications include:
- Catalogd: obtains table metadata, including table shard location information, from the HummerDB Master and passes it to impalad.
- Impalad: in the execution plan, time- and key-related predicates are pushed down to the Node's scan operation (because HummerDB clusters and sorts data by time series, pushing these predicates down to the lowest layer of the data scan undoubtedly yields the highest efficiency).
- Impalad: builds the physical execution plan according to the number and location of the shards; one scan node is allocated per shard, and each scan node is preferentially placed on a machine holding a replica of that shard so that the scan executes "locally".
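The locality-preferring scan placement in the last point can be sketched as follows. This is a simplification under assumed inputs; the real planner would also balance load across hosts, which is omitted here.

```python
from itertools import cycle

def assign_scans(shards, impalad_hosts):
    """shards: dict of shard_id -> list of replica hosts.
    Prefer an impalad running on a replica host ("local" execution);
    otherwise fall back to round-robin over all impalad hosts."""
    fallback = cycle(impalad_hosts)
    host_set = set(impalad_hosts)
    assignment = {}
    for shard_id, replicas in sorted(shards.items()):
        local = [h for h in replicas if h in host_set]
        assignment[shard_id] = local[0] if local else next(fallback)
    return assignment
```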
Offline computing layer: JobTracker, TaskTracker / ResourceManager, NodeManager.
We mainly implement the corresponding HummerInputFormat and HummerOutputFormat interfaces.
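The real interfaces are Java classes in the Hadoop API; the Python sketch below only conveys the idea behind a HummerInputFormat (names and structure are assumptions): one input split per shard, with the shard's replica hosts reported as preferred locations so that map tasks can be scheduled close to the data.

```python
def get_splits(partitions):
    """partitions: list of (shard_id, replica_hosts) pairs, as an
    InputFormat would fetch them from the Master's metadata.
    Returns one split per shard with its preferred host locations."""
    return [{"shard": sid, "locations": list(hosts)}
            for sid, hosts in partitions]
```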
Supervision layer:
- Zabbix agent: collects the running status of physical machines and service instances.
- Zabbix server: collects and records metrics.
- Console manager: a web service that provides interactive O&M operations.
For an introductory article on the Hummer TimeSeries DB integration demo, and its download, see http://www.xushiiot.com/blog-index.html#/content-reply/97