Druid Architecture Description


1. Druid Introduction
2. Druid Characteristics
3. Usage Scenarios
4. Real-time Node
5. Historical Node
6. Broker Node
7. Coordinator Node
8. Introduction to Architecture
9. Distributed Cluster

1. Druid Introduction
Druid is a highly fault-tolerant, high-performance, open-source distributed system for real-time query and analysis of big data, designed to ingest large-scale data quickly and make it available for fast query and analysis.

Main Features:
1. Designed for analysis: Druid is built for exploratory OLAP workflows. It supports a variety of filters, aggregators, and query types, and provides a framework for adding new functionality. Users have built advanced top-K query and histogram capabilities on top of Druid's infrastructure.
2. Interactive queries: Druid's low-latency ingestion architecture allows events to be queried milliseconds after they are created. Query latency stays low because Druid reads and scans only the elements a query actually needs; aggregations and filters operate on the data directly rather than waiting for full result sets to materialize.
3. High availability: Druid is used to back SaaS products that must stay online at all times. Data remains available and queryable while the system is being upgraded, and scaling up or down does not cause data loss.
4. Scalable: existing Druid deployments handle billions of events and terabytes of data per day. Druid is designed to scale to petabytes.
In terms of capability, Druid sits between PowerDrill and Dremel. It implements nearly everything Dremel offers (Dremel handles arbitrarily nested data structures, while Druid allows only a single array-based nesting level) and borrows some interesting data formats and compression methods from PowerDrill.
Kylin serves user queries from cubes built against the previous day's partitions, so users query historical data. Druid, by contrast, continuously pulls data from its ingest sources and builds its cubes incrementally, providing real-time queries.

2. Druid Characteristics
Sub-second queries: Druid provides fast aggregation and sub-second OLAP query capabilities; its multi-tenant design makes it well suited to user-facing analytics applications.
Real-time data ingestion: Druid supports streaming ingestion and provides event-driven data delivery, ensuring that events stay valid and consistent across real-time and offline environments.
Scalable petabyte storage: Druid clusters scale easily to petabytes of data and millions of events ingested per second. Even as data volume grows, query responsiveness can be maintained.
Multi-environment deployment: Druid runs on commodity hardware as well as in the cloud. It can ingest data from a variety of systems, including Hadoop, Spark, Kafka, Storm, and Samza.
Active community: Druid has an active community to learn from.

3. Usage Scenarios
1. Suited to records that are cleaned and ingested in real time but never updated afterwards
2. Supports wide tables without joins (in other words, single-table queries)
3. Works with summarizable basic statistical indicators, each representable by a single field
4. High requirements on time zones and time dimensions (year, month, week, day, hour, etc.), down to minute granularity
5. Real-time availability matters
6. Not highly sensitive to data quality
7. Intended for effect analysis and as a reference for policy decisions
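Scenarios 1-3 above (append-only ingest, no joins, summarizable indicators) are what makes ingest-time rollup possible. The following is a minimal, illustrative sketch of that idea, not Druid's actual implementation: raw events sharing a time bucket and the same dimension values collapse into a single pre-aggregated row.

```python
from collections import defaultdict

def rollup(events, granularity_secs=3600):
    """Pre-aggregate raw events into (time bucket, dimensions) rows, the
    way ingest-time rollup summarizes a wide, join-free table.
    Each event is (epoch_seconds, dimensions_tuple, metric_value)."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, dims, value in events:
        bucket = ts - ts % granularity_secs   # truncate timestamp to the hour
        agg = buckets[(bucket, dims)]
        agg["count"] += 1
        agg["sum"] += value
    return dict(buckets)

events = [
    (1000, ("us", "web"), 2.0),
    (1500, ("us", "web"), 3.0),   # same hour, same dimensions -> merged
    (4000, ("us", "web"), 1.0),   # falls into the next hour bucket
]
rows = rollup(events)
# three raw events collapse into two stored rows
```

Because updates never arrive (scenario 1), each rolled-up row is final once its time bucket closes, which is exactly what lets the data be stored immutably.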

4. Real-time Node
Real-time nodes encapsulate the functionality to ingest and query event streams; events ingested through these nodes are queryable immediately.
A real-time node cares only about event data within a short window of time; it periodically hands the immutable batches of events collected over that window off to the other nodes of the Druid cluster that specialize in immutable batch data.
Real-time nodes coordinate with the rest of the Druid cluster through ZooKeeper, which they use to announce their online state and the data they serve.

A real-time node maintains an in-memory index buffer for all incoming event data. The index grows incrementally as events arrive and is immediately queryable; for queries against this JVM-heap-based buffer, Druid behaves like a row store.
To avoid heap overflow, the real-time node persists its in-memory index to disk either periodically or when a configured maximum row count is reached.
This persistence step converts the data in the in-memory buffer to a column-oriented storage format. Every persisted index is immutable, and the real-time node loads persisted indexes into off-heap memory so that they can still be queried.
(Figure: a real-time node buffers event data in an in-memory index, then periodically persists it to disk. Persisted indexes are merged together before hand-off. Queries hit both the in-memory and the persisted indexes.)
Each real-time node periodically runs a background task that searches for locally persisted indexes. The task merges these persisted indexes and builds an immutable block of data containing all the events the node ingested over a span of time; we call these blocks "segments". During the hand-off phase, the real-time node uploads each segment to a permanent backup store, typically a distributed file system such as S3 or HDFS, which Druid calls "deep storage".
The real-time node's phases of ingest, persist, merge, and hand-off are fluid; no data is lost as it moves through these stages, as shown in the following diagram:

The node starts at 13:47 and accepts event data only for the current hour and the next hour. As events begin to arrive, the node announces that it is serving a segment covering the 13:00 to 14:00 interval.
Every 10 minutes (a configurable interval) the node flushes its in-memory buffer to disk. Near the end of the current hour, the node prepares to receive events for 14:00 to 15:00; once this happens, it creates a new in-memory index and gets ready to serve the next hour.
The node then announces that it is also serving a segment for 14:00 to 15:00. It does not immediately merge the persisted indexes for 13:00 to 14:00; instead it waits for a configurable window period in case delayed events belonging to that interval still arrive. This window period minimizes the data loss caused by late-arriving events.
At the end of the window period, the node merges all persisted indexes for 13:00 to 14:00 into a single immutable segment and hands it off. Once the segment is loaded and queryable elsewhere in the Druid cluster, the real-time node flushes the data it collected for 13:00 to 14:00 and announces that it no longer serves it.
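The timeline above can be sketched in a few lines. This is an illustrative model, not Druid code: hourly segment granularity and a 10-minute window period are assumed, matching the example, and the helper names are invented.

```python
SEGMENT_GRANULARITY = 3600   # one-hour segments, as in the example above
WINDOW_PERIOD = 600          # wait 10 minutes for late events (configurable)

def segment_interval(event_ts):
    """The [start, end) segment interval an event's timestamp falls into."""
    start = event_ts - event_ts % SEGMENT_GRANULARITY
    return (start, start + SEGMENT_GRANULARITY)

def accepts(event_ts, now):
    """A real-time node accepts an event only while its interval is still
    open, i.e. the interval has not yet aged past the window period."""
    _, end = segment_interval(event_ts)
    return now < end + WINDOW_PERIOD

def ready_to_hand_off(interval, now):
    """Persisted indexes for an interval are merged and handed off only
    after the window period elapses, to catch late data."""
    return now >= interval[1] + WINDOW_PERIOD
```

With these definitions, an event stamped 13:50 lands in the 13:00-14:00 interval; at 14:05 that interval still accepts late events, and only from 14:10 onward is it merged and handed off.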

5. Historical Node
Historical nodes encapsulate the functionality to load and serve the immutable blocks of data (segments) created by real-time nodes. In many real-world workflows, most of the data imported into a Druid cluster is immutable, so historical nodes are typically the cluster's primary working components.
Historical nodes follow a shared-nothing architecture, so there is no single point of contention between nodes. The nodes are independent and the service they provide is simple: they only need to know how to load, drop, and serve immutable segments. (Note: shared-nothing is a distributed computing architecture with no centrally stored state and no resource contention anywhere in the system; it scales very well and is widely used in web applications.)
Like real-time nodes, historical nodes advertise in ZooKeeper both their presence and the data they serve. Instructions to load and remove segments are likewise published via ZooKeeper; they contain information about where a segment lives in deep storage and how to unpack and process it.
Before a historical node downloads a segment from deep storage, it checks its local cache to see whether the segment is already present on the node; if not, the historical node downloads the segment from deep storage to local disk.
Once processing completes, the segment is announced in ZooKeeper and becomes queryable. The local cache also enables fast updates and restarts of a historical node: at startup it examines its cache and immediately serves whatever data it finds, as shown below:

(Figure: a historical node downloads immutable segments from deep storage; a segment must be loaded into memory before it can be queried.)
Because they handle only immutable data, historical nodes can offer read consistency. Immutable blocks of data also allow a simple parallelism model: historical nodes can scan and aggregate immutable blocks concurrently without blocking.
Tiers: historical nodes can be grouped into tiers, and which nodes belong to which tier is configurable. The purpose of tiers is to prioritize the distribution of data according to how important each segment is.
Different tiers can be configured with different performance and fault-tolerance parameters. For example, a "hot data" tier might be built from nodes with many CPU cores and large memory, configured to download the most frequently queried data.
A parallel "cold data" tier might be built from less powerful hardware and hold only segments that are rarely accessed.
Availability: historical nodes rely on ZooKeeper to manage segment loading and unloading.

If ZooKeeper becomes unavailable, historical nodes can no longer serve new data or drop outdated data. However, because queries are served over HTTP, historical nodes can still respond to queries for the data they are currently serving. This means a ZooKeeper outage does not affect the availability of data already on historical nodes.

6. Broker Node
Broker nodes act as the query routers in front of historical and real-time nodes.
A broker node learns from ZooKeeper which segments are queryable and where those segments live, and routes each incoming query to the correct historical or real-time nodes.
The broker node also merges the partial results returned by historical and real-time nodes, then returns the final merged result to the caller.
Cache: broker nodes contain a cache with an LRU eviction policy. The cache can live in local heap memory or in an external distributed key/value store such as Memcached.
Each time a broker node receives a query, it maps the query onto a set of segments. Results for some of those segments may already exist in the cache and need not be recomputed.
For the segments whose results are not cached, the broker forwards the query to the appropriate historical and real-time nodes; once the historical nodes return their results, the broker caches them per segment for later use, as shown in the following figure.
Note: real-time data is never cached, so queries over real-time node data are always forwarded to real-time nodes. Real-time data is perpetually changing, so caching it would be unreliable.

(Figure above: results are cached per segment. A query merges cached results with fresh results from the historical and real-time nodes.)
The cache also acts as an extra level of data availability: even if every historical node fails, queries whose results already sit in the cache can still be answered.
Availability: data remains queryable even during a full ZooKeeper outage. If a broker node cannot communicate with ZooKeeper, it keeps forwarding queries to historical and real-time nodes using the last view of the cluster it was given, assuming the cluster's structure is still what ZooKeeper reported before the outage. In practice, this availability model has allowed Druid clusters to keep serving queries while ZooKeeper failures were being diagnosed, buying considerable time.
Note: in a shared-nothing architecture, when a node goes offline, some service usually begins moving that node's data to other nodes. If the node restarts right away while that data movement has already started, the result is unnecessary data shuffling across the cluster. Because the distributed file system keeps multiple replicas of the same data, relocation exists only to restore the replica count; data on a node that goes down and restarts is not lost, so a temporarily reduced replica count does not harm overall data health. Moving data across machines also takes time, so it is better to wait a grace period and begin moving data only if the node is truly dead.
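The per-segment caching behavior described above can be sketched with a small LRU structure. This is a toy model (class and method names are invented, not Druid's), but it captures the two rules from the text: results are cached per segment, and real-time segments are never cached.

```python
from collections import OrderedDict

class SegmentResultCache:
    """Minimal sketch of a broker-style per-segment LRU result cache."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order doubles as LRU order

    def get(self, query_id, segment_id):
        key = (query_id, segment_id)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        return None                            # miss: must hit a data node

    def put(self, query_id, segment_id, result, realtime=False):
        if realtime:                           # real-time data is never cached
            return
        key = (query_id, segment_id)
        self.entries[key] = result
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
```

On a query, the broker would call `get` for each mapped segment, forward only the misses to historical nodes, `put` the returned per-segment results, and merge everything before answering the caller.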

7. Coordinator Node
Coordinator nodes are chiefly responsible for managing data and its distribution across the historical nodes. They tell historical nodes to load new data, drop stale data, replicate data, and move data for load balancing.
To maintain a stable view, Druid uses a multi-version concurrency control swap protocol to manage immutable segments. If an immutable segment's data has been entirely superseded by newer segments, the obsolete segment is dropped from the cluster.
Coordinator nodes go through a leader election to decide which single node performs the coordination duties; the remaining coordinator nodes act as redundant standbys.
The coordinator runs periodically to assess the current state of the cluster, making decisions by comparing the cluster's expected state against its actual state at runtime. Like all Druid nodes, the coordinator maintains a ZooKeeper connection to obtain information about the current cluster.
The coordinator also maintains a connection to a MySQL database, which holds additional operational parameters and configuration.
One key piece of information in MySQL is the list of all segments that historical nodes should serve; this list can be updated by any service that creates segments, such as a real-time node.
The MySQL database also contains a rule table governing how segments are created, destroyed, and replicated in the cluster.
Rules: rules govern how segments are loaded onto and dropped from historical nodes.
Rules indicate how segments should be assigned to the different historical node tiers, and how many replicas of a segment each tier should keep.
Rules can also indicate when segments should be dropped from the cluster entirely. Rules are usually set over time periods; for example, a user might use rules to load the most valuable segments from the past month into the "hot data" tier, load valuable data from the past year into the "cold data" tier, and drop anything older.
The coordinator loads its rule set from the rule table in the MySQL database. Rules can be assigned to a specific data source, or configured as a set of defaults. The coordinator iterates over all available segments and matches each one against the first rule that applies to it.
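The "first rule that applies" resolution can be illustrated as an ordered scan. The rule names and fields below are illustrative, not Druid's exact rule classes; the scenario mirrors the hot/cold/drop example from the text.

```python
def first_matching_rule(segment, rules):
    """Coordinator-style rule resolution: walk the ordered rule list and
    return the first rule whose predicate matches the segment."""
    for rule in rules:
        if rule["matches"](segment):
            return rule
    return None

NOW = 1_700_000_000
MONTH = 30 * 24 * 3600

rules = [
    # last month -> 2 replicas on the "hot" tier
    {"name": "load-recent", "tier": "hot", "replicas": 2,
     "matches": lambda s: s["end"] >= NOW - MONTH},
    # last year -> 1 replica on the "cold" tier
    {"name": "load-year", "tier": "cold", "replicas": 1,
     "matches": lambda s: s["end"] >= NOW - 12 * MONTH},
    # everything older -> drop from the cluster
    {"name": "drop-old", "tier": None, "replicas": 0,
     "matches": lambda s: True},
]
```

Because the scan stops at the first match, rule order matters: a catch-all drop rule placed first would drop everything, which is why defaults go at the end of the list.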
Load balancing: in a typical production environment, queries routinely hit dozens or even hundreds of segments. Because each historical node has limited resources, segments must be distributed across the cluster so that the load does not become too unbalanced.
Determining an optimal load distribution requires some knowledge of query patterns and speeds. Typically, a query covers a batch of segments from a recent contiguous time range within a single data source, and on average, queries over smaller segments are faster.
These query patterns suggest replicating recent historical segments at a higher rate, spreading large segments that are close together in time across several different historical nodes, and co-locating segments that belong to different data sources.
To achieve an optimal distribution and balance of segments across the cluster, a cost-based optimizer was developed that takes into account a segment's data source, recency, and size.
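A toy version of such a cost function, using only the three signals named above, might look like the following. This is not Druid's actual balancer (which is considerably more elaborate); it is a sketch of the idea: penalize placing a segment next to resident segments that are large, close in time, or from the same data source.

```python
def placement_cost(candidate, resident_segments):
    """Toy cost of placing `candidate` on a node that already holds
    `resident_segments`. Lower total cost = better placement."""
    cost = 0.0
    for seg in resident_segments:
        size_term = (candidate["size"] + seg["size"]) / 1e9
        gap = abs(candidate["start"] - seg["start"])          # seconds apart
        proximity = 1.0 / (1.0 + gap / 3600.0)                # near in time -> larger
        same_source = 2.0 if seg["source"] == candidate["source"] else 1.0
        cost += size_term * proximity * same_source
    return cost

def best_node(candidate, nodes):
    """Pick the node (name -> list of resident segments) with minimal cost."""
    return min(nodes, key=lambda name: placement_cost(candidate, nodes[name]))
```

Under this cost, a new segment avoids the node that already holds an adjacent-hour segment of the same data source, which is exactly the spreading behavior the query patterns call for.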
Replication:
The coordinator may tell different historical nodes to load copies of the same segment. The number of replicas per historical tier is fully configurable.
Clusters requiring high fault tolerance can be configured with higher replica counts. A replica of a segment is treated exactly like the original and follows the same load-balancing algorithm.
Thanks to replication, the failure of a single historical node is transparent to the Druid cluster and has no impact.
Availability:
Coordinator nodes have two external dependencies: ZooKeeper and MySQL. The coordinator relies on ZooKeeper to determine which historical nodes exist in the cluster.
If ZooKeeper becomes unavailable, the coordinator can no longer issue segment assignment, rebalancing, or drop instructions; however, none of this affects data availability.
The design principle for handling MySQL and ZooKeeper failures is the same: if an external dependency of the coordinator fails, the cluster maintains the status quo.
Druid uses MySQL to store operational management information and metadata about which segments should exist in the cluster. If MySQL goes offline, this information becomes unavailable to the coordinator, but that does not make the data itself unavailable.
If coordinators cannot communicate with MySQL, they stop assigning new segments and dropping expired ones. Broker, historical, and real-time nodes remain queryable throughout a MySQL outage.

8. Introduction to Architecture
The Druid architecture ties together the roles described above:

Query path (red arrows): ① a client sends a query to a broker, which routes it to ② real-time nodes and ③ historical nodes.
Data flow (black arrows): data sources include real-time streams and batch data. ④ Real-time streams are indexed directly on real-time nodes; ⑤ batch data is written via the indexing service into deep storage and ⑥ then loaded by historical nodes; ⑦ real-time nodes also hand their data off to deep storage.
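The query path can be made concrete with a native Druid query: clients compose a JSON query body and POST it to the broker, which fans it out and merges the results. The sketch below builds a timeseries query; the data source name, interval, and metric names are made up for illustration, and the field layout follows Druid's native query format (`queryType`, `dataSource`, `granularity`, `intervals`, `aggregations`).

```python
import json

# A client-side sketch of the query path above: compose a native Druid
# JSON query and POST it to the broker, which routes it to real-time
# and historical nodes and merges their partial results.
query = {
    "queryType": "timeseries",
    "dataSource": "events",                      # hypothetical data source
    "granularity": "hour",
    "intervals": ["2016-11-20T13:00/2016-11-20T15:00"],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "doubleSum", "name": "total", "fieldName": "value"},
    ],
}
body = json.dumps(query)
# The body would then be sent as an HTTP POST with
# Content-Type: application/json to the broker's query endpoint.
```

Note that nothing in the query names a segment or a node: mapping the interval onto segments, and segments onto historical/real-time nodes, is entirely the broker's job.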

The Druid cluster relies on ZooKeeper to maintain its data topology. Each component interacts with ZooKeeper as follows:

Real-time nodes hand segments off to deep storage and record in ZooKeeper which segments they have handed off
The coordinator manages the historical nodes: it learns from ZooKeeper which segments need to be synchronized/downloaded and assigns each task to a specific historical node to carry out
Historical nodes pick up their tasks from ZooKeeper and remove the ZooKeeper entry once a task is complete
Broker nodes route queries to the right nodes based on where ZooKeeper says each segment lives
On the query path, the broker dispatches requests only to real-time and historical nodes, so neither the metadata store nor deep storage participates in queries (they operate as background processes).
Metadata Storage and ZooKeeper

The metadata store and ZooKeeper hold different information. ZooKeeper records which node each segment lives on, while the metadata store holds the metadata describing each segment.
For a segment to exist in the cluster, the metadata store must hold its self-describing record: the segment's metadata, its size, and its deep storage location.
The coordinator uses this metadata to know which data should be available in the cluster (segments can arrive via real-time nodes or directly from batch ingestion).
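As an illustration of "self-describing", a segment record in the metadata store carries roughly the following shape. The field names here are indicative only, not Druid's exact schema; the point is that the record alone tells the coordinator what should exist and tells a historical node where to fetch it and how to interpret it.

```python
# Illustrative shape of one segment record in the metadata store.
segment_record = {
    "id": "events_2016-11-20T13:00_2016-11-20T14:00_v1",   # hypothetical id
    "dataSource": "events",
    "interval": ("2016-11-20T13:00", "2016-11-20T14:00"),
    "version": "v1",                 # MVCC version used to supersede old data
    "size_bytes": 524_288_000,
    # where the segment lives in deep storage (illustrative path)
    "loadSpec": {"type": "hdfs", "path": "hdfs://namenode/druid/segments/events/part_0"},
    "dimensions": ["country", "device"],
    "metrics": ["rows", "total"],
}
```

The `version` field is what makes the MVCC swap protocol from section 7 work: when a newer version fully covers an interval, records with the older version identify exactly which segments to drop.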

Beyond the node roles described above, Druid also relies on three external components: ZooKeeper, Metadata Storage, and Deep Storage. The data and query flows interact as follows:

① Real-time data is written to real-time nodes, which build segments with indexed structure.
② After a period of time, real-time nodes hand their segments off to deep storage.
③ Segment metadata is written to MySQL; for each handed-off segment, the real-time node adds a record in ZooKeeper.
④ The coordinator reads segment metadata from MySQL, such as schema information (dimension columns and metric columns).
⑤ The coordinator watches ZooKeeper for newly assigned segments or segments to delete, and writes instructions into ZooKeeper telling historical nodes which segments to load or drop.
⑥ Historical nodes watch ZooKeeper and pick up the segment tasks assigned to them.
⑦ A historical node downloads the segment from deep storage and loads it into memory, or deletes a stored segment.
⑧ Segments on historical nodes become available to the broker's query routing.

9. Distributed Cluster
Each node is only minimally coupled to the others. The following two diagrams show the real-time path and the batch path, respectively:

Data is imported from Kafka into the real-time nodes, and clients query real-time node data directly.

Batch data goes through the indexing service: an ingestion task is submitted via a POST request, and the resulting segments are written directly to deep storage. Data in deep storage is served only by historical nodes.
So the services that must be started for this path are: the indexing service (Overlord), Historical, and Coordinator (the coordinator notifies historical nodes to download segments).

Original source: http://www.cnblogs.com/tgzhu/p/6077846.html
