Pinot Architecture Introduction
1. High Level Architecture
1. Purpose: To provide analysis services for a given dataset
2. Input data: Hadoop & Kafka
3. Indexing technology: to provide fast queries, Pinot uses column-based storage and several indexing techniques (bitmap, inverted index).
2. Data Flow
2.1 Hadoop (Historical)
1. Input data: AVRO, CSV, JSON, etc.
2. Processing: files on HDFS are converted into indexed Segments by MapReduce (MR) jobs and then pushed to the historical nodes of the Pinot cluster, which serve queries on them.
3. Data expiry: a retention period can be configured for indexed Segments; a Segment is deleted automatically once that period has passed.
2.2 Realtime
1. Input data: Kafka stream
2. Processing: a real-time node consumes Kafka data, builds an indexed Segment in memory, and periodically flushes it to disk. The Segment is then available for queries.
3. Data expiry: data is retained on real-time nodes for a relatively short period, for example three days. Before that period ends, the data is moved to the historical nodes.
2.3 Query routing: select count(*) from table where time > T is split into the following two queries:
1. historical node: select count(*) from table where time > T and time < T1
2. realtime node: select count(*) from table where time > T1
Note: 1. All user queries are sent to the Pinot Broker.
2. Users do not need to know whether a query is served by real-time or historical nodes.
3. The Pinot Broker automatically splits each request based on its query conditions and sends the parts to real-time and historical nodes as needed.
4. The Broker merges the results automatically.
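The split-and-merge behaviour described in 2.3 can be sketched as follows. This is a minimal illustration, not Pinot's Broker code: the function names are hypothetical, timestamps are plain integers, and a row whose time equals T1 is assigned to the historical side so that it is neither dropped nor counted twice (that boundary choice is an assumption).

```python
from typing import List, Optional, Tuple

# (exclusive lower bound, inclusive upper bound or None for unbounded)
Range = Tuple[int, Optional[int]]


def split_at_boundary(t: int, t1: int) -> Tuple[Optional[Range], Range]:
    """Split the filter `time > t` at the time boundary t1.

    Returns (historical_range, realtime_range); historical_range is None
    when the whole filter lies beyond the boundary.
    """
    if t >= t1:
        return None, (t, None)
    return (t, t1), (t1, None)


def count_in_range(timestamps: List[int], rng: Range) -> int:
    """Stand-in for `select count(*)` executed on one node type."""
    low, high = rng
    return sum(1 for ts in timestamps if ts > low and (high is None or ts <= high))


def broker_count(historical: List[int], realtime: List[int], t: int, t1: int) -> int:
    """Run both sub-queries and merge the partial counts."""
    hist_range, rt_range = split_at_boundary(t, t1)
    total = count_in_range(realtime, rt_range)
    if hist_range is not None:
        total += count_in_range(historical, hist_range)
    return total
```

With historical timestamps [1, 2, 3, 4, 5], real-time timestamps [4, 5, 6, 7], T = 2 and T1 = 5, the overlapping rows 4 and 5 are counted once and the merged count is 5.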
3. Pinot Components Architecture
Note: 1. The entire system uses Apache Helix for cluster management.
2. Zookeeper stores the cluster state as well as the Helix and Pinot configurations.
3. Pinot uses NFS to push the Segments generated by MR on HDFS to the Pinot Servers.
3.1 Historical Node
3.1.1 Data Preparation
1. An indexed Segment is created in Hadoop.
2. The Pinot team provides a library for generating Segments.
3. The data format can be AVRO, CSV, or JSON.
3.1.2 Segment creation on Hadoop
1. Data in HDFS is divided into 256 MB or 512 MB shards.
2. Each mapper generates one new indexed Segment from its shard.
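As a rough illustration of the sharding step, the sketch below plans fixed-size byte ranges for an input of a given size, one range per mapper. It is a simplification under stated assumptions: the function name is hypothetical, and a real job would align shard boundaries to record boundaries rather than cut at raw byte offsets.

```python
from typing import List, Tuple

MB = 1024 * 1024


def plan_shards(input_size: int, shard_size: int = 256 * MB) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering input_size bytes.

    Each pair stands for the slice one mapper would turn into one Segment.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    shards = []
    offset = 0
    while offset < input_size:
        length = min(shard_size, input_size - offset)
        shards.append((offset, length))
        offset += length
    return shards
```

For a 600 MB input and the 256 MB default, this yields three shards, so three mappers would produce three Segments.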
3.1.3 Segment move from HDFS to NFS
1. The Segment is read from HDFS and sent to the Pinot Controller node through an HTTP POST.
2. The Pinot Controller stores the Segment on the NFS mounted on the Controller node.
3. The Pinot Controller then assigns the Segment to a Pinot Server.
4. The assignment information is maintained and managed by Helix.
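The text does not say how the Controller chooses a Server for a new Segment. The sketch below assumes one plausible policy, placing each replica on the currently least-loaded Server, purely to make the assignment step concrete. The function name, the load map, and the policy itself are assumptions, not Pinot's documented behaviour.

```python
from typing import Dict, List


def assign_segment(load: Dict[str, int], replicas: int) -> List[str]:
    """Pick `replicas` distinct servers with the fewest segments.

    `load` maps server name -> number of segments already assigned and is
    updated in place. Ties are broken by server name for determinism.
    """
    if replicas > len(load):
        raise ValueError("not enough servers for the requested replica count")
    chosen = sorted(load, key=lambda server: (load[server], server))[:replicas]
    for server in chosen:
        load[server] += 1
    return chosen
```

In the real system the resulting mapping would be recorded through Helix rather than kept in a local dictionary.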
3.1.4 Segment move from NFS to Historical Node
1. Helix monitors the liveness of the Pinot Servers.
2. When a Server starts, Helix notifies it of the Segments assigned to it.
3. The Pinot Server downloads the Segment from the Controller and stores it on local disk.
3.1.5 Segment Loading
1. The decompressed Segment contains metadata and the forward and inverted indexes of each column.
2. Depending on the load mode (memory or mmap), the Segment is either loaded into memory or memory-mapped on the server.
3. After loading, Helix notifies the Broker nodes that the Segment is available on that server. From then on, the Broker routes queries to that server.
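The difference between the two load modes can be illustrated with Python's standard mmap module. This is only an analogy for the idea (Pinot itself runs on the JVM), and the function name and mode strings are hypothetical. In "memory" mode the whole file is read onto the heap; in "mmap" mode the operating system pages data in on demand.

```python
import mmap


def load_index(path: str, mode: str):
    """Load an index file either fully into memory or as a read-only mmap.

    Both return values support slicing, e.g. data[0:4].
    """
    if mode == "memory":
        with open(path, "rb") as f:
            return f.read()  # entire file copied into process memory
    if mode == "mmap":
        with open(path, "rb") as f:
            # Length 0 maps the whole file; the mapping stays valid after
            # the file object is closed. Empty files cannot be mapped.
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    raise ValueError(f"unknown load mode: {mode}")
```

Memory mapping keeps heap usage low for large Segments at the cost of possible page faults on first access, which is the usual trade-off between the two modes.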
3.1.6 Segment Expiry
1. The Pinot Controller has a background cleanup thread that deletes expired Segments based on their metadata.
2. Deleting a Segment removes its data from the NFS on the Controller and its metadata from Helix.
3. Helix notifies the Pinot Server to transition the Segment from online to offline and then to the deleted state, at which point the Server deletes the data from its local disk.
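The metadata-based expiry check performed by the cleanup thread might look roughly like this. The sketch assumes that each Segment's end time is available in days since the epoch and that a Segment expires once its end time is older than the retention period; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentMeta:
    name: str
    end_time_days: int  # segment end time, in days since the epoch (assumed unit)


def expired_segments(segments: List[SegmentMeta], now_days: int, retention_days: int) -> List[str]:
    """Return the names of segments whose end time is older than the retention period."""
    return [s.name for s in segments if now_days - s.end_time_days > retention_days]
```

A cleanup loop would call this periodically and then trigger the deletion steps listed above for each returned name.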
Note: 1. hadoop jar pinot-hadoop-0.016.jar SegmentCreation job.properties
2. hadoop jar pinot-hadoop-0.016.jar SegmentTarPush job.properties
3. Segment loading is an offline-to-online transition triggered by Helix.
3.2 Real time Node
3.2.1 Kafka consumption
1. Pinot creates a resource and allocates a group of instances to consume data from the Kafka topic.
2. If a Pinot Server fails, its consumption is redistributed to another node.
3.2.2 Segment creation
1. Once the Pinot Server has consumed a pre-configured number of events, the in-memory data is converted into an offline Segment.
2. After the Segment is created successfully, Pinot commits the offset to Kafka. If Segment creation fails, Pinot regenerates the Segment from the last checkpoint.
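The commit-after-success pattern described in 3.2.2 can be sketched as below. A plain list stands in for a Kafka partition, and the class, its fields, and the `fail` flag are hypothetical test hooks; no Kafka client API is used. The key point is that the offset only advances after the Segment has been built, so a failed attempt re-reads the same events.

```python
from typing import List


class RealtimeConsumer:
    """Toy consumer: builds a segment every `threshold` events, then commits."""

    def __init__(self, stream: List[int], threshold: int):
        self.stream = stream            # stand-in for a Kafka partition
        self.threshold = threshold      # pre-configured number of events
        self.committed_offset = 0       # last checkpoint
        self.segments: List[List[int]] = []

    def run_once(self, fail: bool = False) -> bool:
        """Try to build one segment starting at the last committed offset."""
        start = self.committed_offset
        batch = self.stream[start:start + self.threshold]
        if len(batch) < self.threshold:
            return False                # not enough events yet
        if fail:
            # Simulated failure before commit: the offset is unchanged,
            # so the next attempt rebuilds from the same checkpoint.
            return False
        self.segments.append(list(batch))
        self.committed_offset = start + len(batch)  # commit only after success
        return True
```

Because the checkpoint moves only on success, a retry after a failure produces exactly the Segment that the failed attempt would have produced.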
3.2.3 Segment Expiry
1. Retention can only be configured in days. After expiration, the Segment is moved from the real-time nodes to the historical nodes.
Note: The Segment format generated by the real-time node is the same as that generated by the historical node, which makes it easy to move a Segment from a real-time node to a historical node.
3.3 Pinot Cluster Management
3.3.1 Overview
1. All management commands go through the Pinot Controller, such as allocating Pinot Servers and Brokers, creating a new Table, and uploading new Segments.
2. Internally, every Pinot admin command is translated into a Helix command through the Helix Admin API, and the Helix command then modifies the metadata in Zookeeper.
3. As the brain of the system, the Helix Controller translates metadata changes into a set of actions and executes them on the corresponding participants.
4. The Helix Controller is also responsible for monitoring the Pinot Servers. When a Pinot Server starts or fails, the Helix Controller detects it and updates the corresponding external view. The Pinot Brokers observe these changes and dynamically update the routing for the table.
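How a Broker might derive its routing from the external view can be sketched as follows. The external view is modelled here as a nested dictionary of segment -> server -> state; that shape, the state strings, and the function name are assumptions made for illustration and are not the Helix API.

```python
from typing import Dict, List


def build_routing_table(external_view: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each segment to the servers that currently hold it ONLINE.

    Segments with no ONLINE replica are left out, since no server can
    answer queries for them.
    """
    routing: Dict[str, List[str]] = {}
    for segment, replicas in external_view.items():
        online = sorted(server for server, state in replicas.items() if state == "ONLINE")
        if online:
            routing[segment] = online
    return routing
```

Rebuilding this table whenever the external view changes is what lets the Broker stop routing to a failed Server without any client involvement.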
3.3.2 Terms
1. Pinot Segment: corresponds to a Helix Partition. Each Segment has multiple replicas.
2. Pinot Table: consists of multiple Segments. All Segments of a Table share the same Schema.
3. Pinot Server: corresponds to a Helix Participant; it mainly stores Segments.
4. Pinot Broker: corresponds to a Helix Spectator; it observes the state changes of Segments and Pinot Servers. To support multiple tenants, a Pinot Broker also acts as a Helix Participant.
3.3.3 Zookeeper
Zookeeper is mainly used to store the cluster state, and also stores some of the Helix and Pinot configuration.
3.3.4 Broker Node
The Broker node is mainly responsible for routing query requests from clients to Pinot Server instances, collecting the results returned by the Pinot Servers, merging them into a final result, and returning it to the client. Features:
1. Service discovery: the Broker tracks Servers, Tables, Segments, and time ranges, and computes the query execution routes.
2. Scatter-gather: the Broker distributes a query to the corresponding Servers, merges the results returned by each Server, and returns the merged result to the client.
The Pinot Broker implements several Pinot Server selection policies: uniform distribution of segments, a greedy policy (maximizing or minimizing the number of Pinot Servers involved), and random selection of a Server for each segment. If execution on a Pinot Server fails or times out, the Pinot Broker returns only partial results. Other approaches may be supported in the future, such as retrying or sending the same execution plan to multiple replicas of a segment.
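A minimal sketch of scatter-gather with a uniform-distribution selection policy and partial results on failure is shown below. It is an illustration under assumptions, not the Broker's implementation: the function names are hypothetical, `execute` is a caller-supplied callable standing in for a remote call, and any exception it raises is treated as a Server failure or timeout.

```python
from typing import Callable, Dict, List, Tuple


def balanced_plan(routing: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each segment, pick the replica server with the fewest segments so far."""
    plan: Dict[str, List[str]] = {}
    for segment in sorted(routing):
        server = min(routing[segment], key=lambda s: (len(plan.get(s, [])), s))
        plan.setdefault(server, []).append(segment)
    return plan


def scatter_gather(
    plan: Dict[str, List[str]],
    execute: Callable[[str, List[str]], int],
) -> Tuple[int, List[str]]:
    """Send each server its segments and merge the partial counts.

    Returns (merged_count, failed_servers). When failed_servers is not
    empty, the count is only a partial result.
    """
    total = 0
    failed: List[str] = []
    for server, segments in plan.items():
        try:
            total += execute(server, segments)
        except Exception:  # stand-in for a server failure or timeout
            failed.append(server)
    return total, failed
```

Returning the list of failed Servers alongside the count lets the caller tell a complete answer from a partial one, which matters because the text notes that no retry happens.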
4. Pinot Index Segment
4.1 Differences between row-based storage and column-based storage
4.1.1 Features of row-based storage
1. OLTP
2. A whole row of data is stored together.
3. Easy INSERT/UPDATE
4.1.2 Features of column-based storage
1. OLAP
2. Only the columns involved in the query are read.
3. Any column can be used as an index: fixed-length index, sparse index, etc.
4. Easy compression
5. Bitmaps can improve query execution performance.
4.2 Anatomy of Index Segment
4.2.1 Segment Entities
1. Segment Metadata: mainly defines the metadata of a Segment, including:
segment.name
segment.table.name
segment.dimension.column.names
segment.metric.column.names
segment.time.column.name
segment.time.interval
segment.start.time / segment.end.time
segment.time.unit
......
2. Column Metadata, including:
column..cardinality
column..totalDocs
column..dataType: INT/FLOAT/STRING
column..lengthOfEachEntry
column..columnType: Dimension/Metric/Time
column..isSorted
column..hasDictionary
column..isSingleValues
......
3. Creation Metadata (creation.meta), including:
Dictionary (.dict): the dictionary that encodes the column values
Forward Index (.sv.sorted.fwd): single-value sorted forward index, a prefix-compressed index
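To make the dictionary, forward index, and bitmap inverted index concrete, here is a small sketch for a single-value column. It illustrates the general technique only and says nothing about Pinot's on-disk file layout; the function names are hypothetical, and a Python integer is used as a simple bitset in place of a compressed bitmap.

```python
import bisect
from typing import Dict, List, Tuple


def build_column_index(values: List[str]) -> Tuple[List[str], List[int], Dict[int, int]]:
    """Build (dictionary, forward_index, inverted_index) for one column.

    dictionary:     sorted distinct values; position = dictionary id
    forward_index:  doc id -> dictionary id
    inverted_index: dictionary id -> bitmap of doc ids (bit d set = doc d matches)
    """
    dictionary = sorted(set(values))
    id_of = {value: i for i, value in enumerate(dictionary)}
    forward = [id_of[value] for value in values]
    inverted: Dict[int, int] = {}
    for doc_id, dict_id in enumerate(forward):
        inverted[dict_id] = inverted.get(dict_id, 0) | (1 << doc_id)
    return dictionary, forward, inverted


def docs_matching(dictionary: List[str], inverted: Dict[int, int], value: str) -> List[int]:
    """Return the doc ids whose column value equals `value`."""
    i = bisect.bisect_left(dictionary, value)  # binary search in the sorted dictionary
    if i == len(dictionary) or dictionary[i] != value:
        return []
    bitmap = inverted[i]
    return [d for d in range(bitmap.bit_length()) if (bitmap >> d) & 1]
```

For the column ["us", "in", "us", "cn", "in", "us"], the dictionary is ["cn", "in", "us"], the forward index is [2, 1, 2, 0, 1, 2], and looking up "us" returns documents [0, 2, 5]. Filters on several columns can be combined by AND-ing or OR-ing the bitmaps before expanding them to doc ids.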
5. Query Processing
5.1 Query Execution Phases
5.1.1 Query Parsing: converts PQL into a query parse tree, using ANTLR as the parser.
5.1.2 Logical Plan Phase: converts the query parse tree into a logical plan tree by consulting metadata.
5.1.3 Physical Plan Phase: further optimizes the plan and produces a concrete execution plan based on Segment information.
5.1.4 Executor Service: executes the physical operator tree on the corresponding Segments.
5.2 PQL
PQL is a subset of SQL. Joins and subqueries are not supported.
Ref: https://github.com/linkedin/pinot/wiki/Architecture