Presto Architecture and Principles


Presto is a Java-based, distributed SQL query engine open-sourced by Facebook for interactive queries over data sets ranging from gigabytes to petabytes, and it is reported to deliver more than 10 times the performance of Hive. Presto can query data stored in systems including Hive and Cassandra, as well as some commercial stores, and a single Presto query can combine data from multiple sources for unified analysis. Presto's goal is to return query results within the desired response time. Facebook uses Presto for interactive queries against multiple internal data stores, including a 300PB data warehouse; more than 1,000 Facebook employees run over 30,000 queries per day with Presto, scanning more than 1PB of data daily.

Contents:

    • Presto Architecture
    • Presto Low-Latency Principles
    • Presto Storage Plug-in
    • Presto Execution Process
    • Presto Engine Comparison

Presto Architecture

  • The Presto query engine is a master-slave architecture that consists of the following three parts:
      1. A Coordinator node
      2. A Discovery Server node
      3. Multiple Worker nodes
  • Coordinator: responsible for parsing SQL statements, generating execution plans, and distributing execution tasks to Worker nodes
  • Discovery Server: usually embedded in the Coordinator node
  • Worker node: responsible for actually executing query tasks, including reading data from HDFS
  • After a Worker node starts, it registers with the Discovery Server service, and the Coordinator obtains the available Worker nodes from the Discovery Server. If the Hive connector is configured, a Hive Metastore service must also be configured to provide Hive metadata to Presto
  • A more detailed architecture diagram: [Figure: Presto architecture]
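To make these roles concrete, here is a minimal sketch of the node configuration, using the standard etc/config.properties keys from Presto's deployment documentation; the hostname, port, and the choice to embed the Discovery Server in the Coordinator are placeholder assumptions:

      # etc/config.properties on the Coordinator (embedded Discovery Server)
      coordinator=true
      node-scheduler.include-coordinator=false
      http-server.http.port=8080
      discovery-server.enabled=true
      discovery.uri=http://coordinator.example.com:8080

      # etc/config.properties on each Worker
      coordinator=false
      http-server.http.port=8080
      discovery.uri=http://coordinator.example.com:8080

With discovery-server.enabled=true the Coordinator runs the Discovery Server in-process, which matches the embedded deployment described above.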

Presto Low-Latency Principles

    • Fully in-memory parallel computing: intermediate results stay in memory and are exchanged between stages directly, instead of being checkpointed to disk between phases as in MapReduce
    • Pipelined computation: downstream operators start consuming data as soon as upstream operators produce it, rather than waiting for a whole stage to finish
    • Localized computation: splits are preferentially scheduled on the nodes that hold the data
    • Dynamic compilation of execution plans: performance-critical operators are compiled to JVM bytecode at runtime
    • GC control: JVM garbage collection is tuned to keep pauses short
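On the GC-control point, here is a minimal sketch of the JVM settings, following the flags recommended in Presto's deployment documentation (the heap size is a placeholder assumption):

      # etc/jvm.config
      -server
      -Xmx16G
      -XX:+UseG1GC
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:OnOutOfMemoryError=kill -9 %p

The G1 collector keeps pause times bounded on large heaps, which matters for an engine that holds all intermediate query state in memory.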

Presto Storage Plug-in

    • Presto designed a simple data-storage abstraction layer so that SQL queries can run on top of different data storage systems.
    • A storage plug-in (connector) only needs to implement interfaces for the following operations: extracting metadata, locating the data, and accessing the data itself.
    • In addition to the primarily used Hive/HDFS backend, Presto connectors have also been developed for other systems, including HBase, Scribe, and custom in-house systems (see the sketch after this list)
    • The plug-in structure: [Figure: connector plug-in structure]
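To make the three operations concrete, here is a self-contained Java sketch of the division of responsibilities inside a connector. The interface names below are simplified stand-ins for the real presto-spi types (ConnectorMetadata, ConnectorSplitManager, ConnectorRecordSetProvider), whose exact signatures vary between Presto versions; treat this as an illustration, not compilable SPI code:

      import java.util.List;

      // Metadata extraction: what schemas, tables, and columns exist
      interface ExampleMetadata {
          List<String> listTables(String schema);
      }

      // Data location: split a table into independently readable pieces
      interface ExampleSplitManager {
          List<String> getSplits(String table);
      }

      // Data access: read the rows of a single split
      interface ExampleRecordSetProvider {
          Iterable<Object[]> getRecords(String split);
      }

      // A connector bundles the three operations; the Coordinator uses the
      // first two for planning, and Workers use the third to read data.
      interface ExampleConnector {
          ExampleMetadata getMetadata();
          ExampleSplitManager getSplitManager();
          ExampleRecordSetProvider getRecordSetProvider();
      }

A deployed connector is then wired to a catalog through a properties file, for example etc/catalog/hive.properties containing connector.name=hive-hadoop2 and hive.metastore.uri pointing at the Hive Metastore mentioned earlier.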

Presto Execution Process

  • Submitting the query: after the user submits a query statement through the Presto CLI, the CLI communicates with the Coordinator over HTTP (a sketch of this HTTP submission appears after this list). The Coordinator receives the query request, calls SqlParser to parse the SQL statement into a Statement object, wraps the Statement in a QueryStarter object, and places it into a thread pool to wait for execution. Take the following SQL as an example:
  • select c1.rank, count(*) from dim.city c1 join dim.city c2 on c1.id = c2.id where c2.id > 10 group by c1.rank;
  • The logical execution plan is as follows: [Figure: logical execution plan]
  • The dashed lines in the logical execution plan diagram mark the points where Presto splits the logical execution plan; plan generation divides the logical plan into four SubPlans, and each SubPlan is submitted to one or more Worker nodes for execution
  • A SubPlan has several important properties: PlanDistribution, OutputPartitioning, and PartitionBy. The flow chart of the whole execution process is as follows: [Figure: execution process flow chart]
    1. PlanDistribution: represents the distribution method of a query stage; the four SubPlans above use three different PlanDistribution values
      • Source: indicates that this SubPlan is a data source; the number of nodes allocated to execute a Source-type task is determined by the size of the data source
      • Fixed: indicates that this SubPlan is assigned a fixed number of nodes for execution (configured by the query.initial-hash-partitions parameter in the config; default is 8)
      • None: indicates that this SubPlan is assigned to only one node for execution
    2. OutputPartitioning: indicates whether this SubPlan's output is shuffled according to the PartitionBy key values; it takes only two values, HASH and NONE
  • In this execution plan, SubPlan1 and SubPlan0 have PlanDistribution=Source, so these two SubPlans are the stages that provide the data source, and all the data read by SubPlan1's nodes is sent to every node of SubPlan0. SubPlan2 is allocated 8 nodes to perform the final aggregation; SubPlan3 is only responsible for outputting the final result
  • SubPlan1 and SubPlan0, as source stages, read HDFS file data by calling the HDFS InputSplit API, and each InputSplit is then assigned to a Worker node for execution. The maximum number of InputSplits assigned to each Worker node is configurable via the query.max-pending-splits-per-node parameter in the config; the default is 100
  • Each SubPlan1 node reads a split's data, filters it, and distributes the result to every SubPlan0 node for the join operation and partial aggregation
  • After each SubPlan0 node finishes its computation, its data is distributed to the SubPlan2 nodes according to the hash value of the GROUP BY key
  • After all SubPlan2 nodes finish their computation, the data is distributed to the SubPlan3 node
  • When the SubPlan3 node finishes, it notifies the Coordinator that the query has ended and sends the result data to the Coordinator
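As noted in the submission step above, the CLI speaks plain HTTP to the Coordinator, so a query can also be submitted directly to the Coordinator's REST endpoint. This sketch assumes a Coordinator on localhost:8080 and uses the /v1/statement endpoint and X-Presto-* headers from the Presto client protocol; the user, catalog, and schema values are placeholders:

      # Submit a query over HTTP, as the Presto CLI does under the hood
      curl -X POST http://localhost:8080/v1/statement \
           -H 'X-Presto-User: analyst' \
           -H 'X-Presto-Catalog: hive' \
           -H 'X-Presto-Schema: default' \
           -d 'select count(*) from dim.city'
      # The JSON response contains a nextUri that the client polls
      # until the query reaches a terminal state.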
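The SubPlan splitting and distribution properties described above can be inspected for any query: Presto's EXPLAIN statement with TYPE DISTRIBUTED prints the plan as fragments, each annotated with its distribution (the exact labels, such as SOURCE or SINGLE, vary by version). A sketch against the example query:

      -- Show the distributed plan fragments and their distribution types
      EXPLAIN (TYPE DISTRIBUTED)
      select c1.rank, count(*)
      from dim.city c1
      join dim.city c2 on c1.id = c2.id
      where c2.id > 10
      group by c1.rank;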

Presto Engine Comparison

    • Comparison results with Hive and Spark SQL: [Figure: performance comparison]

