Presto is a Java-based distributed SQL query engine released by Facebook for interactive queries over large datasets ranging from gigabytes to petabytes. It targets interactive analysis at data-warehouse scale and is said to be more than 10 times faster than Hive. Presto can query data held in Hive, Cassandra, and even some commercial data stores, and a single Presto query can combine data from multiple data sources for unified analysis. Presto's goal is to return query results within the expected response time. Facebook uses Presto for interactive queries against several internal data stores, including a 300 PB data warehouse. More than 1,000 Facebook employees run more than 30,000 queries per day with Presto, scanning more than 1 PB of data per day.
Contents:
- Presto architecture
- Presto Low Latency principle
- Presto Storage Plug-in
- Presto execution Process
- Presto engine comparison
Presto architecture
- The Presto query engine is a master-slave architecture that consists of the following three parts:
- A Coordinator node
- A Discovery server node
- Multiple worker nodes
- Coordinator: parses SQL statements, generates execution plans, and distributes execution tasks to the worker nodes
- Discovery Server: usually embedded in the Coordinator node
- Worker node: actually executes the query tasks and interacts with HDFS to read data
- After a worker node starts, it registers with the Discovery Server, and the Coordinator obtains the list of working worker nodes from the Discovery Server. If the Hive connector is configured, a Hive Metastore service must also be configured to provide Hive metadata to Presto
- [Figure: Presto architecture diagram]
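The registration and task-distribution flow described in this section can be sketched in a few lines of Python. This is only an illustrative model, not Presto's actual code: the class names `DiscoveryServer` and `Coordinator`, the method names, and the round-robin assignment are all assumptions made for the example, and "planning" is reduced to creating one task per split.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DiscoveryServer:
    """Hypothetical registry of workers that have announced themselves."""
    workers: Set[str] = field(default_factory=set)

    def register(self, worker_id: str) -> None:
        # A worker calls this once after it starts.
        self.workers.add(worker_id)

    def active_workers(self) -> List[str]:
        return sorted(self.workers)


@dataclass
class Coordinator:
    """Hypothetical coordinator: plans a query and hands tasks to workers."""
    discovery: DiscoveryServer

    def plan(self, sql: str, num_splits: int) -> List[str]:
        # Stand-in for parsing and planning: one task per split.
        return [f"task-{i}: {sql}" for i in range(num_splits)]

    def schedule(self, sql: str, num_splits: int) -> Dict[str, List[str]]:
        workers = self.discovery.active_workers()
        if not workers:
            raise RuntimeError("no workers registered with discovery")
        assignment: Dict[str, List[str]] = {w: [] for w in workers}
        # Round-robin assignment (an assumption chosen for simplicity).
        for i, task in enumerate(self.plan(sql, num_splits)):
            assignment[workers[i % len(workers)]].append(task)
        return assignment


discovery = DiscoveryServer()
for worker_id in ("worker-1", "worker-2"):
    discovery.register(worker_id)

coordinator = Coordinator(discovery)
assignment = coordinator.schedule("SELECT count(*) FROM t", num_splits=3)
```

The point of the sketch is the direction of the calls: workers announce themselves to discovery, and the coordinator only ever asks discovery which workers exist before assigning work.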
Presto Low Latency principle
- Fully memory-based parallel computing
- Pipelined computation
- Data-local computation
- Dynamically compiling execution plans
- GC Control
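To make the first two points more concrete, the following is a minimal sketch of in-memory, pipelined processing using Python generators. It is not how Presto implements its operators; the function names (`scan`, `filter_pages`, `sum_amount`), the "page" batching, and the sample data are hypothetical. It only illustrates that each stage can consume small batches from the previous stage as they are produced, instead of waiting for a whole intermediate result to be written out.

```python
from typing import Iterable, Iterator, List

Page = List[dict]  # a small in-memory batch of rows


def scan(rows: List[dict], page_size: int) -> Iterator[Page]:
    """Produce the input one page at a time."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def filter_pages(pages: Iterable[Page], min_amount: int) -> Iterator[Page]:
    """Filter each page as soon as it arrives from the upstream stage."""
    for page in pages:
        kept = [row for row in page if row["amount"] >= min_amount]
        if kept:
            yield kept


def sum_amount(pages: Iterable[Page]) -> int:
    """Aggregate incrementally; no intermediate result is materialized."""
    total = 0
    for page in pages:
        total += sum(row["amount"] for row in page)
    return total


rows = [{"amount": n} for n in range(1, 11)]  # amounts 1..10
total = sum_amount(filter_pages(scan(rows, page_size=3), min_amount=6))
print(total)  # 6 + 7 + 8 + 9 + 10 = 40
```

Because the stages are chained generators, the first page is filtered and added to the running total before the second page is even read, which is the essence of a pipelined job.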
Presto Storage Plug-in
- Presto defines a simple data storage abstraction layer so that SQL queries can run on top of different data storage systems.
- A storage plug-in (connector) only needs to implement an interface that provides the following operations: retrieving metadata, obtaining the location of the data, and accessing the data itself.
- In addition to the Hive/HDFS backend that is mainly used, Facebook has also developed Presto connectors for other systems, including HBase, Scribe, and custom-built systems.
- [Figure: Presto plug-in structure diagram]
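The three responsibilities of a connector listed in this section (metadata, data location, and access to the data itself) can be modeled roughly as the interface below. This is a simplified sketch and not Presto's real connector SPI, which is written in Java. The names `Connector`, `list_columns`, `get_splits`, `read_split`, `InMemoryConnector`, and `scan_table` are all hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class Connector(ABC):
    """Hypothetical storage plug-in interface with the three operations."""

    @abstractmethod
    def list_columns(self, table: str) -> List[str]:
        """Metadata: which columns does the table have?"""

    @abstractmethod
    def get_splits(self, table: str) -> List[str]:
        """Data location: which chunks (splits) make up the table?"""

    @abstractmethod
    def read_split(self, table: str, split: str) -> Iterator[tuple]:
        """Data access: yield the rows stored in one split."""


class InMemoryConnector(Connector):
    """Toy backend: each table has column names and rows grouped by split."""

    def __init__(self, tables: Dict[str, dict]) -> None:
        self._tables = tables

    def list_columns(self, table: str) -> List[str]:
        return list(self._tables[table]["columns"])

    def get_splits(self, table: str) -> List[str]:
        return sorted(self._tables[table]["splits"])

    def read_split(self, table: str, split: str) -> Iterator[tuple]:
        yield from self._tables[table]["splits"][split]


def scan_table(connector: Connector, table: str) -> List[dict]:
    """Engine-side code that depends only on the Connector interface."""
    columns = connector.list_columns(table)
    return [
        dict(zip(columns, row))
        for split in connector.get_splits(table)
        for row in connector.read_split(table, split)
    ]


connector = InMemoryConnector({
    "users": {
        "columns": ["id", "name"],
        "splits": {"part-0": [(1, "ann")], "part-1": [(2, "bob")]},
    }
})
result = scan_table(connector, "users")
```

Since `scan_table` only talks to the abstract interface, a different backend can be supported by adding another `Connector` subclass without touching the engine-side code, which is the design idea behind the abstraction layer.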
Presto execution Process
Presto engine comparison
- Comparison of results with Hive and Spark SQL
Presto Architecture and principles