Presto: Distributed SQL query engine that can handle petabytes of data

Last Update: 2016-06-22 Source: Internet

Author: User

In the fall of 2012, Facebook started the Presto project, which aims to perform near-real-time analysis on hundreds of petabytes of data. After ruling out several external projects, Facebook decided to build its own distributed query engine. Presto's syntax is based on ANSI SQL. By contrast, most distributed query engines require users to learn a new syntax; some of these resemble SQL, but none are as familiar as real SQL or as thoroughly documented. Facebook hoped this choice would make training new users easier and faster. Relying on ANSI SQL also lets Presto work with existing third-party tools.

Internally, Presto is built on pipelined execution. Once a query is parsed and its tasks are assigned to the appropriate nodes, the client pulls data from the output stage, and the output stage in turn pulls data from the stages below it. This execution model differs fundamentally from Hive/MapReduce. Hive translates a query into multiple stages of MapReduce jobs that run one after another; each job reads its input from disk and writes its intermediate results back to disk. Presto, by contrast, does not use MapReduce. It uses a custom query and execution engine whose operators are designed to support SQL semantics. Beyond improved scheduling, all processing happens in memory, and data is pipelined across the network between stages. This avoids unnecessary I/O and the latency that comes with it. Because execution is pipelined, different stages can run at the same time, streaming data from one stage to the next as soon as it becomes available. For many kinds of queries, this significantly reduces end-to-end latency.
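To make the pull model concrete, here is a minimal Python sketch that models stages as generators. It is not Presto code (Presto is written in Java), and the stage names `scan`, `filter_even`, and `project_square` are hypothetical. It contrasts lazily chained stages, where the client's request for a row pulls data up through every stage, with a Hive/MapReduce-style run in which each stage materializes its full output (a list here, disk files in Hive) before the next stage starts.

```python
from typing import Iterable, Iterator, List

# Shared event log so we can observe the order in which work happens.
events: List[str] = []


def scan(rows: Iterable[int]) -> Iterator[int]:
    """Leaf stage: produces rows one at a time (stands in for reading a source)."""
    for r in rows:
        events.append(f"scan {r}")
        yield r


def filter_even(upstream: Iterator[int]) -> Iterator[int]:
    """Intermediate stage: pulls from the stage below only when asked for a row."""
    for r in upstream:
        if r % 2 == 0:
            yield r


def project_square(upstream: Iterator[int]) -> Iterator[int]:
    """Output stage: pulls from the filter stage and transforms each row."""
    for r in upstream:
        yield r * r


def pipelined(rows: Iterable[int]) -> Iterator[int]:
    # Stages are chained lazily; nothing runs until the client pulls.
    return project_square(filter_even(scan(rows)))


def staged(rows: Iterable[int]) -> List[int]:
    # Hive/MapReduce style: each stage fully materializes its output
    # (a list here, disk in Hive) before the next stage starts.
    stage1 = list(scan(rows))
    stage2 = [r for r in stage1 if r % 2 == 0]
    return [r * r for r in stage2]


if __name__ == "__main__":
    events.clear()
    client = pipelined(range(6))
    first = next(client)           # the client pulls a single row
    print(first, events)           # 0 ['scan 0']

    events.clear()
    result = staged(range(6))
    print(result[0], len(events))  # 0 6
```

In the pipelined version the first result reaches the client after only one input row has been scanned; in the staged version the entire input is scanned before any result exists. Real Presto stages exchange batches of rows over the network rather than single Python values inside one process.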

Presto is written in Java and has a pluggable storage back end. Each data source, such as Hive, HBase, or Scribe, requires a connector. A connector supplies Presto with metadata, tells it which nodes hold the data, and provides a way to stream the data itself.
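The three duties of a connector can be sketched as an abstract interface. The Python below is illustrative only: `Connector`, `get_columns`, `get_splits`, `read`, `Split`, and `InMemoryConnector` are hypothetical names, not Presto's actual Java SPI, and the in-memory tables stand in for a real source such as Hive.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Split:
    """One chunk of a table and the node that holds it (hypothetical type)."""
    table: str
    part: int
    host: str


class Connector(ABC):
    """Hypothetical connector contract covering the three duties described above."""

    @abstractmethod
    def get_columns(self, table: str) -> List[str]:
        """Metadata: the column names of a table."""

    @abstractmethod
    def get_splits(self, table: str) -> List[Split]:
        """Data location: which node holds each chunk of the table."""

    @abstractmethod
    def read(self, split: Split) -> Iterator[Tuple]:
        """Data access: stream the rows of one split."""


class InMemoryConnector(Connector):
    """Toy connector backed by Python dicts instead of Hive, HBase, or Scribe."""

    def __init__(self, columns: Dict[str, List[str]],
                 parts: Dict[str, List[Tuple[str, List[Tuple]]]]):
        self._columns = columns  # table -> column names
        self._parts = parts      # table -> list of (host, rows) chunks

    def get_columns(self, table: str) -> List[str]:
        return list(self._columns[table])

    def get_splits(self, table: str) -> List[Split]:
        return [Split(table, i, host)
                for i, (host, _) in enumerate(self._parts[table])]

    def read(self, split: Split) -> Iterator[Tuple]:
        _, rows = self._parts[split.table][split.part]
        yield from rows  # hand rows out one at a time


def scan_table(conn: Connector, table: str) -> Iterator[Tuple]:
    """Engine side: ask for the splits, then stream each one in turn."""
    for split in conn.get_splits(table):
        # A real scheduler would run this read on a worker near split.host.
        yield from conn.read(split)


if __name__ == "__main__":
    conn = InMemoryConnector(
        columns={"logs": ["user_id", "action"]},
        parts={"logs": [("node-a", [(1, "click"), (2, "view")]),
                        ("node-b", [(3, "click")])]},
    )
    print(conn.get_columns("logs"))                   # ['user_id', 'action']
    print([s.host for s in conn.get_splits("logs")])  # ['node-a', 'node-b']
    print(list(scan_table(conn, "logs")))
    # [(1, 'click'), (2, 'view'), (3, 'click')]
```

Keeping the engine side (`scan_table`) dependent only on the abstract interface is what makes the back end pluggable in this sketch: supporting a new source means writing another `Connector` subclass, not changing the engine.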

For most of Facebook's queries, Presto is about 10 times better than Hive/MapReduce in both latency and CPU efficiency. Facebook still plans to improve performance further. One plan is to design a new data format that reduces the amount of conversion required when data moves from one stage to the next. Facebook also plans to remove some current design limitations; the main ones are limits on the size of the tables in a join and on the cardinality of unique keys and groups. In addition, the system cannot yet write query output back to tables; query results are currently streamed to the client.
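One plausible way to read the join and group-by limits, assuming (as this sketch does) that both operators keep their hash tables entirely in memory with no spilling to disk: the build side of a hash join and the set of distinct grouping keys must each fit within a fixed budget. The Python below is a hypothetical illustration; the `max_entries` and `max_groups` budgets and the `MemoryLimitExceeded` error are invented for the example and are not Presto settings.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class MemoryLimitExceeded(Exception):
    """Raised when an in-memory hash table would outgrow its (hypothetical) budget."""


def hash_join(build: Iterable[Tuple], probe: Iterable[Tuple],
              max_entries: int) -> Iterator[Tuple]:
    """Inner join on the first column; the whole build side is held in memory."""
    table: Dict[object, List[Tuple]] = defaultdict(list)
    count = 0
    for row in build:
        count += 1
        if count > max_entries:
            raise MemoryLimitExceeded(f"join build side exceeds {max_entries} rows")
        table[row[0]].append(row)
    # The probe side is streamed: only matching rows are emitted.
    for row in probe:
        for match in table.get(row[0], []):
            yield match + row[1:]


def count_by_key(rows: Iterable[Tuple], max_groups: int) -> Dict[object, int]:
    """GROUP BY first column with COUNT(*); one in-memory entry per distinct key."""
    counts: Dict[object, int] = {}
    for row in rows:
        key = row[0]
        if key not in counts and len(counts) >= max_groups:
            raise MemoryLimitExceeded(f"more than {max_groups} distinct groups")
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob")]
    clicks = [(1, "home"), (2, "cart"), (1, "search")]
    print(list(hash_join(users, clicks, max_entries=10)))
    # [(1, 'alice', 'home'), (2, 'bob', 'cart'), (1, 'alice', 'search')]
    print(count_by_key(clicks, max_groups=10))  # {1: 2, 2: 1}
    try:
        count_by_key(clicks, max_groups=1)
    except MemoryLimitExceeded as e:
        print("failed:", e)  # failed: more than 1 distinct groups
```

In this sketch, exceeding a budget fails the operator outright, while the probe side of the join is streamed and never has to fit in memory.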

Meituan currently uses Presto at large scale; see: http://tech.meituan.com/presto.html

Presto is open source under the Apache License 2.0. Its Git repository is https://github.com/prestodb/presto

Official documentation: https://prestodb.io/docs/current/overview/use-cases.html

This article is an English version of an article originally published in Chinese on aliyun.com.
