Open-source SQL-on-Hadoop solutions: Where are we?

Source: Internet
Author: User

With Facebook open-sourcing the recently released Presto, the already crowded SQL-on-Hadoop market has become even more complex. Several open-source tools are competing for developers' attention: Stinger (the Hortonworks initiative built around Hive), Apache Drill, Apache Tajo, Cloudera Impala, Salesforce's Phoenix (for HBase), and now Facebook's Presto.

Organizations that run Hadoop in production need interactive SQL query support that also integrates smoothly with their existing BI tools. Vijay Madhavan of eBay wrote in a blog article on the SQL-on-Hadoop landscape:

Most MapReduce-based analysis systems, including the current versions of Hive, Pig, and Cascading, work well for non-interactive, batch SLAs. Many products are now working to support real-time, interactive SLAs by providing interactive SQL-on-Hadoop solutions.

Use cases for SQL-on-Hadoop solutions include interactive ad hoc queries, reporting and visualization through BI systems such as MicroStrategy or Tableau, and queries that span multiple data sources. For example, behavioral data in HDFS may need to be joined with demographic data held in an RDBMS or another source.
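The multi-source case is easier to picture with a query in hand. The sketch below is only a stand-in: it uses Python's built-in sqlite3 module with two in-memory databases to play the roles of the HDFS-backed store and the RDBMS, and the table names (page_views, users), column names, and sample rows are all hypothetical. It is not how Presto or Drill is implemented; it only shows the shape of a join whose two sides are addressed through different source names, much as such engines qualify tables by source.

```python
import sqlite3


def build_sources():
    """Create two separate in-memory databases on one connection.

    'main' stands in for behavioral data (e.g. files in HDFS) and
    'rdbms' stands in for demographic data in a relational store.
    All names and rows here are made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS rdbms")
    conn.execute("CREATE TABLE main.page_views (user_id INTEGER, url TEXT)")
    conn.execute(
        "CREATE TABLE rdbms.users (user_id INTEGER PRIMARY KEY, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO main.page_views VALUES (?, ?)",
        [(1, "/home"), (1, "/cart"), (2, "/home"), (3, "/home")],
    )
    conn.executemany(
        "INSERT INTO rdbms.users VALUES (?, ?)",
        [(1, "US"), (2, "DE"), (3, "US")],
    )
    return conn


def views_per_country(conn):
    """Join behavioral rows to demographic rows across the two sources."""
    query = """
        SELECT u.country, COUNT(*) AS views
        FROM main.page_views AS v
        JOIN rdbms.users AS u ON u.user_id = v.user_id
        GROUP BY u.country
        ORDER BY u.country
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for country, views in views_per_country(build_sources()):
        print(country, views)
```

In a real deployment the two sides would live in different systems, and the engine, rather than a single local process, would be responsible for fetching and joining the rows.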

Many of these SQL-on-Hadoop solutions have several things in common:

At the metadata level, HCatalog and the Hive metastore seem to be establishing themselves as the de facto standard for managing schemas across different data sources.

Certain data formats, such as Parquet and ORC, are becoming more popular for particular workloads and are increasingly used in practice.

Most of the solutions seem to support a large subset of ANSI SQL, although they target different versions of the standard (1992, 1999, 2003).

These commonalities can help users migrate between SQL-on-Hadoop solutions without much difficulty. But there are also some notable differences:

Some of the solutions are Apache projects backed by a community (Stinger, Drill, Tajo), while others are backed by a single organization (Impala, Phoenix, Presto).

In addition, some of the solutions can only query data sources within the Hadoop ecosystem, while others are more flexible from a schema perspective and can also query relational and NoSQL data stores (Presto, Drill).

Another difference is the set of operations allowed on the data: some solutions are pure (distributed) query engines, while others also allow update operations.

Over the last 10-18 months, more and more people and businesses have decided to implement low-latency, ad hoc SQL access to data stored in Hadoop. Because their use cases only partly overlap and environments and preferences differ, there is room for a variety of SQL-on-Hadoop solutions to survive in the long run.
