Why Hive and Impala Are Used

Source: Internet
Author: User


Impala and Hive are both data query tools built on top of Hadoop, but each has a different focus. Why should we use both tools at the same time? Is it okay to use Hive or Impala alone?

First, an introduction to Impala and Hive

(1) Impala and Hive are both tools that provide SQL queries over data in HDFS/HBase. Hive converts a query into MapReduce jobs and relies on YARN scheduling to access the data in HDFS, while Impala queries the data in HDFS directly. Both, however, accept standard SQL statements and run them inside the cluster.


(2) Apache Hive is a high-level abstraction over MapReduce. Using HiveQL, Hive generates MapReduce or Spark jobs that run on the Hadoop cluster. Hive was initially developed by Facebook and is now an Apache open source project.

Apache Impala is a high-performance, dedicated SQL engine that uses Impala SQL. Because Impala runs queries directly against blocks of data, without going through a framework such as MapReduce, query latency is on the order of milliseconds. Impala was inspired by Google's Dremel project, was developed by Cloudera, and is now an Apache open source project.
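
To make the idea of "converting a query into MapReduce" more concrete, here is a minimal Python sketch of how a query such as SELECT status, COUNT(*) FROM logs GROUP BY status could be broken into map, shuffle, and reduce phases. The table, its columns, and the helper functions are hypothetical, and this is only an illustration of the general job style, not Hive's actual planner or generated code.

```python
from collections import defaultdict

# Hypothetical rows of a "logs" table: (url, status)
ROWS = [
    ("/index.html", 200),
    ("/missing", 404),
    ("/index.html", 200),
    ("/login", 500),
    ("/about", 200),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _url, status in rows:
        yield status, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which implements COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(rows):
    """Run the three phases in order for the hypothetical GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_status(ROWS))
```

On a real cluster each phase runs as distributed tasks scheduled across many machines, which is one reason this style suits batch processing better than low-latency interactive queries.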

Second, what is the difference between Impala and Hive?

(1) Hive has a number of features:

1. More extensive support for complex data types (such as arrays and maps) and for window analytics

2. High scalability

3. Typically used for batch processing

(2) Impala is faster:

1. A specialized SQL engine that provides 5x to 50x better performance

2. Ideal for interactive queries and data analysis tools

3. More features are being added

Third, high-level overview:

[Figure: overview diagram of Hive and Impala; image not available]

Fourth, why use Hive and Impala?

1. No software development experience is required; you can use the SQL knowledge you already have for data analysis.

2. Better productivity than writing MapReduce or Spark code directly: 5 lines of HiveQL/Impala SQL are equivalent to 200 or more lines of Java code.

3. Good interoperability with other systems, for example extension through Java and external scripts, and many business intelligence tools support Hive and Impala.
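
The productivity point in item 2 can be illustrated with a small sketch. Python's built-in sqlite3 module stands in for Hive or Impala here (an assumption made so the example runs without a cluster), and the sales table and its columns are hypothetical. The SQL version is a handful of declarative lines, while the hand-written version has to spell out the filtering, grouping, and sorting itself.

```python
import sqlite3

# Hypothetical sales records: (region, amount)
SALES = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 30.0),
    ("east", 200.0),
    ("south", 20.0),
]

QUERY = """
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount >= 25
GROUP BY region
ORDER BY total DESC
"""


def totals_with_sql(rows):
    """Declarative version: the engine plans the filter, grouping, and sort."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def totals_by_hand(rows):
    """Imperative version: every step is written out explicitly."""
    totals = {}
    for region, amount in rows:
        if amount >= 25:
            totals[region] = totals.get(region, 0.0) + amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(totals_with_sql(SALES))
    print(totals_by_hand(SALES))
```

The gap widens quickly in a distributed setting, where the hand-written version would also need job setup, serialization, and partitioning code.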

Fifth, Hive and Impala use cases

(1) Log File Analysis

Logs are a ubiquitous data type and an important data source in the era of big data, and their structure is not fixed. Flume and Kafka can be used to collect the logs into HDFS. You then analyze the structure of the log, build a table based on the log delimiter, and use Hive or Impala to analyze the data.
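
The workflow for log analysis (inspect the log structure, build a table from the delimiter, then query it) can be sketched as follows. This is only an illustration: the log lines, the tab delimiter, and the column names are hypothetical, and Python's sqlite3 module stands in for Hive or Impala so the example runs locally. In Hive, the table would typically be declared over the files already in HDFS with a ROW FORMAT DELIMITED clause rather than loaded row by row.

```python
import sqlite3

# Hypothetical tab-delimited log lines: timestamp, client IP, URL, HTTP status
LOG_LINES = [
    "2016-12-20T10:00:01\t10.0.0.1\t/index.html\t200",
    "2016-12-20T10:00:02\t10.0.0.2\t/missing\t404",
    "2016-12-20T10:00:03\t10.0.0.1\t/index.html\t200",
    "malformed line without delimiters",
    "2016-12-20T10:00:05\t10.0.0.3\t/login\t500",
]


def parse_line(line, delimiter="\t"):
    """Split one log line into typed columns; return None if it does not fit."""
    parts = line.split(delimiter)
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    timestamp, ip, url, status = parts
    return timestamp, ip, url, int(status)


def load_logs(lines):
    """Build an in-memory table from the lines that parse cleanly."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log (ts TEXT, ip TEXT, url TEXT, status INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)
    return conn


def errors_by_url(conn):
    """Count requests with status >= 400 per URL."""
    return conn.execute(
        "SELECT url, COUNT(*) AS errors FROM access_log "
        "WHERE status >= 400 GROUP BY url ORDER BY url"
    ).fetchall()


if __name__ == "__main__":
    connection = load_logs(LOG_LINES)
    print(errors_by_url(connection))
    connection.close()
```

Skipping lines that do not match the expected structure is a design choice for this sketch; since log structure is not fixed, a real pipeline might route such lines to a separate table for inspection instead.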

(2) Sentiment Analysis

Many organizations use Hive or Impala to analyze social media coverage.
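
One simple way to approach this kind of analysis in SQL is to join the words of each post against a dictionary of scored words. The sketch below assumes a tiny hypothetical word list and a few made-up posts, and uses Python's sqlite3 module as a stand-in for Hive or Impala; real sentiment analysis usually needs a much larger lexicon or a trained model.

```python
import sqlite3

# Hypothetical posts and a tiny hypothetical sentiment lexicon
POSTS = [
    (1, "great phone love the camera"),
    (2, "terrible battery and slow screen"),
    (3, "love the price but slow delivery"),
]
LEXICON = [("great", 1), ("love", 1), ("terrible", -1), ("slow", -1)]


def score_posts(posts, lexicon):
    """Return (post_id, sentiment) pairs, summing lexicon scores per post."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE post_words (post_id INTEGER, word TEXT)")
        conn.execute("CREATE TABLE lexicon (word TEXT, score INTEGER)")
        # Tokenize in Python and store one row per (post, word)
        words = [
            (post_id, word)
            for post_id, text in posts
            for word in text.lower().split()
        ]
        conn.executemany("INSERT INTO post_words VALUES (?, ?)", words)
        conn.executemany("INSERT INTO lexicon VALUES (?, ?)", lexicon)
        return conn.execute(
            "SELECT p.post_id, SUM(l.score) AS sentiment "
            "FROM post_words AS p JOIN lexicon AS l ON p.word = l.word "
            "GROUP BY p.post_id ORDER BY p.post_id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(score_posts(POSTS, LEXICON))
```

Posts that contain no lexicon words do not appear in the result because of the inner join; a LEFT JOIN would be needed to report them with a neutral score.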

(3) Business Intelligence

Many leading BI tools support Hive and Impala.


Understanding the roles and strengths of Hive and Impala is important for mastering data processing on Hadoop. It takes regular practice to build up experience and steadily improve your skills. Besides summing up my own experience and lessons, I also like to read what others share; learning from each other does a lot to round out one's knowledge. The "CSDN" forum and the "big data cn" and "Big Data Times Learning Center" service accounts are all very good. In short, if we keep learning from a variety of sources, we will make greater progress!


This article is from the "11872756" blog. Please be sure to keep this source: http://11882756.blog.51cto.com/11872756/1884300
