Hive Learning Notes: Introduction and Installation

Source: Internet
Author: User

I. What is Hive

Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for extract, transform, load (ETL) work, and a mechanism to store, query, and analyze large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users who are familiar with SQL query the data. The language also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot do.
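To make the custom mapper idea concrete, here is a minimal sketch of a script that Hive could call through its TRANSFORM clause, which streams rows to an external program as tab-separated lines on standard input and reads rows back from standard output. The table `page_views`, its columns, and the script name `to_domain.py` are hypothetical, chosen only for illustration.

```python
import sys

# Hypothetical HQL that would invoke this script (table, columns, and
# file name are assumptions made for this example):
#   ADD FILE to_domain.py;
#   SELECT TRANSFORM (user_id, url)
#     USING 'python to_domain.py'
#     AS (user_id, domain)
#   FROM page_views;


def map_line(line):
    """Turn one tab-separated row (user_id, url) into (user_id, domain)."""
    user_id, url = line.rstrip("\n").split("\t")
    # Drop the scheme if present, then keep everything before the first slash.
    host = url.split("://", 1)[-1].split("/", 1)[0]
    return f"{user_id}\t{host}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive sends each row as one line; emit one output line per input row.
    for line in stdin:
        if line.strip():
            stdout.write(map_line(line) + "\n")


# Only consume stdin when rows are actually being piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

The script only sees text lines, so the same pattern should work for any language that can read standard input and write standard output.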

II. The architecture of Hive

Figure: the Hive architecture diagram provided on the official website.

From an architectural perspective, Hive is a data warehouse infrastructure built on Hadoop.

1. Hive's user interfaces are the CLI, HiveServer, and the Web UI.

① The CLI is a command-line client: users operate Hive directly from the command line.

② HiveServer supports JDBC/ODBC access. Hive provides a Thrift service, and Thrift clients currently support C++, Java, PHP, Python, and Ruby.

③ The Web UI gives Hive a more intuitive web operations page. However, it is not recommended when processing large amounts of data.
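As a small illustration of the JDBC access mentioned above, the sketch below builds the connection URL a JDBC client would use. It assumes the `jdbc:hive2://` scheme and default port 10000 of HiveServer2 (the original HiveServer used `jdbc:hive://`); the host name is a made-up placeholder.

```python
def hive_jdbc_url(host, port=10000, database="default", server2=True):
    """Build a Hive JDBC URL; host and database are caller-supplied assumptions."""
    # HiveServer2 uses the "hive2" scheme, the original HiveServer used "hive".
    scheme = "jdbc:hive2" if server2 else "jdbc:hive"
    return f"{scheme}://{host}:{port}/{database}"


# Hypothetical host name, for illustration only.
print(hive_jdbc_url("hive-host.example.com"))
```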

2. The Metastore is the metadata store. It holds the structural information of all Hive tables and partitions, including columns and column types, and the serializers and deserializers needed to read and write data in HDFS.

The Metastore can be set up in three ways:

① Embedded Derby mode

② Local mode

③ Remote mode
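As a rough sketch of how the three modes differ in configuration, the snippet below renders a `hive-site.xml` fragment for each one. In embedded mode the metastore and a Derby database run inside the Hive process; in local mode the metastore still runs inside Hive but points at an external database; in remote mode clients reach a separate metastore service over Thrift. The property names are standard Hive settings, but the host names, database name, and the choice of MySQL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Typical properties per metastore mode. Hosts, database names, and the
# MySQL choice are illustrative assumptions, not values from this article.
METASTORE_MODES = {
    "embedded": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:derby:;databaseName=metastore_db;create=true",
        "javax.jdo.option.ConnectionDriverName":
            "org.apache.derby.jdbc.EmbeddedDriver",
    },
    "local": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:mysql://db-host.example.com:3306/hive_metastore",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    },
    "remote": {
        "hive.metastore.uris": "thrift://metastore-host.example.com:9083",
    },
}


def render_hive_site(mode):
    """Return a hive-site.xml fragment (as a string) for the given mode."""
    root = ET.Element("configuration")
    for name, value in METASTORE_MODES[mode].items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(render_hive_site("remote"))
```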

These three modes will be covered in detail in a later blog post.

3. The relationship between Hadoop and Hive

Hive is part of the Hadoop ecosystem and serves as its data warehouse. Hive's data is stored in the Hadoop file system (HDFS), and Hive provides a SQL interface on top of Hadoop, so users can manipulate data in the file system through SQL statements. Hive depends on Hadoop.

A picture found on the internet shows their relationship very clearly.
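One way to see that Hive's data lives in the Hadoop file system is to look at where a table's files end up. The sketch below computes the HDFS directory of a managed table under Hive's default layout, assuming the default warehouse directory `/user/hive/warehouse`; the table, database, and partition names are hypothetical.

```python
def table_location(table, database="default", partitions=None,
                   warehouse="/user/hive/warehouse"):
    """Return the default HDFS directory for a managed Hive table."""
    parts = [warehouse]
    # Tables in a non-default database live under a "<database>.db" directory.
    if database != "default":
        parts.append(f"{database}.db")
    parts.append(table)
    # Each partition column adds one "key=value" subdirectory.
    for key, value in (partitions or {}).items():
        parts.append(f"{key}={value}")
    return "/".join(parts)


# Hypothetical table and partition, for illustration only.
print(table_location("logs", partitions={"dt": "2020-01-01"}))
```

A query against such a table reads the plain files under that directory, which is why Hive cannot run without Hadoop underneath it.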

III. Installing Hive

1. Open the services wizard and select Hive to install. MapReduce must be installed before Hive.

  

2. First, select a set of dependencies for Hive.

  

3. Customize the role assignments according to your actual environment.

  

4. Select the database. You can choose the embedded database for now and change it later.

  

Test the connection and, if successful, click Continue.

5. The wizard then shows the installation progress.

  
