Hive installation overview

Source: Internet
Author: User

Hive is a Hadoop-based data warehouse platform.

Hive provides a SQL-like query language, and its data is stored in HDFS. In general, Hive converts user-submitted queries into MapReduce jobs and submits them to Hadoop for execution.

We start with installing Hive and then gradually cover the other aspects of Hive.

Prerequisites for installing Hive:

• Java 6

• Hadoop

For the supported versions, refer to the official Hive documentation. Installing Hive requires no Hadoop-specific configuration; you only need to make sure that the HADOOP_HOME environment variable is set correctly.
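Checking this prerequisite up front can save a confusing error later. The following is only a minimal sketch: the function name check_hadoop_home is hypothetical, and it assumes the conventional layout in which the Hadoop launcher sits at $HADOOP_HOME/bin/hadoop.

```shell
#!/bin/sh
# Hypothetical helper: verify that HADOOP_HOME is set and looks like a Hadoop install.
# Assumption: the launcher script is located at $HADOOP_HOME/bin/hadoop.
check_hadoop_home() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable found at $HADOOP_HOME/bin/hadoop" >&2
        return 1
    fi
    echo "HADOOP_HOME looks usable: $HADOOP_HOME"
}
```

The function only inspects the environment; it could be called once before starting Hive, for example as check_hadoop_home && hive.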

Install

We chose the 0.11.0 stable version, downloaded from:

http://mirrors.hust.edu.cn/apache/hive/stable/

1) Decompress the installation package into the target directory:

tar xzf hive-0.11.0.tar.gz

2) Set the environment variables:

export HIVE_INSTALL=/opt/hive-0.11.0

export PATH=$PATH:$HIVE_INSTALL/bin

3) Enter the following command to start the shell:

hive
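If the environment variables from step 2 are kept in a shell startup file, it helps to make the PATH change idempotent. The lines below are a sketch under that assumption; the default install path is illustrative and should match wherever the archive was actually unpacked.

```shell
#!/bin/sh
# Install location is an assumption; adjust it to where the archive was unpacked.
HIVE_INSTALL="${HIVE_INSTALL:-/opt/hive-0.11.0}"
export HIVE_INSTALL

# Append $HIVE_INSTALL/bin to PATH only if it is not already present,
# so that sourcing this file repeatedly does not keep growing PATH.
case ":$PATH:" in
    *":$HIVE_INSTALL/bin:"*) ;;
    *) PATH="$PATH:$HIVE_INSTALL/bin" ;;
esac
export PATH
```

Wrapping PATH in colons before matching avoids a false match on a directory whose name merely contains the Hive path as a substring.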

Hive interactive environment (Shell)

The shell is the main tool for interacting with Hive.

The query language of Hive is called HiveQL. Its design is heavily influenced by MySQL, so if you are familiar with MySQL, you will find HiveQL easy to pick up.

After entering the shell, enter the following command to check whether Hive works properly:

show tables;

The output result is:

OK

Time taken: 8.207 seconds

If the output shows an error, Hadoop may not be running, or the HADOOP_HOME variable may not be set correctly.

Like SQL, HiveQL is generally case-insensitive (except for string comparisons).

Press the Tab key while typing a command and Hive lists the available completions (command auto-completion).

The first command you run may take several seconds or even longer, because Hive creates a metastore database. The database is stored in the metastore_db directory under the directory from which you run hive, so change to a suitable directory before running Hive for the first time.
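Because the metastore directory depends on where hive is started, one way to avoid scattering metastore_db directories is to always launch it from the same place. This is a sketch only: the function name hive_here and the variable HIVE_WORK_DIR are hypothetical and are not Hive settings.

```shell
#!/bin/sh
# Hypothetical wrapper: always start hive from one fixed working directory,
# so the metastore_db directory is created (and later found) in the same place.
# HIVE_WORK_DIR is an illustrative variable, not a Hive configuration item.
hive_here() {
    work_dir="${HIVE_WORK_DIR:-$HOME/hive-work}"
    mkdir -p "$work_dir" || return 1
    # Run in a subshell so the caller's current directory stays unchanged.
    ( cd "$work_dir" && hive "$@" )
}
```

Any arguments are passed through unchanged, so hive_here behaves like hive apart from the working directory.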

You can also run a Hive script directly from the command line, for example:

hive -f /home/user/hive.q

Here, -f is followed by the script file name (including the path).

In both interactive and non-interactive mode, hive usually prints auxiliary information such as the command execution time. If you do not need these messages, add the -S option when starting hive, for example:

hive -S

Note: the S is uppercase.
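The two options can be combined to run a script without the auxiliary messages. The sketch below writes a one-statement script and runs it only if a hive launcher is found on PATH; the script location under the temporary directory is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Write a one-statement HiveQL script to a temporary location (illustrative path).
script_file="${TMPDIR:-/tmp}/hive_example.q"
cat > "$script_file" <<'EOF'
show tables;
EOF

# Run the script quietly if the hive launcher is on PATH; otherwise just report it.
if command -v hive >/dev/null 2>&1; then
    hive -S -f "$script_file"
else
    echo "hive not found on PATH; script written to $script_file"
fi
```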

Simple Example

We use the following data as test data. Each record has the structure (class number, student number, score).

C01,N0101,82

C01,N0102,59

C01,N0103,65

C02,N0201,81

C02,N0202,82

C02,N0203,79

C03,N0301,56

C03,N0302,92

C03,N0306,72
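The test file can be created from the command line. This sketch assumes the records are stored one per line with no spaces after the commas, so that no stray space becomes part of a field value; the directory is illustrative (later commands in this article read the file from /home/user/input).

```shell
#!/bin/sh
# Directory is illustrative; set INPUT_DIR to the location you actually use.
input_dir="${INPUT_DIR:-./input}"
mkdir -p "$input_dir"

# One record per line: class number, student number, score.
cat > "$input_dir/student.txt" <<'EOF'
C01,N0101,82
C01,N0102,59
C01,N0103,65
C02,N0201,81
C02,N0202,82
C02,N0203,79
C03,N0301,56
C03,N0302,92
C03,N0306,72
EOF

# Quick sanity check: print the number of records written.
wc -l < "$input_dir/student.txt"
```

Written this way the file is 117 bytes (nine 13-byte lines), which agrees with the total_size value that Hive reports when the file is loaded.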

Run the following command to create the table:

create table student (classNo string, stuNo string, score int) row format delimited fields terminated by ',';

The table definition is similar to SQL. The remaining clauses specify that fields are separated by commas and that each row is one record.

load data local inpath '/home/user/input/student.txt' overwrite into table student;

The output result is as follows:

Copying data from file:/home/user/input/student.txt

Copying file: file:/home/user/input/student.txt

Loading data to table default.student

rmr: DEPRECATED: Please use 'rm -R' instead.

Deleted /user/hive/warehouse/student

Table default.student stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 117, raw_data_size: 0]

This command loads the content of student.txt into the student table. The operation copies the student.txt file directly into Hive's warehouse directory, which is set by the configuration item hive.metastore.warehouse.dir and defaults to /user/hive/warehouse. The overwrite option makes Hive delete all existing files in the student directory first.

Hive does not parse or convert student.txt when loading it, because Hive does not mandate any particular data storage format.

In this example, Hive stores the data in HDFS. Hive can also store data on the local file system.

If the overwrite option is omitted and a file with the same name already exists in the table directory, Hive renames the newly loaded file. (This is inconsistent with the description in Hadoop: The Definitive Guide, so verify it before relying on it.)

Next, run the following command:

select * from student;

The output is as follows:

C01 N0101 82

C01 N0102 59

C01 N0103 65

C02 N0201 81

C02 N0202 82

C02 N0203 79

C03 N0301 56

C03 N0302 92

C03 N0306 72

Run the following command:

select classNo, count(score) from student where score >= 60 group by classNo;

The output is as follows:

C01 2

C02 3

C03 2

As you can see, HiveQL is similar to SQL. Here we used group by and count. Behind the scenes, Hive converts these operations into MapReduce jobs, submits them to Hadoop for execution, and finally outputs the results.
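To make the filter-then-count-per-key idea concrete, the same result can be computed locally with awk. This is only an analogy for what the query computes, not how Hive executes it; the function name count_passing is hypothetical, and the sample records are inlined so the sketch runs on its own.

```shell
#!/bin/sh
# Hypothetical helper: count students with score >= 60 per class in a
# comma-separated file of (class number, student number, score) records.
count_passing() {
    awk -F',' '
        $3 >= 60 { passed[$1]++ }    # "where" filter, then count per "group by" key
        END { for (c in passed) printf "%s\t%d\n", c, passed[c] }
    ' "$1" | sort                    # awk iteration order is unspecified, so sort
}

# Inline copy of the sample data from this article.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
C01,N0101,82
C01,N0102,59
C01,N0103,65
C02,N0201,81
C02,N0202,82
C02,N0203,79
C03,N0301,56
C03,N0302,92
C03,N0306,72
EOF

count_passing "$data_file"
rm -f "$data_file"
```

For the sample data this prints the same three rows as the Hive query above: C01 with 2, C02 with 3, and C03 with 2.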
