Hadoop (1): Installing Hadoop & Hive on CentOS

Last Update:2015-08-14 Source: Internet

Author: User

1, about Hive

Hive is a Hadoop-based data warehouse platform that makes ETL work straightforward. Hive defines a SQL-like query language, HQL, and converts the queries a user writes into corresponding MapReduce programs that run on Hadoop.

Hive is a data warehouse framework that Facebook open-sourced in August 2008. Its goals are similar to those of Pig, but it offers mechanisms that Pig does not currently support, such as a richer type system, a more SQL-like query language, and persistent table/partition metadata.

Original article link: http://blog.csdn.net/freewebsys/article/details/47617975 (reprinting without the blog author's permission is not allowed).

Home page:
http://hive.apache.org/

2, installation

First, install Hadoop:
https://hadoop.apache.org/
Download the tar.gz archive and extract it directly. The latest version at the time of writing is 2.7.1.

tar -zxvf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop

Then download Hive:
http://hive.apache.org/downloads.html
It can also be extracted directly. The latest version at the time of writing is 1.2.1.

tar -zxvf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin apache-hive
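Both packages are installed the same way (extract, then rename), so the two steps can be wrapped in a small helper. The following is only a sketch and not part of the original article: the function name `unpack_to` is hypothetical, and it assumes the archive unpacks into a single top-level directory.

```shell
# unpack_to ARCHIVE TARGET_DIR  (hypothetical helper, not from the article)
# Extracts a .tar.gz that contains one top-level directory and renames
# that directory to TARGET_DIR.
unpack_to() {
    archive=$1
    target=$2
    # Refuse to overwrite an existing installation.
    [ ! -e "$target" ] || return 1
    # The first archive entry gives the name of the top-level directory.
    top=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
    [ -n "$top" ] || return 1
    # Extract next to the target, then rename the extracted directory.
    parent=$(dirname "$target")
    tar -zxf "$archive" -C "$parent" || return 1
    mv "$parent/$top" "$target"
}
```

With the paths used in this article, it could be called as `unpack_to hadoop-2.7.1.tar.gz /data/hadoop` and `unpack_to apache-hive-1.2.1-bin.tar.gz /data/apache-hive`. Reading the directory name from the archive avoids having to guess what the extracted directory is called.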

Set Environment variables:

export JAVA_HOME=/usr/java/default
export CLASS_PATH=$JAVA_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/data/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HIVE_HOME=/data/apache-hive
export PATH=$HIVE_HOME/bin:$PATH
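Before starting Hive, it can help to confirm that these variables point at real directories. The check below is a minimal sketch and not part of the original article; the function name `check_home` is made up for this example.

```shell
# check_home NAME: report whether the variable called NAME is set
# and points to an existing directory. (Hypothetical helper.)
check_home() {
    name=$1
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value ok"
}

# Check the three variables exported above; keep going on failure.
for v in JAVA_HOME HADOOP_HOME HIVE_HOME; do
    check_home "$v" || true
done
```

If all three lines end in "ok", running `hadoop version` is a quick way to confirm that the Hadoop binaries are also on the PATH.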

3, start hive, create table

Hive Official Website: https://cwiki.apache.org/confluence/display/Hive/Home
Once the environment variables are configured, Hive can be started. This is a local setup: Hive depends only on Hadoop and only needs the Hadoop environment variable (HADOOP_HOME) to be set.

Creating a table is very similar to MySQL.
References: http://www.uml.org.cn/yunjisuan/201409235.asp
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

# hive
Logging initialized using configuration in jar:file:/data/apache-hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.284 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.064 seconds
hive> show tables;
OK
Time taken: 0.051 seconds
hive> CREATE TABLE user_info (uid INT, name STRING)
    > PARTITIONED BY (create_date STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.09 seconds

You may run into the following problem when creating a table in Apache Hive:

FAILED: ParseException line 5:2 Failed to recognize predicate 'date'. Failed rule: 'identifier' in column specification

This error indicates a keyword conflict: reserved keywords such as date and user cannot be used as column names.
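If renaming the column is not an option, Hive also accepts reserved words as identifiers when they are quoted with backticks. The sketch below is not from the original article: the table `login_log` and its columns are invented for illustration, and the script file location is an assumption. It writes a small HiveQL script and runs it only if the `hive` command is available.

```shell
# Location of the generated script (assumed; override with HQL_FILE).
hql_file=${HQL_FILE:-${TMPDIR:-/tmp}/quoted_keyword.hql}

# The quoted 'EOF' keeps the backticks literal instead of letting
# the shell treat them as command substitution.
cat > "$hql_file" <<'EOF'
CREATE TABLE login_log (
  uid INT,
  `date` STRING,
  `user` STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
EOF

# Run the script only when the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
    hive -f "$hql_file"
else
    echo "hive not found; script written to $hql_file"
fi
```

Queries against such a table need the same backtick quoting, so choosing a non-reserved name like create_date, as this article does, is usually the simpler route.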

If you specify SEQUENCEFILE as the storage format and then load data in plain-text (TXT) format into the table, Hive reports that the file format is wrong:

Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

4, Import data

Hive is not designed for single-row INSERT statements or for UPDATE operations. In this setup, data is loaded into internal (managed) tables with the LOAD statement.
Once the data has been imported, it cannot be modified, because HDFS, the storage layer of Hadoop, is designed for files that are written once.

Create two data files:

/data/user_info_data1.txt
121,zhangsan1
122,zhangsan2
123,zhangsan3
/data/user_info_data2.txt
124,zhangsan4
125,zhangsan5
126,zhangsan6
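The two files can also be generated from the command line. This is a small sketch rather than part of the original article: the `DATA_DIR` variable and its default are assumptions so that the example runs without root access (the article itself uses /data).

```shell
# Target directory: assumed default under the temp dir; set DATA_DIR=/data
# to match the paths used in the article.
DATA_DIR=${DATA_DIR:-${TMPDIR:-/tmp}/hive_demo_data}
mkdir -p "$DATA_DIR"

# printf writes one "uid,name" record per line, matching the ',' delimiter
# declared in the CREATE TABLE statement.
printf '%s\n' 121,zhangsan1 122,zhangsan2 123,zhangsan3 > "$DATA_DIR/user_info_data1.txt"
printf '%s\n' 124,zhangsan4 125,zhangsan5 126,zhangsan6 > "$DATA_DIR/user_info_data2.txt"

# Each record is 14 bytes including the newline, so each file is 42 bytes.
wc -c "$DATA_DIR"/user_info_data1.txt "$DATA_DIR"/user_info_data2.txt
```

The 42-byte size is the same value Hive reports as the total size in its statistics when each file is loaded.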

Import the data, loading each file into its own partition:

hive> LOAD DATA LOCAL INPATH '/data/user_info_data1.txt' OVERWRITE INTO TABLE user_info PARTITION (create_date='20150801');
Loading data to table default.user_info partition (create_date=20150801)
Partition default.user_info{create_date=20150801} stats: [numFiles=1, numRows=0, totalSize=42, rawDataSize=0]
OK
Time taken: 0.762 seconds
hive> LOAD DATA LOCAL INPATH '/data/user_info_data2.txt' OVERWRITE INTO TABLE user_info PARTITION (create_date='20150802');
Loading data to table default.user_info partition (create_date=20150802)
Partition default.user_info{create_date=20150802} stats: [numFiles=1, numRows=0, totalSize=42, rawDataSize=0]
OK
Time taken: 0.403 seconds
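When there are many daily files, typing one LOAD statement per partition becomes tedious. As a rough sketch (not from the original article; the function name `load_stmt` is hypothetical), the statements can be generated from file/date pairs:

```shell
# load_stmt FILE DATE: print a LOAD DATA statement that loads FILE
# into the create_date=DATE partition of user_info.
load_stmt() {
    printf "LOAD DATA LOCAL INPATH '%s' OVERWRITE INTO TABLE user_info PARTITION (create_date='%s');\n" "$1" "$2"
}

# The two file/partition pairs used in this article.
load_stmt /data/user_info_data1.txt 20150801
load_stmt /data/user_info_data2.txt 20150802
```

The printed statements can be redirected to a file and executed in one go with `hive -f`, so Hive starts only once instead of once per partition.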

5, query

The data can now be queried directly:

select * from user_info where create_date=20150801;
OK
121     zhangsan1       20150801
122     zhangsan2       20150801
123     zhangsan3       20150801
Time taken: 0.099 seconds, Fetched: 3 row(s)
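The same kind of query can be run without entering the interactive shell, which is handy in scheduled jobs. The following is a minimal sketch and not part of the original article; `partition_query` is a hypothetical helper name.

```shell
# partition_query DATE: print a query that reads one partition of user_info.
partition_query() {
    printf "SELECT uid, name FROM user_info WHERE create_date='%s';" "$1"
}

query=$(partition_query 20150801)
if command -v hive >/dev/null 2>&1; then
    # -S (silent) suppresses progress messages; -e runs the statement and exits.
    hive -S -e "$query"
else
    # Without a Hive installation, just show the statement that would run.
    echo "$query"
fi
```

Filtering on the partition column means Hive only has to read the files under that partition's directory rather than the whole table.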

For more query functions, see the reference on Hive built-in and user-defined functions:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

6, summary

Hive is very convenient for offline data statistics, since the data cannot be modified once it has been loaded.
Hive's syntax is very similar to MySQL's, and it makes full use of Hadoop for statistics, including queries with multiple joins, without having to worry about efficiency.
One small problem remains unresolved: data imported with LOAD must go into a TEXTFILE table and cannot be loaded directly into a compressed file format.
For a detailed description of this problem, see:
http://blog.163.com/[email protected]/blog/static/6797953420128118227663/
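A commonly used way around this limitation, which the original article does not cover, is to load the text file into a TEXTFILE staging table and let Hive rewrite the rows into a SEQUENCEFILE table with INSERT ... SELECT. The sketch below is illustrative only: the table names `user_info_text` and `user_info_seq` and the script location are assumptions, and the tables are left unpartitioned for brevity.

```shell
# Location of the generated script (assumed; override with STAGING_HQL).
staging_hql=${STAGING_HQL:-${TMPDIR:-/tmp}/staging_to_seq.hql}

cat > "$staging_hql" <<'EOF'
-- 1. Staging table that matches the layout of the text file.
CREATE TABLE user_info_text (uid INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 2. LOAD only moves the file, so the formats must already match.
LOAD DATA LOCAL INPATH '/data/user_info_data1.txt'
OVERWRITE INTO TABLE user_info_text;

-- 3. Target table in the desired storage format.
CREATE TABLE user_info_seq (uid INT, name STRING)
STORED AS SEQUENCEFILE;

-- 4. INSERT ... SELECT runs a job that rewrites the rows as SequenceFile.
INSERT OVERWRITE TABLE user_info_seq
SELECT uid, name FROM user_info_text;
EOF

# Run the script only when the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
    hive -f "$staging_hql"
else
    echo "hive not found; script written to $staging_hql"
fi
```

The key point is that LOAD DATA copies or moves files as they are, while INSERT ... SELECT actually reads and rewrites the records, which is why only the latter can change the storage format.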

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
