Hadoop (1): CentOS installation Hadoop & Hive


1, About Hive

Hive is a Hadoop-based data warehouse platform. With Hive, ETL work becomes straightforward. Hive defines a SQL-like query language, HQL, and translates user-written HQL into MapReduce jobs that run on Hadoop.

Hive is a data warehouse framework that Facebook open-sourced in August 2008. Its goals are similar to Pig's, but it supports mechanisms that Pig currently does not, such as a richer type system, a more SQL-like query language, and persistent table/partition metadata.

Original link: http://blog.csdn.net/freewebsys/article/details/47617975. Please do not reprint without the author's permission.

Home page:
http://hive.apache.org/

2, Installation

First, install Hadoop:
https://hadoop.apache.org/
Download the tar.gz and extract it directly. The latest version is 2.7.1.

tar -zxvf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop


Then download Hive:
http://hive.apache.org/downloads.html
Extract it directly as well. The latest version is 1.2.1.

tar -zxvf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin apache-hive

Set Environment variables:

export JAVA_HOME=/usr/java/default
export CLASS_PATH=$JAVA_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/data/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HIVE_HOME=/data/apache-hive
export PATH=$HIVE_HOME/bin:$PATH

3, Start Hive, create a table

Hive wiki: https://cwiki.apache.org/confluence/display/Hive/Home
After the environment variables are configured, Hive can be started. This is a local setup that depends only on Hadoop, so only the HADOOP-related environment variables are needed.

Creating a data table is very similar to MySQL.
Reference: http://www.uml.org.cn/yunjisuan/201409235.asp
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

# hive
Logging initialized using configuration in jar:file:/data/apache-hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.284 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.064 seconds
hive> show tables;
OK
Time taken: 0.051 seconds
hive> CREATE TABLE user_info (uid INT, name STRING)
    > PARTITIONED BY (create_date STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.09 seconds

You may encounter a problem like this when creating a table with Apache Hive:

FAILED: ParseException line 5:2 Failed to recognize predicate 'date'. Failed rule: 'identifier' in column specification

This indicates a keyword conflict: reserved words such as date and user cannot be used directly as column names.
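If a reserved word really is the right column name, Hive supports backtick-quoted identifiers, so a sketch like the following should parse (the table name login_log is hypothetical, not from the article):

```sql
-- Backticks let reserved words such as `user` and `date` serve as column names.
CREATE TABLE login_log (
  `user` STRING,
  `date` STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

Renaming the columns (e.g. create_date instead of date, as the article's user_info table does) is usually the cleaner fix.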

If the storage format is declared as SEQUENCEFILE and data in plain-text format is then loaded into the table, Hive reports a wrong file format error:

Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
4, Import data

Hive does not support single-row INSERT statements, nor does it support UPDATE operations. Data is loaded into a table with the LOAD statement.
Once imported, the data cannot be modified; this follows from how Hadoop stores files.
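Because rows cannot be changed in place, the usual workaround is to rewrite an entire table or partition with INSERT OVERWRITE ... SELECT. A sketch against the article's user_info table (the uid filter is illustrative):

```sql
-- "Delete/update" by rewriting the partition: keep every row except the
-- ones being dropped, then overwrite the partition with the result.
INSERT OVERWRITE TABLE user_info PARTITION (create_date='20150801')
SELECT uid, name
FROM user_info
WHERE create_date = '20150801' AND uid <> 121;
```

The whole partition is rewritten even to change a single row, which is why Hive suits batch, append-mostly workloads.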

Create two data files:

/data/user_info_data1.txt
121,zhangsan1
122,zhangsan2
123,zhangsan3
/data/user_info_data2.txt
124,zhangsan4
125,zhangsan5
126,zhangsan6
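The two files above can be created from the shell with heredocs. The article stores them under /data; /tmp is used here only so the snippet runs anywhere:

```shell
# Write the two sample data files (same rows as listed above).
cat > /tmp/user_info_data1.txt <<'EOF'
121,zhangsan1
122,zhangsan2
123,zhangsan3
EOF
cat > /tmp/user_info_data2.txt <<'EOF'
124,zhangsan4
125,zhangsan5
126,zhangsan6
EOF
# Quick sanity check: each file should contain three comma-delimited rows.
wc -l /tmp/user_info_data1.txt /tmp/user_info_data2.txt
```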

Data import: load the two files into two separate partitions.

hive> LOAD DATA LOCAL INPATH '/data/user_info_data1.txt' OVERWRITE INTO TABLE user_info PARTITION (create_date='20150801');
Loading data to table default.user_info partition (create_date=20150801)
Partition default.user_info{create_date=20150801} stats: [numFiles=1, numRows=0, totalSize=42, rawDataSize=0]
OK
Time taken: 0.762 seconds
hive> LOAD DATA LOCAL INPATH '/data/user_info_data2.txt' OVERWRITE INTO TABLE user_info PARTITION (create_date='20150802');
Loading data to table default.user_info partition (create_date=20150802)
Partition default.user_info{create_date=20150802} stats: [numFiles=1, numRows=0, totalSize=42, rawDataSize=0]
OK
Time taken: 0.403 seconds
5, Query

Queries can be run directly.

hive> select * from user_info where create_date=20150801;
OK
121     zhangsan1       20150801
122     zhangsan2       20150801
123     zhangsan3       20150801
Time taken: 0.099 seconds, Fetched: 3 row(s)
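Because create_date is a partition column, a filter on it prunes the scan to the matching partition directory, and aggregates run as MapReduce jobs. For example, counting rows per partition over the data loaded above:

```sql
-- Partition columns behave like ordinary columns in queries,
-- but filters on them only read the matching partition directories.
SELECT create_date, count(*) AS cnt
FROM user_info
GROUP BY create_date;
```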

For more query functions, see the Hive function reference and user-defined functions:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

6, Summary


Hive is very convenient for offline data statistics, precisely because data cannot be modified once it is loaded.
Hive's syntax is very similar to MySQL's, and it lets you use Hadoop's full power for statistics and multi-way joins without worrying much about efficiency.
One small problem remains unresolved: data imported with LOAD must be in TEXTFILE format, not a compressed file type.
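A common workaround (not from the original article) is to LOAD the text file into a TEXTFILE staging table and then convert it with INSERT ... SELECT, letting Hive write the target format itself. A sketch, with hypothetical table names:

```sql
-- LOAD only moves files into place, so the file must already match the
-- table's declared format; INSERT ... SELECT rewrites the data in the
-- destination table's format (here, SequenceFile).
CREATE TABLE user_info_stage (uid INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE user_info_seq (uid INT, name STRING)
STORED AS SEQUENCEFILE;

LOAD DATA LOCAL INPATH '/data/user_info_data1.txt' INTO TABLE user_info_stage;
INSERT OVERWRITE TABLE user_info_seq SELECT uid, name FROM user_info_stage;
```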
For a detailed description of this problem, see:
http://blog.163.com/[email protected]/blog/static/6797953420128118227663/


