Installation and Configuration of Hive 1.2.0 on Hadoop on a Mac


Environment: Mac OS X Yosemite + Hadoop 2.6.0 + Hive 1.2.0 + JDK 1.7.0_79

Prerequisite: Hadoop must be installed and running (in either pseudo-distributed or fully distributed mode)

Hive Website Address: http://hive.apache.org/

Recommendation: in my experience on Mac OS X Yosemite, with the stock Hadoop 2.6.0 binaries downloaded from Apache, Hive 1.2.0 always failed to start with a "JDK version does not match" error, no matter which JDK was installed (1.6, 1.7, and 1.8 were all tried). After compiling the Hadoop 2.6.0 source into a native Mac build on the Mac itself, Hive started normally.

If you run into a similar situation, see the earlier post on considerations for compiling Hadoop 2.6.0/2.7.0 and Tez 0.5.2/0.7.0 on Mac OS X Yosemite.

I. Environment variables

export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HIVE_HOME=/home/hadoop/hive-1.2.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HIVE_HOME}/bin:$PATH:$HOME/bin
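To make these exports permanent they normally go in your shell profile (e.g. ~/.bash_profile on OS X). A minimal sketch; a temporary file stands in for the real profile so it can be run safely, and the paths are this article's examples -- adjust them to your own install locations.

```shell
# Append the Hive/Hadoop exports to a profile file and verify them.
# A temp file stands in for ~/.bash_profile here.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HIVE_HOME=/home/hadoop/hive-1.2.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HIVE_HOME}/bin:$PATH:$HOME/bin
EOF
. "$profile"                      # in real use: source ~/.bash_profile
echo "HIVE_HOME=$HIVE_HOME"
```

After sourcing the profile, `hive` resolves from ${HIVE_HOME}/bin without a full path.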

III. Modify the XML configuration in Hive

cp hive-default.xml.template hive-default.xml

cp hive-default.xml.template hive-site.xml

cp hive-exec-log4j.properties.template hive-exec-log4j.properties

cp hive-log4j.properties.template hive-log4j.properties

cp beeline-log4j.properties.template beeline-log4j.properties

That is: each file with a .template suffix is copied to a configuration file without the suffix. Note that hive-default.xml.template is copied twice: once to hive-default.xml and once to hive-site.xml. hive-site.xml holds the user-defined configuration, hive-default.xml the global configuration; when Hive starts, any item set in hive-site.xml overrides the same item in hive-default.xml.
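The five copies above can also be scripted. A small sketch; a scratch directory with empty template files stands in for the real $HIVE_HOME/conf so it is safe to run as-is:

```shell
# Copy every *.template in the conf dir to a file without the suffix,
# then make the extra hive-site.xml copy for user overrides.
# A scratch dir simulates $HIVE_HOME/conf.
conf=$(mktemp -d)
touch "$conf"/hive-default.xml.template \
      "$conf"/hive-exec-log4j.properties.template \
      "$conf"/hive-log4j.properties.template \
      "$conf"/beeline-log4j.properties.template
cd "$conf"
for t in *.template; do
  cp "$t" "${t%.template}"                  # strip the .template suffix
done
cp hive-default.xml.template hive-site.xml  # second copy: user-defined config
ls
```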

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>

    <!--
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://localhost:5432/hive</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
    </property>
    -->

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/hive?characterEncoding=utf-8</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>

    <property>
        <name>hive.exec.scratchdir</name>
        <value>/tmp/hive</value>
    </property>

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/users/jimmy/app/hive-1.2.0/tmp</value>
    </property>

    <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/users/jimmy/app/hive-1.2.0/tmp/${hive.session.id}_resources</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
</configuration>

Note: Hive has a concept of metadata. The metadata records which tables currently exist, their columns, the column data types, and so on. Because HDFS does not store this extra information, Hive needs a traditional database to keep it. By default the embedded Derby database is used, but the metadata can also be stored in a full RDBMS such as MSSQL, MySQL, Oracle, or PostgreSQL. The configuration above shows both MySQL and PostgreSQL variants (PostgreSQL commented out); if the active MySQL javax.jdo.* properties are commented out as well, Hive falls back to standalone Derby mode.
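If you take the MySQL route, two things not shown above must be in place first: the metastore database and user referenced by ConnectionURL/UserName/Password must exist, and the MySQL JDBC driver jar (mysql-connector-java) must be on Hive's classpath, typically by dropping it into $HIVE_HOME/lib. A sketch of a typical prep script (the hive/hive names match the hive-site.xml above; run it against your MySQL server with an admin account):

```shell
# Write the metastore prep SQL; run it later with a MySQL admin account.
# Database name, user, and password match the javax.jdo.option.* values
# in the hive-site.xml above.
sql=$(mktemp)
cat > "$sql" <<'EOF'
CREATE DATABASE IF NOT EXISTS hive CHARACTER SET utf8;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;
EOF
# mysql -u root -p < "$sql"   # run this once MySQL is up
cat "$sql"
```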

Also note: the configuration above contains several directory parameters, and the directories should be prepared before starting Hive.

hive.exec.local.scratchdir
hive.downloaded.resources.dir

These two properties point to local directories and must be created manually first. The other directories are in HDFS; Hive creates them automatically at startup, and if automatic creation fails you can also create them by hand with the HDFS shell.
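A sketch of the directory prep; the real local path is the article's example (/users/jimmy/app/hive-1.2.0/tmp), and a temp directory is substituted here so the commands run as-is:

```shell
# Create the local scratch dir from the config before the first start.
# In real use: base=/users/jimmy/app/hive-1.2.0/tmp (the article's example).
base=$(mktemp -d)/tmp
mkdir -p "$base"        # hive.exec.local.scratchdir
                        # (the ${hive.session.id}_resources subdirectory is
                        #  created per session under this same tmp dir)
# HDFS directories -- Hive creates these on startup; if that fails:
# hdfs dfs -mkdir -p /tmp/hive /user/hive/warehouse
# hdfs dfs -chmod g+w /tmp/hive /user/hive/warehouse
echo "local scratch dir ready: $base"
```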

IV. Replacing the jline jar package in Hadoop 2.6.0

The jline jar bundled with Hive 1.2.0 is a different version from the one shipped with Hadoop 2.6.0, so you need to replace the old jar under $HADOOP_HOME/share/hadoop/yarn/lib with $HIVE_HOME/lib/jline-2.12.jar (that is, delete the old version there, then copy the new version into that directory); otherwise Hive will fail to start.
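The replacement boils down to a delete and a copy. Sketched here with scratch directories standing in for the real $HIVE_HOME and $HADOOP_HOME, and a hypothetical jline-0.9.94.jar as the old jar being replaced:

```shell
# Simulate replacing Hadoop's old jline jar with Hive's jline-2.12.jar.
# Scratch dirs stand in for the real installs; in practice use your
# actual $HIVE_HOME and $HADOOP_HOME.
hive_home=$(mktemp -d); hadoop_home=$(mktemp -d)
yarn_lib="$hadoop_home/share/hadoop/yarn/lib"
mkdir -p "$hive_home/lib" "$yarn_lib"
touch "$hive_home/lib/jline-2.12.jar"
touch "$yarn_lib/jline-0.9.94.jar"                # stand-in for the old jar
rm -f "$yarn_lib"/jline-*.jar                     # delete the old version
cp "$hive_home/lib/jline-2.12.jar" "$yarn_lib"/   # copy in the new one
ls "$yarn_lib"
```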

V. Testing and Verification

$HIVE_HOME/bin/hive

If you reach the hive> prompt, the installation is working.

a) Create a test table

hive> CREATE TABLE test (id INT);

b) Load the contents of a file in HDFS into the table

hive> LOAD DATA INPATH '/input/duplicate.txt' INTO TABLE test;

Note: the contents of duplicate.txt can be found in previous blog posts.

c) test averaging

hive> SELECT AVG(id) FROM test;

Query ID = jimmy_20150607191924_ccfb231f-6c92-47ac-88f1-eb32882a0010
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-06-07 19:19:27,980 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1537497991_0001
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 190 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
3.909090909090909
Time taken: 3.322 seconds, Fetched: 1 row(s)

As the output shows, under the hood Hive still uses the MapReduce engine: it translates the SQL statement into a MapReduce job and submits it to Hadoop. From a usability standpoint, analyzing data with SQL statements really is much more convenient than writing raw MapReduce or Pig.

