Installation and Configuration of Hive 1.2.0 on Hadoop on a Mac


Environment: Mac OS X Yosemite + Hadoop 2.6.0 + Hive 1.2.0 + JDK 1.7.0_79

Prerequisite: Hadoop must be installed and running (in either pseudo-distributed or fully distributed mode)

Hive Website Address: http://hive.apache.org/

Recommendation: in my experience on Mac OS X Yosemite, with the stock Hadoop 2.6.0 binaries downloaded from Apache, Hive 1.2.0 always failed to start with a "JDK version does not match" error, no matter which JDK was installed (1.6, 1.7, and 1.8 were all tried). After compiling the Hadoop 2.6.0 source into a native Mac build on the Mac itself, Hive started normally.

If you run into a similar situation, see the earlier post on considerations for compiling Hadoop 2.6.0/2.7.0 and Tez 0.5.2/0.7.0 on Mac OS X Yosemite.

I. Environment variables

export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HIVE_HOME=/home/hadoop/hive-1.2.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HIVE_HOME}/bin:$PATH:$HOME/bin
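To make these exports permanent they normally go in your shell profile (e.g. ~/.bash_profile on OS X). A minimal sketch; a temporary file stands in for the real profile so it can be run safely, and the paths are this article's examples -- adjust them to your own install locations.

```shell
# Append the Hive/Hadoop exports to a profile file and verify them.
# A temp file stands in for ~/.bash_profile here.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HIVE_HOME=/home/hadoop/hive-1.2.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HIVE_HOME}/bin:$PATH:$HOME/bin
EOF
. "$profile"                      # in real use: source ~/.bash_profile
echo "HIVE_HOME=$HIVE_HOME"
```

After sourcing the profile, `hive` resolves from ${HIVE_HOME}/bin without a full path.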

III. Modify the XML configuration in Hive

cp hive-default.xml.template hive-default.xml

cp hive-default.xml.template hive-site.xml

cp hive-exec-log4j.properties.template hive-exec-log4j.properties

cp hive-log4j.properties.template hive-log4j.properties

cp beeline-log4j.properties.template beeline-log4j.properties

That is: each file with a .template suffix is copied to a configuration file without the suffix. Note that hive-default.xml.template is copied twice: once to hive-default.xml and once to hive-site.xml. hive-site.xml holds the user-defined configuration, hive-default.xml the global configuration; when Hive starts, any item set in hive-site.xml overrides the same item in hive-default.xml.
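The five copies above can also be scripted. A small sketch; a scratch directory with empty template files stands in for the real $HIVE_HOME/conf so it is safe to run as-is:

```shell
# Copy every *.template in the conf dir to a file without the suffix,
# then make the extra hive-site.xml copy for user overrides.
# A scratch dir simulates $HIVE_HOME/conf.
conf=$(mktemp -d)
touch "$conf"/hive-default.xml.template \
      "$conf"/hive-exec-log4j.properties.template \
      "$conf"/hive-log4j.properties.template \
      "$conf"/beeline-log4j.properties.template
cd "$conf"
for t in *.template; do
  cp "$t" "${t%.template}"                  # strip the .template suffix
done
cp hive-default.xml.template hive-site.xml  # second copy: user-defined config
ls
```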

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>

    <!--
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://localhost:5432/hive</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
    </property>
    -->

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/hive?characterEncoding=utf-8</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>

    <property>
        <name>hive.exec.scratchdir</name>
        <value>/tmp/hive</value>
    </property>

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/users/jimmy/app/hive-1.2.0/tmp</value>
    </property>

    <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/users/jimmy/app/hive-1.2.0/tmp/${hive.session.id}_resources</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
</configuration>

Note: Hive has a concept of metadata. The metadata records which tables currently exist, their columns, the column data types, and so on. Because HDFS does not store this extra information, Hive needs a traditional database to keep it. By default the embedded Derby database is used, but the metadata can also be stored in a full RDBMS such as MSSQL, MySQL, Oracle, or PostgreSQL. The configuration above shows both MySQL and PostgreSQL variants (PostgreSQL commented out); if the active MySQL javax.jdo.* properties are commented out as well, Hive falls back to standalone Derby mode.
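If you take the MySQL route, two things not shown above must be in place first: the metastore database and user referenced by ConnectionURL/UserName/Password must exist, and the MySQL JDBC driver jar (mysql-connector-java) must be on Hive's classpath, typically by dropping it into $HIVE_HOME/lib. A sketch of a typical prep script (the hive/hive names match the hive-site.xml above; run it against your MySQL server with an admin account):

```shell
# Write the metastore prep SQL; run it later with a MySQL admin account.
# Database name, user, and password match the javax.jdo.option.* values
# in the hive-site.xml above.
sql=$(mktemp)
cat > "$sql" <<'EOF'
CREATE DATABASE IF NOT EXISTS hive CHARACTER SET utf8;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;
EOF
# mysql -u root -p < "$sql"   # run this once MySQL is up
cat "$sql"
```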

Also note: the configuration above contains several directory parameters, and the directories should be prepared before starting Hive.

hive.exec.local.scratchdir
hive.downloaded.resources.dir

These two properties point to local directories and must be created manually first. The other directories are in HDFS; Hive creates them automatically at startup, and if automatic creation fails you can also create them by hand with the HDFS shell.
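A sketch of the directory prep; the real local path is the article's example (/users/jimmy/app/hive-1.2.0/tmp), and a temp directory is substituted here so the commands run as-is:

```shell
# Create the local scratch dir from the config before the first start.
# In real use: base=/users/jimmy/app/hive-1.2.0/tmp (the article's example).
base=$(mktemp -d)/tmp
mkdir -p "$base"        # hive.exec.local.scratchdir
                        # (the ${hive.session.id}_resources subdirectory is
                        #  created per session under this same tmp dir)
# HDFS directories -- Hive creates these on startup; if that fails:
# hdfs dfs -mkdir -p /tmp/hive /user/hive/warehouse
# hdfs dfs -chmod g+w /tmp/hive /user/hive/warehouse
echo "local scratch dir ready: $base"
```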

IV. Replacing the jline jar package in Hadoop 2.6.0

The jline jar bundled with Hive 1.2.0 is a different version from the one shipped with Hadoop 2.6.0, so you need to replace the old jar under $HADOOP_HOME/share/hadoop/yarn/lib with $HIVE_HOME/lib/jline-2.12.jar (that is, delete the old version there, then copy the new version into that directory); otherwise Hive will fail to start.
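The replacement boils down to a delete and a copy. Sketched here with scratch directories standing in for the real $HIVE_HOME and $HADOOP_HOME, and a hypothetical jline-0.9.94.jar as the old jar being replaced:

```shell
# Simulate replacing Hadoop's old jline jar with Hive's jline-2.12.jar.
# Scratch dirs stand in for the real installs; in practice use your
# actual $HIVE_HOME and $HADOOP_HOME.
hive_home=$(mktemp -d); hadoop_home=$(mktemp -d)
yarn_lib="$hadoop_home/share/hadoop/yarn/lib"
mkdir -p "$hive_home/lib" "$yarn_lib"
touch "$hive_home/lib/jline-2.12.jar"
touch "$yarn_lib/jline-0.9.94.jar"                # stand-in for the old jar
rm -f "$yarn_lib"/jline-*.jar                     # delete the old version
cp "$hive_home/lib/jline-2.12.jar" "$yarn_lib"/   # copy in the new one
ls "$yarn_lib"
```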

V. Testing and Verification

$HIVE_HOME/bin/hive

If you reach the hive> prompt, the installation is working.

a) Create a test table

hive> CREATE TABLE test (id INT);

b) Load the contents of a file in HDFS into the table

hive> LOAD DATA INPATH '/input/duplicate.txt' INTO TABLE test;

Note: the contents of duplicate.txt can be found in previous blog posts.

c) test averaging

hive> SELECT AVG(id) FROM test;

Query ID = jimmy_20150607191924_ccfb231f-6c92-47ac-88f1-eb32882a0010
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-06-07 19:19:27,980 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1537497991_0001
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 190 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
3.909090909090909
Time taken: 3.322 seconds, Fetched: 1 row(s)

As the output shows, under the hood Hive still uses the MapReduce engine: it translates the SQL statement into a MapReduce job and submits it to Hadoop. From a usability standpoint, analyzing data with SQL statements really is much more convenient than writing raw MapReduce or Pig.

