Building a Hive Platform

Hive is a Hadoop-based data warehouse platform. With Hive, we can easily perform ETL work. Hive defines a SQL-like query language, HQL; queries written in HQL are converted into corresponding MapReduce programs that execute on Hadoop.

This article explains how to build a Hive platform. Suppose we have three machines, hadoop1, hadoop2, and hadoop3, that Hadoop 0.19.2 is already installed (Hive supports many Hadoop versions), and that the hosts file is correctly configured. Hive is deployed on hadoop1.

The Simplest and Fastest Deployment Solution

Hive comes bundled with Hadoop 0.19.2; the included Hive version is 0.3.0.

First we start Hadoop: sh $HADOOP_HOME/bin/start-all.sh

Then start Hive: sh $HADOOP_HOME/contrib/hive/bin/hive

The Hive command-line interface is now running, and you can enter commands directly to run the corresponding Hive operations.
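For example, a minimal session (the table and queries here are only illustrative, in the style of the Hive getting-started examples):

hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> SHOW TABLES;
hive> SELECT COUNT(1) FROM pokes;

A statement like the SELECT is compiled into one or more MapReduce jobs that run on the Hadoop cluster.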

This deployment method uses Derby in embedded mode. Although it is simple and fast, it cannot support simultaneous access by multiple users, so it is suitable only for simple tests and cannot be used in a production environment. We therefore need to modify the default Hive configuration to improve availability.

A Multi-User Deployment Solution with a Web Interface

The current release is hive-0.4.1, and we will use this version to build the Hive platform.

First download hive-0.4.1: svn co http://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1/ hive-0.4.1

Then modify the downloaded dependency file shims/ivy.xml to the following content (the corresponding Hadoop version is 0.19.2):

<ivy-module version="2.0">
  <info organisation="org.apache.hadoop.hive" module="shims"/>
  <dependencies>
    <dependency org="hadoop" name="core" rev="0.19.2">
      <artifact name="hadoop" type="source" ext="tar.gz"/>
    </dependency>
    <conflict manager="all"/>
  </dependencies>
</ivy-module>

Next, we use ant to compile Hive: ant package.

After compilation succeeds, the built files are in the build/dist directory. Set this directory as $HIVE_HOME.
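For example (a sketch; the checkout path is an assumption based on where you ran the build):

export HIVE_HOME=/path/to/hive-0.4.1/build/dist
export PATH=$HIVE_HOME/bin:$PATH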

Modify the conf/hive-default.xml file; the main changes are as follows:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

Download the Apache Derby database on hadoop1: wget http://labs.renren.com/apache-mirror/db/derby/db-derby-10.5.3.0/db-derby-10.5.3.0-bin.zip

Decompress Derby and set $DERBY_HOME.
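For instance (a sketch; the install location is an assumption):

unzip db-derby-10.5.3.0-bin.zip
export DERBY_HOME=/path/to/db-derby-10.5.3.0-bin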

Start Derby's network server: sh $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 (binding to 0.0.0.0 lets the other machines in the cluster connect).
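To confirm the server is listening, something like the following should work (a sketch using Derby's bundled control script):

sh $DERBY_HOME/bin/NetworkServerControl ping -h hadoop1 -p 1527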

Next, copy the derbyclient.jar and derbytools.jar files from the $DERBY_HOME/lib directory to the $HIVE_HOME/lib directory.
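As a shell one-liner, that is:

cp $DERBY_HOME/lib/derbyclient.jar $DERBY_HOME/lib/derbytools.jar $HIVE_HOME/lib/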

Start Hadoop: sh $HADOOP_HOME/bin/start-all.sh

Finally, start the Hive web interface: sh $HIVE_HOME/bin/hive --service hwi

Our Hive deployment is now complete. You can access the service by entering http://hadoop1:9999/hwi/ in a browser (if that does not resolve, replace hadoop1 with the actual IP address, for example http://10.210.152.17:9999/hwi/).
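A quick command-line check (assuming curl is available) should print 200 if the interface is up:

curl -s -o /dev/null -w '%{http_code}\n' http://hadoop1:9999/hwi/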

This deployment method uses Derby in client/server mode, which allows multiple users to access the metastore simultaneously, and it provides a web interface for convenient use. This deployment scheme is recommended.

About the Hive Schema

In both of the deployment schemes above, the Derby database stores Hive's schema information. We can also use other databases, such as MySQL, to store the schema information.

To understand how to use MySQL in place of Derby, refer to this article: http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx
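In outline, the change mirrors the Derby settings above. A sketch, where the host, database name, and credentials are assumptions, and the MySQL JDBC driver jar must also be placed in $HIVE_HOME/lib:

<!-- host, database, user, and password below are placeholders -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop1:3306/metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Metastore database user (assumed)</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
  <description>Metastore database password (assumed)</description>
</property>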

We can also use HDFS to store the schema information. To do so, modify conf/hive-default.xml as follows:


<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.FileStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.RawStore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>
