Hive is a Hadoop-based data warehouse platform that makes ETL work easy. Hive defines an SQL-like query language called HQL, which converts user-written queries into MapReduce programs executed on Hadoop.
This article explains how to set up a Hive platform. Suppose we have three machines: hadoop1, hadoop2, and hadoop3, with Hadoop 0.19.2 already installed (one of the many Hadoop versions supported by Hive) and the hosts file correctly configured. Hive will be deployed on hadoop1.
The simplest and fastest deployment scheme
Hadoop 0.19.2 already ships with Hive in its contrib directory; the bundled Hive version is 0.3.0.
First, start Hadoop: sh $HADOOP_HOME/bin/start-all.sh
Then start Hive: sh $HADOOP_HOME/contrib/hive/bin/hive
The Hive command-line interface is now running, and you can enter commands directly to run Hive operations.
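As a quick smoke test of the CLI, you can run a few HQL statements. A minimal sketch, assuming the bundled layout above; the table name `t` and its columns are hypothetical, not from the original article:

```shell
# Smoke test for the embedded Hive CLI (paths follow the bundled contrib layout).
# The table "t" and its columns are illustrative only.
HIVE=$HADOOP_HOME/contrib/hive/bin/hive

$HIVE -e "CREATE TABLE t (id INT, name STRING);"   # define a simple table
$HIVE -e "SHOW TABLES;"                            # the new table should be listed
$HIVE -e "DROP TABLE t;"                           # clean up
```

Because the metastore is an embedded Derby database in this mode, these commands must be run from a single session at a time.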
This deployment uses Derby in embedded mode. Although it is simple and fast, it does not allow multiple users to access the metastore at the same time, so it is suitable only for simple tests, not for production. To improve availability, we need to change the default Hive configuration.
A multi-user deployment scheme with a web interface
The current release is hive-0.4.1, and we will use this version to build the Hive platform.
First, check out hive-0.4.1: svn co http://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1/ hive-0.4.1
Then modify the dependency file shims/ivy.xml to the following content (matching Hadoop version 0.19.2):
<ivy-module version="2.0">
  <info organisation="org.apache.hadoop.hive" module="shims"/>
  <dependencies>
    <dependency org="hadoop" name="core" rev="0.19.2">
      <artifact name="hadoop" type="source" ext="tar.gz"/>
    </dependency>
    <conflict manager="all"/>
  </dependencies>
</ivy-module>
Next, we use ant to compile Hive: ant package.
After the build succeeds, the compiled files are in the build/dist directory. Set this directory as $HIVE_HOME.
Modify the conf/hive-default.xml file with the following main changes:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
Download the Apache Derby database on hadoop1: wget http://labs.renren.com/apache-mirror/db/derby/db-derby-10.5.3.0/db-derby-10.5.3.0-bin.zip
Unzip Derby and set $DERBY_HOME.
Start Derby's network server: sh $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0
Next, copy the derbyclient.jar and derbytools.jar files from $DERBY_HOME/lib to $HIVE_HOME/lib.
Start Hadoop: sh $HADOOP_HOME/bin/start-all.sh
Finally, start the Hive web interface: sh $HIVE_HOME/bin/hive --service hwi
Hive deployment is now complete. Enter http://hadoop1:9999/hwi/ in a browser to access the service (if that does not work, replace hadoop1 with the actual IP address, for example http://10.210.152.17:9999/hwi/).
This deployment uses Derby in client/server mode, which allows multiple users to access the metastore simultaneously and provides a convenient web interface. This is the recommended deployment scheme.
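The start-up steps above can be condensed into a short script. A sketch, assuming $DERBY_HOME, $HADOOP_HOME, and $HIVE_HOME are already exported as described earlier:

```shell
#!/bin/sh
# Start-up order for the Derby client/server deployment (a sketch, not authoritative):
# 1. the Derby network server, which holds the shared metastore;
# 2. Hadoop itself;
# 3. the Hive web interface.
sh $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 &
sh $HADOOP_HOME/bin/start-all.sh
sh $HIVE_HOME/bin/hive --service hwi
```

Starting Derby first matters: Hive connects to the network server at hadoop1:1527 as soon as the metastore is touched.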
More about the Hive schema
Both deployment schemes above use the Derby database to store Hive's schema (metastore) information. Other databases, such as MySQL, can also be used for this purpose.
Refer to this article to learn how to use MySQL in place of Derby: http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx
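In outline, switching the metastore to MySQL means pointing the same JDBC properties at a MySQL server and dropping the MySQL Connector/J jar into $HIVE_HOME/lib. A sketch of the relevant conf/hive-default.xml fragment; the host, database name, user, and password here are hypothetical placeholders:

```xml
<!-- Sketch: MySQL-backed metastore; hostname, database, user, and password are examples only. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop1:3306/metastore_db?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
```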
We can also use HDFS to store schema information by modifying conf/hive-default.xml as follows:
<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.FileStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.RawStore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>