Hive is a Hadoop-based data warehouse platform that makes ETL work easy. Hive defines an SQL-like query language called HQL, which converts user-written queries into MapReduce programs executed on Hadoop.
This article explains how to set up a Hive platform. Suppose we have three machines: hadoop1, hadoop2, and hadoop3, with Hadoop 0.19.2 already installed (one of the many Hadoop versions supported by Hive) and the hosts file correctly configured. Hive will be deployed on hadoop1.
The simplest and fastest deployment scheme
Hadoop 0.19.2 already ships with Hive in its contrib directory; the bundled Hive version is 0.3.0.
First, start Hadoop: sh $HADOOP_HOME/bin/start-all.sh
Then start Hive: sh $HADOOP_HOME/contrib/hive/bin/hive
The Hive command-line interface is now running, and you can enter commands directly to run Hive operations.
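As a quick smoke test of the CLI, you can run a few HQL statements. A minimal sketch, assuming the bundled layout above; the table name `t` and its columns are hypothetical, not from the original article:

```shell
# Smoke test for the embedded Hive CLI (paths follow the bundled contrib layout).
# The table "t" and its columns are illustrative only.
HIVE=$HADOOP_HOME/contrib/hive/bin/hive

$HIVE -e "CREATE TABLE t (id INT, name STRING);"   # define a simple table
$HIVE -e "SHOW TABLES;"                            # the new table should be listed
$HIVE -e "DROP TABLE t;"                           # clean up
```

Because the metastore is an embedded Derby database in this mode, these commands must be run from a single session at a time.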
This deployment uses Derby in embedded mode. Although it is simple and fast, it does not allow multiple users to access the metastore at the same time, so it is suitable only for simple tests, not for production. To improve availability, we need to change the default Hive configuration.
A multi-user deployment scheme with a web interface
The current release is hive-0.4.1, and we will use this version to build the Hive platform.
First, check out hive-0.4.1: svn co http://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1/ hive-0.4.1
Then modify the dependency file shims/ivy.xml to the following content (matching Hadoop version 0.19.2):
<ivy-module version="2.0">
  <info organisation="org.apache.hadoop.hive" module="shims"/>
  <dependencies>
    <dependency org="hadoop" name="core" rev="0.19.2">
      <artifact name="hadoop" type="source" ext="tar.gz"/>
    </dependency>
    <conflict manager="all"/>
  </dependencies>
</ivy-module>
Next, we use ant to compile Hive: ant package.
After the build succeeds, the compiled files are in the build/dist directory. Set this directory as $HIVE_HOME.
Modify the conf/hive-default.xml file with the following main changes:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
Download the Apache Derby database on hadoop1: wget http://labs.renren.com/apache-mirror/db/derby/db-derby-10.5.3.0/db-derby-10.5.3.0-bin.zip
Unzip Derby and set $DERBY_HOME.
Start Derby's network server: sh $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0
Next, copy the derbyclient.jar and derbytools.jar files from $DERBY_HOME/lib to $HIVE_HOME/lib.
Start Hadoop: sh $HADOOP_HOME/bin/start-all.sh
Finally, start the Hive web interface: sh $HIVE_HOME/bin/hive --service hwi
Hive deployment is now complete. Enter http://hadoop1:9999/hwi/ in a browser to access the service (if that does not work, replace hadoop1 with the actual IP address, for example http://10.210.152.17:9999/hwi/).
This deployment uses Derby in client/server mode, which allows multiple users to access the metastore simultaneously and provides a convenient web interface. This is the recommended deployment scheme.
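The start-up steps above can be condensed into a short script. A sketch, assuming $DERBY_HOME, $HADOOP_HOME, and $HIVE_HOME are already exported as described earlier:

```shell
#!/bin/sh
# Start-up order for the Derby client/server deployment (a sketch, not authoritative):
# 1. the Derby network server, which holds the shared metastore;
# 2. Hadoop itself;
# 3. the Hive web interface.
sh $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 &
sh $HADOOP_HOME/bin/start-all.sh
sh $HIVE_HOME/bin/hive --service hwi
```

Starting Derby first matters: Hive connects to the network server at hadoop1:1527 as soon as the metastore is touched.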
More about the Hive schema
Both deployment schemes above use the Derby database to store Hive's schema (metastore) information. Other databases, such as MySQL, can also be used for this purpose.
Refer to this article to learn how to use MySQL in place of Derby: http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx
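In outline, switching the metastore to MySQL means pointing the same JDBC properties at a MySQL server and dropping the MySQL Connector/J jar into $HIVE_HOME/lib. A sketch of the relevant conf/hive-default.xml fragment; the host, database name, user, and password here are hypothetical placeholders:

```xml
<!-- Sketch: MySQL-backed metastore; hostname, database, user, and password are examples only. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop1:3306/metastore_db?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
```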
We can also use HDFS to store schema information by modifying conf/hive-default.xml as follows:
<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.FileStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.RawStore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>