Hadoop Hive installation: configuring a MySQL metastore

Source: Internet
Author: User

Hive depends on Hadoop, so confirm that Hadoop is working before you install Hive. For the Hadoop installation itself, refer to the detailed steps for a distributed Hadoop cluster installation; they are not repeated here.

1. Download the Hive installation package
Go to http://www.apache.org/dyn/closer.cgi/hive and select a stable version. Assuming the downloaded file is apache-hive-1.0.1-bin.tar.gz, unpack it:

tar -zxvf apache-hive-1.0.1-bin.tar.gz

The Apache download list contains two packages, apache-hive-1.0.1-src.tar.gz and apache-hive-1.0.1-bin.tar.gz. The package with bin in its name contains only the compiled Hive program, not the Hive source code.
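The unpack step can be wrapped in a small check that the launcher script is actually present afterwards. This is only a sketch: the helper name unpack_hive and its two arguments are made up for illustration, and it assumes the archive's top-level directory matches the file name without the .tar.gz suffix.

```shell
# unpack_hive ARCHIVE DEST_DIR
# Extracts a Hive binary tarball into DEST_DIR and checks that the
# launcher script bin/hive exists afterwards.
# (Hypothetical helper; the name and arguments are assumptions.)
unpack_hive() (
    archive=$1
    dest=$2
    mkdir -p "$dest" || exit 1
    tar -zxf "$archive" -C "$dest" || exit 1
    # Assumed: the top-level directory is the archive name minus ".tar.gz".
    top=$(basename "$archive" .tar.gz)
    if [ -f "$dest/$top/bin/hive" ]; then
        # Print the directory that can later be used as HIVE_HOME.
        echo "$dest/$top"
    else
        echo "bin/hive not found under $dest/$top" >&2
        exit 1
    fi
)
```

For example, `unpack_hive apache-hive-1.0.1-bin.tar.gz /home/hadoop` would print the extracted directory on success.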

2. Configure environment variables
Hive works even if no operating system environment variables are configured, but typing the full path every time is inefficient, so it is recommended to configure the Linux environment variables as follows.
Edit the global profile /etc/profile or the per-user file ~/.bashrc in the user's home directory, and add the following lines:

export HIVE_HOME=/home/hadoop/apache-hive-1.0.1-bin
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH

In order for the configuration to take effect immediately without restarting the system or logging back in, run the following command:

source /etc/profile    # or: source ~/.bashrc
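To confirm that the variables took effect in the current shell, a quick check along the following lines may help. The function name check_hive_env is hypothetical; it only inspects HIVE_HOME and PATH and does not start Hive.

```shell
# check_hive_env
# Reports whether HIVE_HOME is set and whether $HIVE_HOME/bin is on PATH.
# (Hypothetical helper for illustration.)
check_hive_env() {
    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set" >&2
        return 1
    fi
    # Wrap PATH in colons so the first and last entries match too.
    case ":$PATH:" in
        *":$HIVE_HOME/bin:"*)
            echo "ok: $HIVE_HOME/bin is on PATH"
            ;;
        *)
            echo "$HIVE_HOME/bin is missing from PATH" >&2
            return 1
            ;;
    esac
}
```

Running `check_hive_env` right after the source command should print the ok line; a non-zero exit status suggests the profile file was not re-read.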

3. Create the Hive data directories
Create directories in HDFS for storing Hive data (the /tmp directory may already exist):

hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod 777 /tmp
hadoop fs -chmod 777 /user/hive/warehouse

The commands above create the /tmp and /user/hive/warehouse directories in HDFS. /tmp mainly holds temporary files produced during execution, and /user/hive/warehouse stores the data files managed by Hive.
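Because /tmp may already exist, the directory creation can be made safe to re-run. The sketch below is one possible way to do that; the function name ensure_hdfs_dir is an assumption, and it relies on `hadoop fs -test -d`, which exits with status 0 when the directory exists.

```shell
# ensure_hdfs_dir PATH
# Creates PATH in HDFS only if it does not exist yet, then opens its
# permissions with the same chmod 777 used in the article.
# (Hypothetical helper; requires the hadoop command on PATH.)
ensure_hdfs_dir() {
    if ! hadoop fs -test -d "$1"; then
        hadoop fs -mkdir "$1" || return 1
    fi
    hadoop fs -chmod 777 "$1"
}
```

Typical use would be `ensure_hdfs_dir /tmp` followed by `ensure_hdfs_dir /user/hive/warehouse`. On Hadoop 2.x, `-mkdir` needs the `-p` flag to create missing parent directories such as /user/hive, so the mkdir line may need adjusting there.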

4. Modify the Hive configuration file
This step is optional. If nothing is configured, Hive uses its default configuration; the Hive configuration file lets you customize and tune it. The most commonly changed setting is the metadata storage layer, for which Hive uses the Derby database by default.
By default Hive starts Derby in "single-user" mode, which means only one user can use Hive at a time. This is suitable for local testing while developing a program.
The Hive configuration file is named hive-site.xml and lives in the $HIVE_HOME/conf directory. It does not exist by default and must be created manually. The same directory contains a template file, hive-default.xml.template, so first create hive-site.xml from it:

cp hive-default.xml.template hive-site.xml

The default configuration for the Derby metastore is as follows:

<!-- JDBC metastore connection string -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<!-- JDBC metastore driver class name -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<!-- Metastore user name -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>APP</value>
  <description>Username to use against metastore database</description>
</property>
<!-- Metastore password -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mine</value>
  <description>Password to use against metastore database</description>
</property>

The configuration above shows how the metastore is set up. Because Hive ships with the built-in Derby database, no separate database installation is needed; the Derby driver package (derby-xx.x.x.x.jar) can be found under $HIVE_HOME/lib. The Hive installation is now complete, and you can test whether Hive works correctly with the following commands:

hive
hive> SET -v;
hive> quit;
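The same check can be run without entering the interactive prompt, which is convenient in scripts. This is a sketch only: the function name hive_smoke_test is made up, and it uses the Hive CLI options -S (silent mode) and -e (execute the quoted statement).

```shell
# hive_smoke_test
# Runs one statement through the Hive CLI non-interactively and reports
# whether the CLI exited successfully.
# (Hypothetical helper; requires the hive command on PATH.)
hive_smoke_test() {
    if hive -S -e 'SHOW DATABASES;'; then
        echo "hive CLI responded"
    else
        echo "hive CLI failed; check the console output" >&2
        return 1
    fi
}
```

On a fresh installation the statement normally lists the default database before the confirmation line.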

Error handling:
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI:
Fix:
Add the following property to hive-site.xml. The iotmp folder is a newly created directory:

<property>
<name>system:java.io.tmpdir</name>
<value>/home/hadoop/hive-1.0.1/iotmp</value>
<description/>
</property>
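The directory named in the value element has to exist and be writable by the user who runs Hive. A minimal sketch for creating it follows; the function name make_iotmp is hypothetical, and the path in the usage note is the one from the property above.

```shell
# make_iotmp DIR
# Creates DIR (including missing parent directories) and confirms that
# the current user can write to it.
# (Hypothetical helper for illustration.)
make_iotmp() {
    mkdir -p "$1" && [ -w "$1" ]
}
```

With the value shown in this article, the call would be `make_iotmp /home/hadoop/hive-1.0.1/iotmp`.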

Configuring a MySQL metastore to replace Derby

As mentioned earlier, Hive stores metadata in the built-in Derby database by default, which is fine for local testing during development. In a production environment, however, this may not meet requirements, because the system must support simultaneous access by multiple users. Derby can be configured to run in "multi-user" mode to satisfy this. Going further, real production environments often use MySQL, which has more capable storage features, as the "metadata storage layer". MySQL is one of the most popular open-source relational databases, widely used and feature-rich; if necessary it can also serve as a temporary standard query and analysis system, so it is favored by a large number of Hive users.
To use MySQL as the "metadata storage layer", first install MySQL, for example with the following command:

sudo apt-get install mysql-server

After installation, create a hive account in the database and set permissions:

mysql> create user 'hive'@'%' identified by 'hive';
mysql> grant all privileges on *.* to 'hive'@'%' with grant option;
mysql> flush privileges;

Next, make the following changes to the hive-site.xml configuration file to support MySQL:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>Password to use against metastore database</description>
</property>
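In hive-site.xml these property elements must sit inside the file's configuration root element. As an alternative to editing the copied template, the sketch below writes a minimal file that holds only the four MySQL metastore properties. The function name write_hive_site is hypothetical, the values mirror the ones used in this article, and the function overwrites the target file, so back up any existing hive-site.xml first.

```shell
# write_hive_site FILE
# Writes a minimal hive-site.xml with only the four MySQL metastore
# properties, wrapped in the <configuration> root element.
# (Hypothetical helper; it overwrites FILE without asking.)
write_hive_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
EOF
}
```

A call such as `write_hive_site "$HIVE_HOME/conf/hive-site.xml"` would place the file where Hive looks for it. Any setting not listed falls back to Hive's built-in defaults.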

In addition, Hive does not include the MySQL JDBC driver by default, so the mysql-connector-java-x.x.xx.jar file must be copied to the $HIVE_HOME/lib directory; otherwise Hive cannot communicate with MySQL. At this point, the configuration of Hive with MySQL as the "metadata storage layer" is complete. (Downloading the jar requires registering an Oracle account; leave a comment if you need the file.)
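The copy step can be guarded against a wrong jar path or a wrong Hive directory. The following is a sketch under those assumptions; the function name install_mysql_driver is hypothetical, and the jar path in the usage note is a placeholder for wherever the driver was downloaded.

```shell
# install_mysql_driver JAR HIVE_HOME_DIR
# Copies the MySQL JDBC driver jar into HIVE_HOME_DIR/lib after checking
# that both the jar and the lib directory exist.
# (Hypothetical helper for illustration.)
install_mysql_driver() {
    if [ ! -f "$1" ]; then
        echo "driver jar not found: $1" >&2
        return 1
    fi
    if [ ! -d "$2/lib" ]; then
        echo "no lib directory under: $2" >&2
        return 1
    fi
    cp "$1" "$2/lib/"
}
```

For example: `install_mysql_driver /path/to/mysql-connector-java-x.x.xx.jar "$HIVE_HOME"`.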

Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
