Installing CDH 5.5 Hive and Impala with Yum on Linux: A Detailed Walkthrough


I. Installation of Hive

The components are arranged as follows:

172.16.57.75 bd-ops-test-75 mysql-server
172.16.57.77 bd-ops-test-77 Hiveserver2

1. Install Hive

Install Hive on node 77:

# yum install hive hive-metastore hive-server2 hive-jdbc hive-hbase -y

You can install the client on another node:

# yum install hive hive-server2 hive-jdbc hive-hbase -y

2. Install MySQL

Install MySQL with yum:

# yum install mysql mysql-devel mysql-server mysql-libs -y

Start the database and configure it to start at boot:

# chkconfig mysqld on
# service mysqld start

To install the JDBC driver:

# yum install mysql-connector-java
# ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar

Set the initial MySQL root password to bigdata:

# mysqladmin -uroot password 'bigdata'

After logging in to the database, execute the following:

CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-1.1.0.mysql.sql;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;

Note: the user created here is hive with password hive; you can change these to suit your needs.
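
To confirm the account works, you can test the connection from the HiveServer2 node (a quick check, not part of the original steps; adjust the host if your MySQL server lives elsewhere):

$ mysql -h 172.16.57.75 -uhive -phive metastore -e 'SHOW TABLES;'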

Modify the following properties in the hive-site.xml file:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.16.57.75:3306/metastore?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
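
In a standard metastore setup you would normally also point Hive at the MySQL account created above; the following properties are an assumption based on that convention and are not shown in the original steps:

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>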

3. Configure Hive

Modify /etc/hadoop/conf/hadoop-env.sh to add the HADOOP_MAPRED_HOME environment variable; if it is not set, an "Unknown RPC type" exception occurs when running MapReduce on YARN:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Create a Hive data warehouse directory in HDFS.

The Hive data warehouse defaults to /user/hive/warehouse in HDFS. It is recommended to change its permissions to 1777 so that all users can create and access tables there, but cannot delete tables that do not belong to them.

Every user who queries Hive must have an HDFS home directory (under /user; for example, /user/root for root).
The /tmp directory on the Hive node must be world-writable.

To create a directory and set permissions:

# sudo -u hdfs hadoop fs -mkdir /user/hive
# sudo -u hdfs hadoop fs -chown hive /user/hive
# sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
# sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
# sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
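
A quick way to confirm the ownership and the 1777 mode before moving on (output not shown):

# sudo -u hdfs hadoop fs -ls /user/hive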

Modify hive-env.sh to set the JDK environment variable:

# vim /etc/hive/conf/hive-env.sh
export JAVA_HOME=/opt/programs/jdk1.7.0_67

Start the metastore and HiveServer2:

# service hive-metastore start
# service hive-server2 start
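
To verify that both daemons came up, you can check their status and the listening ports (9083 for the metastore, 10000 for HiveServer2; these are the defaults and an assumption, since the article does not change them):

# service hive-metastore status
# service hive-server2 status
# netstat -lnpt | grep -E '9083|10000'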

4. Testing

$ hive -e 'CREATE TABLE t (id int);'
$ hive -e 'select * from t limit 2;'
$ hive -e 'select id from t;'

Connect with Beeline:

$ beeline
beeline> !connect jdbc:hive2://localhost:10000
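
Once connected, you can run the same checks as above through Beeline; a brief example, assuming the default unauthenticated HiveServer2 and the table t created in the test step:

0: jdbc:hive2://localhost:10000> show tables;
0: jdbc:hive2://localhost:10000> select id from t;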

5. Integration with HBase

First install hive-hbase:

# yum install hive-hbase -y

If you are using CDH 4, you need to add the jars in the Hive shell by executing the following commands:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hbase.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-<version>.jar;

(Use the hive-hbase-handler jar version that matches your installation.)

If you are using CDH 5, you need to add the jars in the Hive shell by executing the following commands:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler.jar;
ADD JAR /usr/lib/hbase/lib/guava-12.0.1.jar;
ADD JAR /usr/lib/hbase/hbase-client.jar;
ADD JAR /usr/lib/hbase/hbase-common.jar;
ADD JAR /usr/lib/hbase/hbase-hadoop-compat.jar;
ADD JAR /usr/lib/hbase/hbase-hadoop2-compat.jar;
ADD JAR /usr/lib/hbase/hbase-protocol.jar;
ADD JAR /usr/lib/hbase/hbase-server.jar;
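
With the handler jars loaded, Hive tables can be mapped onto HBase tables. The DDL below is a minimal sketch of what the integration enables; the table name hbase_t and the column family cf1 are illustrative and not from the original article:

CREATE TABLE hbase_t (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_t");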

You can also configure this with the hive.aux.jars.path parameter in hive-site.xml, or by setting export HIVE_AUX_JARS_PATH= in hive-env.sh.
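
For example, the hive-site.xml variant might look like the following sketch; the jar list is illustrative and comma-separated:

<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/lib/hive/lib/hive-hbase-handler.jar,file:///usr/lib/hbase/hbase-client.jar,file:///usr/lib/hbase/hbase-server.jar</value>
</property>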

II. Installation of Impala

Like Hive, Impala can interact directly with HDFS and HBase. However, Hive and other frameworks built on MapReduce are suited to long-running batch tasks, such as batch extract, transform, load (ETL) jobs, whereas Impala is aimed mainly at real-time queries.

The components are assigned as follows:

172.16.57.74 bd-ops-test-74 impala-state-store impala-catalog impala-server
172.16.57.75 bd-ops-test-75 impala-server
172.16.57.76 bd-ops-test-76 impala-server
172.16.57.77 bd-ops-test-77 impala-server

1. Installation

Install on node 74:

# yum install impala-state-store impala-catalog impala-server -y

Install on nodes 75, 76 and 77:

# yum install impala-server -y

2. Configuration

2.1 Modifying the configuration file

To view the installation path:

# find / -name impala
/var/run/impala
/var/lib/alternatives/impala
/var/log/impala
/usr/lib/impala
/etc/alternatives/impala
/etc/default/impala
/etc/impala

The configuration directory for impalad is specified by the IMPALA_CONF_DIR environment variable and defaults to /usr/lib/impala/conf. Impala's default settings are in /etc/default/impala; modify IMPALA_CATALOG_SERVICE_HOST and IMPALA_STATE_STORE_HOST in that file:

IMPALA_CATALOG_SERVICE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala

IMPALA_CATALOG_ARGS="-log_dir=${IMPALA_LOG_DIR} -sentry_config=/etc/impala/conf/sentry-site.xml"
IMPALA_STATE_STORE_ARGS="-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -use_local_tz_for_unix_timestamp_conversions=true \
    -convert_legacy_hive_parquet_utc_timestamps=true \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml"

ENABLE_CORE_DUMPS=false

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/impala/conf
# HIVE_CONF_DIR=/etc/impala/conf
# HBASE_CONF_DIR=/etc/impala/conf

To set the maximum memory Impala can use, append -mem_limit=70% to the IMPALA_SERVER_ARGS value above.

If you need to limit the maximum number of requests per queue in Impala, append -default_pool_max_requests=-1 to the IMPALA_SERVER_ARGS value above; this sets the maximum number of requests per queue, and -1 means no limit.
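
For example, the end of the IMPALA_SERVER_ARGS assignment shown above would then look like this (a sketch; only the last lines are shown):

    -be_port=${IMPALA_BACKEND_PORT} \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml \
    -mem_limit=70% \
    -default_pool_max_requests=-1"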

On node 74, create soft links to hive-site.xml, core-site.xml and hdfs-site.xml in the /etc/impala/conf directory, then add the following to the hdfs-site.xml file:

<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>

Synchronize the files above to other nodes.
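
One way to do this, assuming passwordless SSH between the nodes (the loop below is a sketch, not part of the original steps):

# for host in bd-ops-test-75 bd-ops-test-76 bd-ops-test-77; do
>   scp /etc/impala/conf/*-site.xml ${host}:/etc/impala/conf/
> done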

2.2 Creating the socket Path

Create /var/run/hadoop-hdfs on each node:

# mkdir -p /var/run/hadoop-hdfs

2.3 User Requirements

The impala user and group are created during installation; do not delete them.

If you want Impala and YARN to work together through Llama, add the impala user to the hdfs group.

When Impala performs a DROP TABLE operation, it needs to move the table files to the HDFS recycle bin, so you need to create an HDFS directory /user/impala that the impala user can write to. Similarly, Impala needs to read data in the Hive data warehouse, so the impala user must be added to the hive group.

Impala cannot run as root, because the root user is not allowed to use direct (short-circuit) reads.

Create the impala user's home directory and set its ownership:

sudo -u hdfs hadoop fs -mkdir /user/impala
sudo -u hdfs hadoop fs -chown impala /user/impala

To view the groups to which the Impala user belongs:

# groups impala
impala : impala hadoop hdfs hive

As shown above, the impala user belongs to the impala, hadoop, hdfs, and hive groups.

2.4 Start the services

Start the services on node 74:

# service impala-state-store start
# service impala-catalog start
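
The impala-server daemon is not covered by the two commands above; based on the component layout at the start of this section, it presumably also needs to be started on every node where it is installed:

# service impala-server start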

2.5 Using impala-shell

Run impala-shell to start the Impala shell, connect to the impalad, and refresh the metadata:

# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to bd-dev-hadoop-70:21000
Server version: impalad version 2.3.0-cdh5.5.1 RELEASE (build 73bf5bc5afbb47aa7eab06cfbf6023ba8cb74f3c)
*********
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.3.0-cdh5.5.1 (73bf5bc) built on Wed Dec 2 10:39:33 PST 2015)
After running a query, type SUMMARY to see a summary of where time was spent.
[bd-dev-hadoop-70:21000] > invalidate metadata;

If tables were created in Hive before you first start impala-shell, execute the INVALIDATE METADATA statement first so that Impala recognizes the newly created tables. (In Impala 1.2 and above, you only need to run INVALIDATE METADATA on one node, rather than on all Impala nodes.)
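
Once the metadata has been refreshed, the Hive tables become visible; a short example using the table t created earlier in the Hive test step:

[bd-dev-hadoop-70:21000] > show tables;
[bd-dev-hadoop-70:21000] > select id from t;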

You can also pass other options; to see which options are available:

# impala-shell -h
Usage: impala_shell.py [options]

Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD

To export data using Impala:

impala-shell -i '172.16.57.74:21000' -r -q 'select * from test' -B --output_delimiter="\t" -o result.txt

That concludes this walkthrough of installing CDH 5.5 Hive and Impala with yum. I hope it helps; if you have any questions, please leave a message and I will reply as soon as possible. Thanks also to everyone for supporting the community site!
