(1) Hive framework construction and architecture introduction


First, Introduction

Hive is a Hadoop-based data warehousing tool that makes it easy to query and manage datasets kept in distributed storage, and it is well suited to the statistical analysis workloads of a data warehouse.

Hive is not suitable for online transaction processing or for real-time queries; it is better suited to batch jobs over large amounts of immutable data.
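As a rough illustration of the batch, read-only workloads Hive targets, a statistical query over a hypothetical log table might look like the following (the table and column names are made up for the example; the query is compiled into a batch job rather than answered interactively):

    SELECT dt, COUNT(*) AS pv, COUNT(DISTINCT user_id) AS uv
    FROM access_log
    GROUP BY dt;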

Second, Download and install

1. Download the Hive archive and copy it to the /opt/modules directory of the CentOS system.

2. Extract the archive: tar -zxvf apache-hive-1.0.1-bin.tar.gz, then rename the extracted folder to hive.
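Steps 1 and 2 as shell commands, assuming the archive has already been copied into /opt/modules:

    cd /opt/modules
    tar -zxvf apache-hive-1.0.1-bin.tar.gz    # unpack the Hive release
    mv apache-hive-1.0.1-bin hive             # rename the extracted folder to "hive"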

3. Add hive-1.0.1 to the environment variables; this assumes the Hadoop runtime environment (Hadoop 2.2 here) has already been configured.

Run vi /etc/profile and append the following lines:

    export HIVE_HOME=/opt/modules/hive
    export PATH=$PATH:$HIVE_HOME/bin
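Reload the profile so the current shell picks up the new variables, and optionally verify that the hive command is found:

    source /etc/profile
    echo $HIVE_HOME      # should print /opt/modules/hive
    hive --version       # confirms the hive binary is on the PATH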

4. Configure the hive-default.xml and hive-site.xml files

Go to /opt/modules/hive/conf, copy hive-default.xml.template to hive-default.xml and hive-site.xml, and make the hive-env.sh file executable:

chmod u+x hive-env.sh
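The configuration steps above as shell commands (copying hive-env.sh from a .template file is an assumption based on the templates shipped in the conf directory):

    cd /opt/modules/hive/conf
    cp hive-default.xml.template hive-default.xml
    cp hive-default.xml.template hive-site.xml
    cp hive-env.sh.template hive-env.sh    # assumption: hive-env.sh ships as a .template file
    chmod u+x hive-env.sh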

5. At the shell command line, enter hive to open the Hive shell.

(Many problems came up during configuration, but by following the logs they could be resolved step by step.)
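A quick way to confirm the shell works is to run a harmless command at the hive> prompt, for example:

    hive> show databases;    -- the built-in default database should be listed
    hive> quit;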

Third, Architecture

The architecture of Hive can be divided into four parts:

    • User interface
    • The user interfaces are the CLI, Client, and WUI; when the CLI is started, a copy of Hive is started at the same time.
    • Metadata storage
    • Hive's metadata is stored in an RDBMS such as MySQL. The metadata includes table names, the columns of each table, table properties, and the directory where each table's data resides.
    • In Hive, each database that is created corresponds to a directory in the HDFS file system, and each table created corresponds to a subdirectory of its database's directory (see the sketch after this list).
    • Interpreter, compiler, optimizer
    • An HQL query goes through lexical analysis, parsing, compilation, optimization, and query plan generation; the generated plan is stored in HDFS and then executed by MapReduce calls.
    • Data storage
    • Hive's data is stored in HDFS; most queries are interpreted as MapReduce tasks, and only a small number of queries read the files directly.
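A minimal sketch of how databases and tables map to HDFS directories and how a query is turned into a plan; /user/hive/warehouse is Hive's default warehouse location, and the database and table names are only illustrative:

    hive> CREATE DATABASE sales_db;
    -- creates the HDFS directory /user/hive/warehouse/sales_db.db

    hive> CREATE TABLE sales_db.orders (id INT, amount DOUBLE);
    -- creates the subdirectory /user/hive/warehouse/sales_db.db/orders

    hive> EXPLAIN SELECT COUNT(*) FROM sales_db.orders;
    -- prints the generated query plan; the aggregation itself runs as a MapReduce job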

The architecture diagram is shown below.

[Hive architecture diagram]

Fourth, Storing metadata in a MySQL database

A. Install the MySQL database

yum install -y mysql-server mysql mysql-devel

B. Restart the MySQL service

service mysqld restart

C. Log in to MySQL and grant permissions

mysql -u root -p***

Assign permissions: GRANT ALL PRIVILEGES ON *.* TO 'root'@'hadoop-yarn' IDENTIFIED BY 'root123';

Refresh permissions: FLUSH PRIVILEGES;

D. Create a dedicated metadata database for Hive: "hive"

CREATE DATABASE hive;

E. Add the following configuration to the hive-site.xml file in the conf directory of Hive

    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop-yarn:3306/hive?characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root123</value>
    </property>

F. Copy the MySQL JDBC driver package to the lib directory of the Hive installation
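For example (the connector jar name and version below are only illustrative; use whichever MySQL JDBC driver has been downloaded):

    cp mysql-connector-java-5.1.38-bin.jar /opt/modules/hive/lib/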

G. Start Hive with the hive command.

H. Enter the MySQL database

Execute the command: use hive;

show tables;
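After Hive has connected to the new metastore at least once, the hive database in MySQL should contain Hive's internal metadata tables, for example:

    mysql> use hive;
    mysql> show tables;
    -- metastore tables such as DBS, TBLS, COLUMNS_V2 and SDS should appear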

At this point, the Hive environment is set up and the metastore metadata is stored in the MySQL database.

