First, Introduction
Hive is a Hadoop-based data warehousing tool for querying and managing large datasets in distributed storage, well suited to statistical analysis over a data warehouse.
Hive is not suitable for online transaction processing or real-time queries; it is better suited to batch jobs over large volumes of immutable data.
Second, Download and Install
1. Download the Hive tarball and copy it to the /opt/modules directory of the CentOS system.
2. Extract the archive with `tar -zxvf apache-hive-1.0.1-bin.tar.gz`, then rename the extracted folder to `hive`.
3. Add Hive to the environment variables. This assumes the Hadoop runtime environment is already configured (Hadoop 2.2 here).
Edit /etc/profile with `vi /etc/profile` and append: `export HIVE_HOME=/opt/modules/hive` and `export PATH=$PATH:$HIVE_HOME/bin`.
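Step 3 can be sketched as a short shell session. The path /opt/modules/hive is the assumed install location from the steps above; adjust it if Hive was unpacked elsewhere:

```shell
# Assumed install path from the steps above (adjust to your layout).
export HIVE_HOME=/opt/modules/hive
# Append Hive's bin directory so the `hive` launcher is found on the PATH.
export PATH=$PATH:$HIVE_HOME/bin
echo "$HIVE_HOME"
```

When these lines live in /etc/profile they only take effect for new login shells; run `source /etc/profile` to apply them to the current session.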
4. Configure the hive-default.xml and hive-site.xml files.
Go to /opt/modules/hive/conf, copy hive-default.xml.template to both hive-default.xml and hive-site.xml, and make hive-env.sh executable:
chmod u+x hive-env.sh
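Step 4 can be sketched as follows; a scratch directory stands in for /opt/modules/hive/conf so the commands are safe to try anywhere:

```shell
# Scratch directory standing in for /opt/modules/hive/conf (hypothetical).
CONF_DIR=$(mktemp -d)
# Simulate the template files that ship with the Hive tarball.
touch "$CONF_DIR/hive-default.xml.template" "$CONF_DIR/hive-env.sh.template"
cd "$CONF_DIR"
# Copy the default template to both configuration files, as in step 4.
cp hive-default.xml.template hive-default.xml
cp hive-default.xml.template hive-site.xml
# Restore the env script from its template and make it executable.
cp hive-env.sh.template hive-env.sh
chmod u+x hive-env.sh
ls
```

In a real installation the templates already exist in the conf directory, so only the `cp` and `chmod` lines are needed.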
5. Typing `hive` at the shell prompt enters the Hive command-line shell.
(Many problems came up during configuration, but each could be solved step by step by reading the logs.)
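Once the `hive` shell opens, a quick smoke test confirms the installation works end to end; the table name `t_test` here is just an illustrative choice, not from the original setup:

```sql
-- Hypothetical smoke test inside the hive shell
CREATE TABLE t_test (id INT, name STRING);
SHOW TABLES;
DESCRIBE t_test;
DROP TABLE t_test;
```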
Third, Architecture
The architecture of Hive can be divided into four parts:
- User interface
  - These include the CLI, Client, and WUI. When the CLI starts, a copy of Hive starts along with it.
- Metadata storage
  - Hive's metadata is stored in an RDBMS such as MySQL. It includes table names, table columns, table properties, and the directory where each table's data resides.
  - In Hive, each created database corresponds to a directory in the HDFS file system, and each table's directory is a subdirectory of its database's directory.
- Interpreter, compiler, optimizer
  - These carry out lexical analysis, parsing, compilation, optimization, and query-plan generation for an HQL statement; the generated plan is stored in HDFS and then executed by MapReduce.
- Data storage
  - Hive's data is stored in HDFS. Most queries are translated into MapReduce jobs; only a small portion read the files directly.
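The distinction in the last bullet can be seen from the queries themselves: a full scan with no filtering or aggregation can be served by reading the underlying files directly, while anything that groups or aggregates is compiled into a MapReduce job. A HiveQL illustration (the table and column names are hypothetical):

```sql
-- Served by a direct file read: no filter, no aggregation, no shuffle.
SELECT * FROM page_views;

-- Compiled into a MapReduce job: requires grouping and aggregation.
SELECT user_id, COUNT(*) FROM page_views GROUP BY user_id;
```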
The architecture diagram is shown below.
Fourth, Storing Metadata in a MySQL Database
A. Install the MySQL database
yum install -y mysql-server mysql mysql-devel
B. Restart the MySQL service
service mysqld restart
C. Log in to MySQL and grant permissions
mysql -u root -p***
Grant permissions: grant all privileges on *.* to 'root'@'hadoop-yarn' identified by 'root123';
Refresh permissions: flush privileges;
D. Create a dedicated metastore database for Hive, named "hive":
create database hive;
E. Add the following configuration to the hive-site.xml file in Hive's conf directory:
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop-yarn:3306/hive?characterEncoding=UTF-8</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root123</value>
</property>
F. Copy the MySQL JDBC driver jar into the lib directory of the Hive installation.
G. Start Hive with the `hive` command.
H. Enter the MySQL database and verify the metastore:
use hive;
show tables;
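Inside MySQL, the metastore schema can be inspected directly. `TBLS` and `DBS` are tables of the Hive metastore schema itself; the session below assumes the `hive` database created in step D:

```sql
USE hive;
SHOW TABLES;                           -- lists metastore tables such as TBLS, DBS, COLUMNS_V2
SELECT TBL_NAME, TBL_TYPE FROM TBLS;   -- one row per Hive table
SELECT NAME, DB_LOCATION_URI FROM DBS; -- one row per Hive database, with its HDFS path
```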
At this point the Hive environment is built, with its metadata (the metastore) stored in the MySQL database.
(1) Hive Framework Setup and Architecture Introduction