Hive Architecture:
Hive is a data warehouse infrastructure built on top of Hadoop.
It is similar to a database, except that a database focuses on transactional operations such as modify, delete, and point queries, which make up most of its workload, while a data warehouse focuses primarily on queries. For the same amount of data, a query in a database is relatively slow, while a query in a data warehouse is relatively fast.
A data warehouse is query-oriented, and the volume of data it processes is much higher than what a database processes.
Traditional data warehouse products still have a data storage bottleneck, and once that bottleneck is hit, query speed drops off, so they are not suitable for massive data. Hadoop is built to process massive data, so we can build the data warehouse on top of it.
The simplest difference between Hive and MapReduce is that MapReduce programs are written in Java; having to develop in the Java language is a barrier for many people.
Hive wraps Hadoop in a framework that can be queried using SQL.
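For example, a count grouped by key that would take a full Java MapReduce program (mapper, reducer, and driver classes) is a single statement in Hive; the words table here is hypothetical:

select word, count(*) from words group by word;

Hive compiles this statement into the same kind of MapReduce job you would otherwise have written by hand.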
The data in Hive is imported from other database products: the data is pulled out of those databases and transformed. We have many databases and the business is very fragmented, but a data warehouse is generally oriented to a single subject area, so one table in the data warehouse often merges many tables from the databases, and many tables have to be transformed on the way in. The transformed data is then loaded into the data warehouse.
This process, done constantly in data warehousing, is ETL (extract, transform, load). The query language for Hive is very similar to SQL but has its own characteristics; it is called HiveQL (HQL).
The data transformation can be done by Hive itself, by MapReduce, or by other tools; typically, MapReduce is used.
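As a rough sketch of such an ETL step expressed in HiveQL (all table and column names here are made up for illustration), merging several source tables into one subject-oriented warehouse table is just an insert-select:

-- hypothetical: combine two staging tables into one warehouse table
insert overwrite table warehouse_orders
select o.id, o.amount, c.region
from staging_orders o
join staging_customers c on (o.customer_id = c.id);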
The data that Hive queries lives in HDFS.
HiveQL queries are converted into MapReduce jobs when executed, and those jobs run against HDFS. Optimizing Hive therefore comes down to the efficiency of the MapReduce jobs it generates.
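You can inspect the plan Hive generates for a query with the explain command (t1 is the example table used later in this post):

explain select count(*) from t1;
-- the output lists the stages Hive will run: here, a map stage that scans t1
-- and a reduce stage that aggregates the count; cheaper plans mean faster queries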
Correspondence between Hive and HDFS:
A database in Hive corresponds to a folder in HDFS.
The columns of a Hive table correspond to the fields in the data files inside the folders in HDFS.
The data in a Hive table corresponds to the data in the data files in the folder in HDFS.
The correspondence between Hive and HDFS is stored in a Hive component called the Metastore.
The Metastore is backed by a relational database, either Derby or MySQL. That means the mapping is stored in Derby or MySQL tables.
For Hive and Hadoop to work together there must be a mapping between them, and that mapping is maintained by the Metastore: when an SQL statement is executed, its operations on tables must be converted into operations on folders, on files, and on columns of data, and this conversion is carried out by querying the Metastore.
The files stored in HDFS are byte arrays with no inherent type distinctions, but an HQL statement needs to distinguish types, so the raw bytes are interpreted according to the corresponding column types.
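How the raw bytes are split and typed is declared when the table is created; a sketch (the column names and the tab delimiter are assumptions for illustration):

create table logs (ts string, level string, msg string)
row format delimited fields terminated by '\t';
-- each line of the underlying HDFS file is split on tabs, and each piece is
-- read back as the declared column type when the table is queried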
Hive Installation:
Hive is effectively a client, and Hadoop is the server side, so Hive only needs to be configured on a single machine in the Hadoop cluster.
cd /usr/local
tar -zxvf hive-0.9.0.tar.gz
Rename the extracted directory: mv hive-0.9.0 hive
Enter the Hive configuration directory: cd hive/conf
mv hive-exec-log4j.properties.template hive-exec-log4j.properties
mv hive-log4j.properties.template hive-log4j.properties
mv hive-env.sh.template hive-env.sh
mv hive-default.xml.template hive-default.xml
cp hive-default.xml hive-site.xml
Delete all the content inside hive-site.xml (the MySQL configuration will be added back later).
Modify hive-config.sh in the bin directory, adding the following three lines:
export JAVA_HOME=/usr/local/jdk
export HIVE_HOME=/usr/local/hive
export HADOOP_HOME=/usr/local/hadoop
Use of Hive:
cd bin/
./hive
Display the databases: show databases;   (only default exists at first)
use default;
Show the tables in the database: show tables;
create table t1 (id int);
select * from t1;
Mapping Relationships for Hive:
When the Hive deployment succeeds, Hive's default database corresponds to the HDFS directory /user/hive/warehouse.
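You can check this mapping from inside the Hive CLI, which accepts HDFS commands directly (t1 is the table created earlier):

-- each table appears as a folder under the warehouse directory
dfs -ls /user/hive/warehouse;
-- and the table's data is just the contents of the files in that folder
dfs -cat /user/hive/warehouse/t1/*;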
Changing Hive's Metastore to MySQL:
Installing MySQL on CentOS:
yum install mysql-server
Start the MySQL service: service mysqld start &
Enter the MySQL database: mysql -uroot
use mysql;
Query the users' passwords: select password, user from user;
Modify the user's password: update user set password=password('admin');
Allow remote login: grant all on *.* to 'root'@'%' identified by 'admin';
Make the previous operations take effect immediately: flush privileges;
Reconnect with the new password: mysql -uroot -padmin
Put the MySQL JDBC driver into Hive's lib directory:
cp mysql-connector-java-5.1.10.jar /usr/local/hive/lib
Modify hive/conf/hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop4:3306/hive_hadoop?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>admin</value>
</property>
Remove metastore_db and derby.log under hive/bin/ (the leftover Derby files).
Operation:
hive
use default;
show tables;
create table t1 (id int);
quit;
mysql -uroot -padmin
show databases;
use hive_hadoop;
show tables;
DBS: the databases in Hive
TBLS: table information
COLUMNS_V2: column information
select * from DBS;
select * from TBLS \G
select * from COLUMNS_V2;
Loading a Linux disk file into a Hive table: an operation on Hive is really an operation on HDFS, and HDFS allows a file to be written only once, so the data is loaded from disk files rather than updated in place.
Create a data file:
vi onecolumn
1
2
3
4
5
Configure Hive into the PATH environment variable:
vi ~/.bashrc
export PATH=/usr/local/hive/bin:$PATH
source ~/.bashrc
hive
load data local inpath './onecolumn' into table t1;
select * from t1;
To delete a table: drop table t1;   (tables in Hive can be dropped but not modified in place)
Loading multiple columns of data:
create table t2 (id int, name string) row format delimited fields terminated by '\t';
Create a tab-separated data file:
vi user
1	zhangsan
2	lisi
3	wangwu
load data local inpath './user' into table t2;
select * from t2;
select name from t2;
When you need to query specific columns, Hive cannot answer with a simple whole-table scan; it has to go through MapReduce.
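A minimal illustration with the t2 table above (this is the behavior of classic Hive; later versions can skip MapReduce for simple projections as well):

-- a plain select * is answered by reading the files directly; no job is launched
select * from t2;
-- projecting a single column makes Hive compile and run a MapReduce job
select name from t2;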
To create a table:
create table t1 (id int);
create table t2 (id int, name string) row format delimited fields terminated by '\t';
Load a Linux disk file into a Hive table:
load data local inpath './onecolumn' into table t1;
load data local inpath './user' into table t2;
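For comparison, loading a file that already sits in HDFS drops the local keyword (the path below is hypothetical); note that a load from HDFS moves the file into the table's warehouse folder rather than copying it:

load data inpath '/input/user' into table t2;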
To summarize: the Hive architecture, the installation of Hive and of MySQL, and some simple uses of Hive.