Hive architecture, installing Hive and MySQL, and some simple uses of Hive


Hive Architecture:

Hive is data warehouse infrastructure built on top of Hadoop.

Hive is similar to a database, but a database focuses on transactional operations: inserts, updates, deletes, and queries all occur frequently there. A data warehouse is primarily query-oriented: a query over the same volume of data that runs relatively slowly against a database runs comparatively fast against a data warehouse.

A data warehouse is built for querying, and the volume of data it handles is far larger than what a database handles.
Traditional data warehouse products still have a data storage bottleneck; once they hit it, query speed degrades and they stop being practical. Hadoop is designed for massive data, so we can build the data warehouse on it.
The simplest difference between Hive and MapReduce is that MapReduce programs must be written in Java, and developing in Java is a barrier for many people.
Hive wraps a framework on top of Hadoop so that it can be queried with SQL.

The data in Hive is imported from other database products: it is pulled out of those databases and transformed. We usually have many databases with very fragmented business data, while a data warehouse is generally oriented to one topic in one area. A single table in the warehouse may therefore merge many tables from the databases, and those tables need to be transformed on the way in. The converted data is then loaded into the data warehouse.
This extract-transform-load process, ETL, is one of the most common jobs around a data warehouse. Hive's query language is very similar to SQL but has its own characteristics; it is called HiveQL (HQL).
The conversion can be done by Hive itself, by MapReduce, or by other tools; typically MapReduce is used.
The data Hive queries lives in HDFS.
HiveQL queries are converted to MapReduce jobs at execution time, and HDFS is read through those jobs. Optimizing Hive largely means looking at the efficiency of the MapReduce jobs it generates.
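The ETL flow described above can be sketched with plain shell tools. This is only an illustration of the extract/transform/load steps, not how Hive actually runs them; the file names and "warehouse" directory are made up for the example.

```shell
set -e
mkdir -p etl_demo

# Extract: two source "tables" exported from a database as CSV
printf '1,zhangsan\n2,lisi\n' > etl_demo/users.csv
printf '1,beijing\n2,shanghai\n' > etl_demo/cities.csv

# Transform: merge the two source tables on their first column and
# convert CSV to tab-delimited, a format a Hive table can read directly
join -t, etl_demo/users.csv etl_demo/cities.csv | tr ',' '\t' > etl_demo/users_cities.tsv

# Load: copy the converted data into a directory standing in for the warehouse
mkdir -p etl_demo/warehouse/users_cities
cp etl_demo/users_cities.tsv etl_demo/warehouse/users_cities/

cat etl_demo/warehouse/users_cities/users_cities.tsv
```

In a real deployment the transform step is usually a MapReduce job and the load target is an HDFS path, but the shape of the pipeline is the same.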

Correspondence between Hive and HDFS:

A database in Hive corresponds to a folder in HDFS.
The columns of a Hive table correspond to the fields of the data files inside that folder in HDFS.
The data of a Hive table corresponds to the data files inside that folder in HDFS.
The correspondence between Hive and HDFS is stored in a Hive component called the metastore.

The metastore is backed by a relational database, either Derby or MySQL; that is, the mapping is stored in Derby or MySQL tables.
For Hive and Hadoop to work together there must be this mapping, and the metastore carries it: when an SQL statement is executed, the operations on tables, columns, and rows are translated into operations on folders and files by querying the metastore.
Files stored in HDFS are byte arrays with no inherent types, but an HQL statement needs to distinguish types, and the types of each column come from the metastore.
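The database-to-folder, data-to-file mapping above can be mimicked with local directories. This is a local stand-in for the real HDFS paths; the database name, table name, and file name below are illustrative.

```shell
set -e
# Local stand-in for the HDFS warehouse layout Hive uses:
# a database is a folder, a table is a folder inside it,
# and the rows are lines in plain data files.
mkdir -p warehouse_demo/testdb.db/t1
printf '1\tzhangsan\n2\tlisi\n' > warehouse_demo/testdb.db/t1/data.txt

# "Columns" are just delimiter-separated fields of the byte stream;
# the types (int, string) live only in the metastore, not in the file.
cut -f2 warehouse_demo/testdb.db/t1/data.txt
```

Because the file itself is untyped, dropping or altering a table in Hive changes only metastore records and directory contents, never a storage format.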

Hive Installation:

Hive is essentially a client and Hadoop is the server side, so Hive only needs to be configured on a single machine of the Hadoop cluster.

    cd /usr/local
    tar -zxvf hive-0.9.0.tar.gz
    mv hive-0.9.0 hive                  # rename the extracted directory
    cd hive/conf                        # enter the Hive configuration directory
    mv hive-exec-log4j.properties.template hive-exec-log4j.properties
    mv hive-log4j.properties.template hive-log4j.properties
    mv hive-env.sh.template hive-env.sh
    mv hive-default.xml.template hive-default.xml
    cp hive-default.xml hive-site.xml

Delete all the property entries inside hive-site.xml, keeping the enclosing configuration element.
Modify hive-config.sh in the bin directory, adding the following three lines:

    export JAVA_HOME=/usr/local/jdk
    export HIVE_HOME=/usr/local/hive
    export HADOOP_HOME=/usr/local/hadoop
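A quick sanity check for this step: write the three exports to a scratch file, source it, and echo the variables, mimicking what bin/hive does when it reads hive-config.sh. The paths match the setup above; adjust them if your JDK, Hive, or Hadoop live elsewhere.

```shell
# Write the three exports to a scratch file and source it.
cat > /tmp/hive-config-check.sh <<'EOF'
export JAVA_HOME=/usr/local/jdk
export HIVE_HOME=/usr/local/hive
export HADOOP_HOME=/usr/local/hadoop
EOF
. /tmp/hive-config-check.sh

# If any of these print empty, hive will fail to start later.
echo "JAVA_HOME=$JAVA_HOME"
echo "HIVE_HOME=$HIVE_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```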
Use of Hive:

    cd bin/
    ./hive                      # start the Hive CLI
    show databases;             # list databases
    use default;
    show tables;                # list tables in the current database
    create table t1 (id int);
    select * from t1;

When Hive is deployed successfully, its default database corresponds to the HDFS directory /user/hive/warehouse.

Modify Hive's metastore to MySQL

Installing MySQL on CentOS:

    yum install mysql-server
    service mysqld start                            # start the MySQL service
    mysql -uroot                                    # connect as root (no password yet)
    use mysql;
    select password, user from user;                # inspect the current users
    update user set password=password('admin');     # set the root password
    grant all on *.* to 'root'@'%' identified by 'admin';   # allow remote logins
    flush privileges;                               # make the changes take effect immediately
    mysql -uroot -padmin                            # reconnect with the new password

Put the MySQL JDBC driver into Hive's lib directory:

    cp mysql-connector-java-5.1.10.jar /usr/local/hive/lib

Modify hive/conf/hive-site.xml:

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop4:3306/hive_hadoop?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>admin</value>
    </property>

Remove the metastore_db directory and derby.log under hive/bin/, leftovers from the embedded Derby metastore.

Operation:

    hive
    use default;
    show tables;
    create table t1 (id int);
    quit;
    mysql -uroot -padmin
    show databases;
    use hive_hadoop;
    show tables;

DBS: database information for Hive

TBLS: table information
COLUMNS_V2: column information

    select * from DBS;
    select * from TBLS \G
    select * from COLUMNS_V2;

Load a Linux disk file into a Hive table: an operation on Hive is really an operation on HDFS, and HDFS files are write-once, so data is loaded from disk files rather than inserted row by row.

Create a one-column data file:

    vi onecolumn
    1
    2
    3
    4
    5

Add hive to the PATH environment variable:

    vi ~/.bashrc
    export PATH=/usr/local/hive/bin:$PATH
    source ~/.bashrc
    hive
    load data local inpath './onecolumn' into table t1;
    select * from t1;           # query the loaded data
    drop table t1;              # tables can be dropped; Hive does not modify rows in place

Uploading multiple columns of data:

    create table t2 (id int, name string) row format delimited fields terminated by '\t';
    vi user                     # fields separated by a tab
    1	zhangsan
    2	lisi
    3	wangwu
    load data local inpath './user' into table t2;
    select * from t2;
    select name from t2;

When a specific column is queried, Hive can no longer just read the whole file back; it has to go through MapReduce to project out that column.
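What that MapReduce job effectively does for `select name from t2` can be imitated on the raw data file with `cut` (the map side keeps only the second tab-separated field of each line). The file contents mirror the `user` example above.

```shell
set -e
# Recreate the tab-delimited data file from the example above.
printf '1\tzhangsan\n2\tlisi\n3\twangwu\n' > user_demo

# Projecting the "name" column = keeping field 2 of every line,
# which is what the generated map task does before returning results.
cut -f2 user_demo
```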

To create a table:

    create table t1 (id int);
    create table t2 (id int, name string) row format delimited fields terminated by '\t';

Load the Linux disk files into the Hive tables:

    load data local inpath './onecolumn' into table t1;
    load data local inpath './user' into table t2;
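With a LOCAL inpath, the load amounts to copying the file into the table's warehouse directory; a later `select *` just reads every file in that directory. A minimal local imitation (the `load_demo` paths stand in for the real HDFS warehouse):

```shell
set -e
# The table's directory under the (simulated) warehouse.
mkdir -p load_demo/warehouse/t1
printf '1\n2\n3\n4\n5\n' > load_demo/onecolumn

# "load data local inpath './onecolumn' into table t1" amounts to
# copying the local file into the table directory.
cp load_demo/onecolumn load_demo/warehouse/t1/

# "select * from t1" then just reads every file in that directory.
cat load_demo/warehouse/t1/*
```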

