A brief introduction to Hive and several ways to access it

Source: Internet
Author: User

What is Hive?

Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for extracting, transforming, and loading data (ETL), and a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users who are familiar with SQL query the data. The language also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot do.

Hive is the data warehouse component of the Hadoop ecosystem. It can manage the data stored in Hadoop and run queries against it.

Advantages and Disadvantages

Low cost and quick to get started.
Simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing a dedicated MapReduce application.
Real-time queries are not supported.

Hive System Architecture


Metadata storage: typically kept in a relational database, such as MySQL or Derby. The metadata in Hive includes the table names, each table's columns and partitions and their properties, the table's own properties (for example, whether it is an external table), the directory where the table's data resides, and so on.
Driver: interpreter, compiler, optimizer, executor
Query Compiler:
Execution Engine:
Server:
Client components:
Extensible Interface Section:
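
To make the metadata storage item above more concrete, here is a minimal Java sketch of the kind of information the metastore keeps for one table. The class TableMetadata, its fields, the column definitions, and the warehouse path are hypothetical illustrations and are not Hive's actual metastore schema or API. Only the table name is taken from the demo in this article.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of what the metastore records for one table.
// Illustration only; this is not a class that ships with Hive.
class TableMetadata {
    final String name;                   // table name
    final List<String> columns;          // column definitions
    final List<String> partitionColumns; // partition columns, if any
    final boolean external;              // whether it is an external table
    final String location;               // directory that holds the table's data

    TableMetadata(String name, List<String> columns, List<String> partitionColumns,
                  boolean external, String location) {
        this.name = name;
        this.columns = columns;
        this.partitionColumns = partitionColumns;
        this.external = external;
        this.location = location;
    }

    // Renders the metadata as one line of text.
    String describe() {
        return name + " (" + (external ? "external" : "managed") + ")"
                + " columns=" + columns
                + " partitions=" + partitionColumns
                + " location=" + location;
    }

    public static void main(String[] args) {
        TableMetadata table = new TableMetadata(
                "addressall_2015_07_09",
                Arrays.asList("day string", "c1 int", "c2 int", "c3 int"), // assumed columns
                Collections.<String>emptyList(),                            // assumed: no partitions
                false,
                "/user/hive/warehouse/addressall_2015_07_09");             // assumed path
        System.out.println(table.describe());
    }
}
```

The point of the sketch is the split it shows: the metastore holds only this description, while the rows themselves stay in the table's directory in Hadoop.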

Hive metadata store

Derby (built-in, the default)
Single session only
Creates the metadata files in the directory from which the terminal was started
Cannot be shared by multiple users

MySQL
Install MySQL and configure accounts and permissions
Copy mysql-connector-java-5.1.22-bin.jar into the lib directory under the Hive installation directory
Modify hive-site.xml
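
The "Modify hive-site.xml" step above usually amounts to pointing the metastore at the MySQL database. A minimal sketch of the relevant properties follows. The host localhost, port 3306, database name hive, and the user name and password are placeholder assumptions to replace with your own values; com.mysql.jdbc.Driver is the driver class provided by the 5.1.x connector JAR mentioned above.

```xml
<configuration>
  <!-- JDBC URL of the metastore database (host, port, and database name are assumptions) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <!-- Driver class from the MySQL Connector/J 5.1.x JAR -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Account created in the "configure accounts and permissions" step (placeholder values) -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>changeme</value>
  </property>
</configuration>
```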

Hive client access methods

1. CLI command line

[root@hadoop1 ~]# hive

2. HWI (Hive Web Interface)

[root@hadoop1 ~]# hive --service hwi

Then open http://localhost:9999/hwi in a browser.

3. HiveServer

Start HiveServer:

[root@hadoop1 ~]# hive --service hiveserver

If the following exception appears:

org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:10000.

Workaround: the port is already in use. Kill the process that occupies the port, or start HiveServer on a different port:

[root@hadoop1 ~]# hive --service hiveserver -p 10001

Hive can then be accessed through the Hive JDBC driver:

private static String hiveDriver = "org.apache.hadoop.hive.jdbc.HiveDriver";
private static String url = "jdbc:hive://hadoop1:10001/default";
private static String name = "";
private static String password = "";

Class.forName(hiveDriver);
Connection conn = DriverManager.getConnection(url, name, password);
Statement stat = conn.createStatement();
String sql = "show tables";
ResultSet rs = stat.executeQuery(sql);
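
The port workaround above depends on knowing whether port 10000 is already taken. A minimal Java sketch, assuming a class named PortCheck with a method isPortFree (both hypothetical helpers, not part of Hive), tries to bind a server socket to the port, which is the same kind of operation that fails with the TTransportException:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hive: checks whether a TCP port can be bound.
class PortCheck {

    // Returns true if a server socket can be bound to the port right now.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;   // bind succeeded; the socket is closed again on exit
        } catch (IOException e) {
            return false;  // bind failed, typically because the port is in use
        }
    }

    public static void main(String[] args) {
        // Defaults to 10000, the port named in the exception message.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        System.out.println("Port " + port
                + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

Running java PortCheck 10000 before starting HiveServer shows whether the default port is available. If it is not, choose another port with the -p option.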

Demo:

package example;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbc {
    private static String hiveDriver = "org.apache.hadoop.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive://hadoop1:10001/default";
    private static String name = "";
    private static String password = "";

    public static void main(String[] args) {
        String sql = "show tables"; // alternative query, not used below
        String sqlString = "select * from addressall_2015_07_09";
        try {
            // Register the Hive JDBC driver.
            Class.forName(hiveDriver);
            // try-with-resources closes the result set, statement, and connection.
            try (Connection conn = DriverManager.getConnection(url, name, password);
                 Statement stat = conn.createStatement();
                 ResultSet rs = stat.executeQuery(sqlString)) {
                while (rs.next()) {
                    // Column indexes start from 1.
                    // System.out.println(rs.getString(1));
                    System.out.println(rs.getString(1) + " " + rs.getInt(2) + " "
                            + rs.getInt(3) + " " + rs.getInt(4));
                }
            }
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

Output:

2015_07_09 536 488 493

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
