What is Hive?
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for extract, transform, and load (ETL) work, along with a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users who know SQL query the data. The language also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analytical work that the built-in mappers and reducers cannot do.
Hive is the data warehouse component of the Hadoop ecosystem. It can manage data stored in Hadoop and query that data.
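Custom mappers and reducers are usually plugged in through Hive's streaming TRANSFORM clause: Hive writes each row to an external program's standard input as a tab-separated line and reads tab-separated lines back from its standard output. The class below is a minimal sketch of such a program in Java; the class name and its behaviour (upper-casing the first column) are hypothetical. It could be wired in roughly as ADD FILE UpperCaseTransform.class; followed by SELECT TRANSFORM(name, cnt) USING 'java UpperCaseTransform' AS (name, cnt) FROM some_table; where the table and column names are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical streaming transform for Hive's TRANSFORM ... USING clause
public class UpperCaseTransform {
    // Transform one tab-separated row: upper-case the first column,
    // pass the remaining columns through unchanged (limit -1 keeps empty trailing columns)
    static String transform(String line) {
        String[] cols = line.split("\t", -1);
        cols[0] = cols[0].toUpperCase(Locale.ROOT);
        return String.join("\t", cols);
    }

    // Read rows from stdin and write transformed rows to stdout, one per line
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(transform(line));
        }
    }
}
```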
Advantages and Disadvantages
Low cost and quick to get started.
Simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications.
Real-time queries are not supported.
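As an illustration of the second point: an HQL statement such as SELECT word, count(*) FROM words GROUP BY word conceptually corresponds to a MapReduce job whose map phase emits a (word, 1) pair per row and whose reduce phase sums the pairs for each word. The sketch below simulates those two phases in memory with plain Java collections (Java 9+ for List.of and Map.entry). It shows the idea only and is not the plan Hive actually generates; the words table is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByCountSketch {
    // Map phase: emit a (key, 1) pair for every input row
    static List<Map.Entry<String, Integer>> map(List<String> rows) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String row : rows) {
            pairs.add(Map.entry(row, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum their values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hive", "hadoop", "hive", "sql");
        // Same result, in spirit, as: SELECT word, count(*) FROM words GROUP BY word
        System.out.println(reduce(map(words))); // prints {hadoop=1, hive=2, sql=1}
    }
}
```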
Hive System Architecture
Metadata storage: usually kept in a relational database such as MySQL or Derby. Hive's metadata includes the table name, the table's columns and partitions and their properties, the table's own properties (for example, whether it is an external table), the directory where the table's data resides, and so on.
Driver: interpreter, compiler, optimizer, and executor
Query Compiler:
Execution Engine:
Server:
Client components:
Extensibility interfaces:
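The metadata items listed under metadata storage can be pictured as one record per table. The sketch below is a hypothetical model for illustration, not the metastore's real schema (Java 16+ for records). It also shows one real Hive convention: each partition is stored in a key=value subdirectory under the table's directory. /user/hive/warehouse is Hive's default warehouse location; the table and column names are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TableMetadataSketch {
    // A column name and its Hive type, e.g. ("cnt", "INT")
    record Column(String name, String type) {}

    // The kinds of facts the metastore keeps for a table (hypothetical model)
    record TableMetadata(String tableName,
                         List<Column> columns,
                         List<String> partitionKeys,
                         boolean external,
                         String location) {

        // Partitions live in key=value subdirectories under the table location
        String partitionPath(Map<String, String> partitionValues) {
            return location + "/" + partitionKeys.stream()
                    .map(k -> k + "=" + partitionValues.get(k))
                    .collect(Collectors.joining("/"));
        }
    }

    public static void main(String[] args) {
        TableMetadata t = new TableMetadata(
                "example_table",
                List.of(new Column("name", "STRING"), new Column("cnt", "INT")),
                List.of("dt"),
                false,
                "/user/hive/warehouse/example_table");
        // prints /user/hive/warehouse/example_table/dt=2015_07_09
        System.out.println(t.partitionPath(Map.of("dt", "2015_07_09")));
    }
}
```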
Hive metadata store
Derby (built-in, the default)
Supports only a single session
Creates the metadata files in the directory where the Hive terminal was started
Cannot be shared by multiple users
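Why the Derby metastore lives in the startup directory: Hive's default metastore connection URL (property javax.jdo.option.ConnectionURL) is jdbc:derby:;databaseName=metastore_db;create=true, and Derby resolves the relative name metastore_db against derby.system.home, which falls back to the JVM's working directory when unset. Starting hive from a different directory therefore gives a different, empty metastore. The sketch below only computes that path; it does not open Derby, and the class name is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DerbyMetastoreLocation {
    // Hive's default embedded-metastore URL: the database name is a relative path
    static final String DEFAULT_URL = "jdbc:derby:;databaseName=metastore_db;create=true";

    // Extract the databaseName attribute from a Derby JDBC URL
    static String databaseName(String url) {
        for (String part : url.split(";")) {
            if (part.startsWith("databaseName=")) {
                return part.substring("databaseName=".length());
            }
        }
        throw new IllegalArgumentException("no databaseName in " + url);
    }

    // With derby.system.home unset, a relative name resolves against the
    // working directory, i.e. the directory in which `hive` was launched
    static Path resolve(String url) {
        String home = System.getProperty("derby.system.home", System.getProperty("user.dir"));
        return Paths.get(home).resolve(databaseName(url));
    }

    public static void main(String[] args) {
        System.out.println(resolve(DEFAULT_URL));
    }
}
```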
MySQL
Install MySQL and configure the account and its permissions
Copy mysql-connector-java-5.1.22-bin.jar to the lib directory under the Hive installation directory
Modify hive-site.xml
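With the connector jar in place, hive-site.xml points the metastore at MySQL through four JDO connection properties. The sketch below prints a minimal block of those properties in the configuration/property layout that hive-site.xml uses. The host hadoop1, the database name hive, and the hive/hive credentials are placeholders; com.mysql.jdbc.Driver is the driver class in Connector/J 5.1, and createDatabaseIfNotExist is a Connector/J URL option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSiteMysqlSketch {
    // Render properties in the <configuration>/<property> layout of hive-site.xml
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    // Characters such as '&' in JDBC URLs must be escaped inside XML ('&' first)
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Placeholder host, database, user, and password
        props.put("javax.jdo.option.ConnectionURL",
                  "jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true");
        props.put("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        props.put("javax.jdo.option.ConnectionUserName", "hive");
        props.put("javax.jdo.option.ConnectionPassword", "hive");
        System.out.print(render(props));
    }
}
```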
Hive client access methods
1. CLI (command line)
[root@hadoop1 ~]# hive
2. HWI (Hive Web Interface)
[root@hadoop1 ~]# hive --service hwi
Then open http://localhost:9999/hwi in a browser.
3. HiveServer
Start HiveServer:
[root@hadoop1 ~]# hive --service hiveserver
If the error "org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:10000." appears, port 10000 is already occupied.
Workaround: kill the process holding the port, or start HiveServer on a different port:
[root@hadoop1 ~]# hive --service hiveserver -p 10001
Then access it through Hive JDBC:

```java
// HiveServer1 JDBC driver class and connection URL (port 10001, "default" database)
private static String hiveDriver = "org.apache.hadoop.hive.jdbc.HiveDriver";
private static String url = "jdbc:hive://hadoop1:10001/default";
private static String name = "";
private static String password = "";

// Load the driver, open a connection, and run a query
Class.forName(hiveDriver);
Connection conn = DriverManager.getConnection(url, name, password);
Statement stat = conn.createStatement();
String sql = "show tables";
ResultSet rs = stat.executeQuery(sql);
```
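The TTransportException above means another process already holds the port on 0.0.0.0. One way to check a port from Java before choosing the -p value is to try binding a ServerSocket to it; the sketch below does that and picks the first free port from 10000 upward. The class and method names are hypothetical, and the answer can go stale if another process takes the port before HiveServer starts.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // True if a server socket can be bound to the port on all interfaces (0.0.0.0),
    // which is the bind HiveServer attempts when it reports the error above
    static boolean isPortFree(int port) {
        try (ServerSocket s = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Return the first free port in [preferred, preferred + attempts)
    static int firstFreePort(int preferred, int attempts) {
        for (int p = preferred; p < preferred + attempts; p++) {
            if (isPortFree(p)) {
                return p;
            }
        }
        throw new IllegalStateException("no free port in range");
    }

    public static void main(String[] args) {
        int port = firstFreePort(10000, 10);
        System.out.println("hive --service hiveserver -p " + port);
    }
}
```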
Demo:
```java
package example;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbc {
    // HiveServer1 JDBC driver and URL; the server was started with -p 10001
    private static String hiveDriver = "org.apache.hadoop.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive://hadoop1:10001/default";
    private static String name = "";
    private static String password = "";

    public static void main(String[] args) {
        String sql = "show tables"; // alternative query: list the tables
        String sqlString = "select * from addressall_2015_07_09";
        try {
            Class.forName(hiveDriver);
            // try-with-resources closes the result set, statement, and connection
            try (Connection conn = DriverManager.getConnection(url, name, password);
                 Statement stat = conn.createStatement();
                 ResultSet rs = stat.executeQuery(sqlString)) {
                while (rs.next()) {
                    // Hive JDBC column indexes start at 1
                    // System.out.println(rs.getString(1));
                    System.out.println(rs.getString(1) + " " + rs.getInt(2) + " "
                            + rs.getInt(3) + " " + rs.getInt(4));
                }
            }
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
```
Output:
2015_07_09 536 488 493
Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
Hive: a brief introduction and several access methods