Today I focused mainly on Hive. I read the book in the morning; the beginning was a bit confusing, but as I went on I slowly realized that Hive is actually quite simple. As I understand it, it is closely related to databases, which makes it easier for me, because I am familiar with SQL syntax, and HQL is similar in many ways. Let's take a look at a basic introduction to Hive:
1. Hive Fundamentals
Hive is a Hadoop-based data warehousing tool that maps structured data files to database tables and provides SQL-like query functionality by translating SQL statements into MapReduce jobs. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly through SQL-like statements, which makes it well suited to statistical analysis of a data warehouse without developing dedicated MapReduce applications.
Hive stores its metadata in an RDBMS such as MySQL or Derby. Hive has three ways of connecting to this data: single-user mode, multi-user mode, and remote-service mode (i.e., embedded mode, local mode, and remote mode).
1.1 Hive Architecture
The Hive architecture is mainly divided into: the user interface, the Thrift server, the metastore, the parser (driver), and Hadoop.
1.2 Hive Data Types
Hive's storage is built on the Hadoop file system; Hive itself has no dedicated data storage format. Its data is organized into four main data models:
Tables (Table)
Partitions (Partition)
Buckets (Bucket)
External tables (External Table)
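As a sketch of how these models look in HQL (the table and column names here are made up for illustration, not taken from the text above), a partitioned, bucketed managed table and an external table could be declared like this:

```sql
-- Managed table, partitioned by day and bucketed by user id
-- (all names here are hypothetical examples)
CREATE TABLE page_view (
  uid STRING,
  url STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (uid) INTO 4 BUCKETS;

-- External table: Hive tracks only the metadata, so dropping
-- the table leaves the underlying HDFS files in place
CREATE EXTERNAL TABLE raw_logs (
  line STRING
)
LOCATION '/data/raw_logs';
```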
Hive's built-in data types fall into two main categories: (1) primitive data types and (2) complex data types. The primitive types are: TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DECIMAL, CHAR, VARCHAR, and DATE. (The complex types include ARRAY, MAP, and STRUCT.)
1.3 Key Points of Hive's Execution Process
An operator (Operator) is Hive's minimum processing unit;
each operator represents an HDFS operation or an MR (MapReduce) job;
the compiler transforms a HiveQL statement into a set of operators;
Hive executes MapReduce tasks through ExecMapper and ExecReducer;
MapReduce can be executed in two modes: local mode and distributed mode.
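On the local/distributed distinction, Hive can be told to run small jobs locally instead of submitting them to the cluster. A minimal sketch using a real Hive setting (whether it helps depends on your job sizes):

```sql
-- Let Hive decide per query whether the MapReduce job is small
-- enough to run in local mode rather than on the cluster
SET hive.exec.mode.local.auto=true;
```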
Common Hive operators include, for example, TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, and ReduceSinkOperator.
1.4 HQL Operations in Hive
Hive operations are basically the same as in SQL, for example:
SELECT u.name, o.orderid FROM order o JOIN user u ON o.uid = u.uid;
SELECT dealid, COUNT(DISTINCT uid), COUNT(DISTINCT date) FROM order GROUP BY dealid;
A simple Hive CREATE TABLE statement:
CREATE TABLE student
(
  name STRING,
  sex STRING,
  age INT
);
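Once the table exists, it can be filled and queried much like in an ordinary database. A minimal sketch, assuming a local tab-separated file (the file path is hypothetical):

```sql
-- Load a local file into the table (the path is an assumption)
LOAD DATA LOCAL INPATH '/home/admin1/students.txt'
INTO TABLE student;

-- Query it with ordinary SQL-style syntax
SELECT name, age FROM student WHERE age > 18;
```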
2. Hive Basic Configuration
1. Find Hive on the Apache website. The latest version is 2.0.1, which is the one I downloaded: http://hive.apache.org/downloads.html
2. Download the MySQL JDBC driver, currently 5.1.38. I have packed the two required downloads into one archive, which you can get from the following link: (attached later)
3. Extract each archive to a directory of your choice. I placed Hive in /home/admin1/Download/hive-2.0.1 and put the MySQL driver jar into Hive's lib directory. Then, in hive-2.0.1/conf, configure the following files:
Create a new file hive-env.sh, changing the directories below to wherever you placed Hadoop and Hive:
export HIVE_HOME=/home/admin1/Download/hive-2.0.1
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_HOME=/home/admin1/Download/hadoop-2.5.2
export HIVE_CONF_DIR=/home/admin1/Download/hive-2.0.1/conf
export HIVE_AUX_JARS_PATH=/home/admin1/Download/hive-2.0.1/lib
You also need to create a new hive-site.xml:
Here I configure it with my MySQL account and password; you can adapt the rest of the configuration as needed.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- <value>jdbc:derby:;databaseName=metastore_db;create=true</value> -->
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!-- <value>org.apache.derby.jdbc.EmbeddedDriver</value> -->
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>Password to use against metastore database</description>
  </property>
</configuration>
Start: execute, in /home/admin1/Download/hive-2.0.1:
bin/hive
If it fails to initialize, run:
bin/schematool -dbType mysql -initSchema
In Hive 2.0 and above you must run initSchema first, otherwise Hive will throw an error. This problem cost me several hours before I finally found out how simple the fix is.
Finally, a note on installing MySQL: on Ubuntu you can install it directly through the Ubuntu Software Center. Search for MySQL there and install the MySQL server, client, and workbench; I won't repeat the details here. You do need to create a new user from the console:
mysql -u root
CREATE USER 'hive' IDENTIFIED BY 'hive';
CREATE DATABASE hive;
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
You can then log in with the hive account:
mysql -u hive -p
Enter the password hive and you should log in successfully; this is the login information to configure in hive-site.xml.
The next step is to happily use Hive: create tables and so on. Remember to start the Hadoop services first with sbin/start-all.sh.
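As a quick smoke test once Hadoop and Hive are both up, a few harmless statements in the hive shell will confirm that the metastore connection works (the table name is just an example):

```sql
SHOW DATABASES;                -- should list at least 'default'
CREATE TABLE t_smoke (id INT); -- exercises the metastore
SHOW TABLES;                   -- t_smoke should appear
DROP TABLE t_smoke;            -- clean up
```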
Summary: I ran into two major problems today. 1: bin/hive kept erroring; it worked after initializing the schema. 2: Sublime Text on Linux could not accept Chinese input, and I could not resolve it by downloading GPK or by compiling sublime_imfix.c; I eventually found a pre-compiled library on GitHub, imported it, and after a series of complicated operations finally solved the problem. Find the right method, and find the right tool.
Hive Fundamentals and Environment building