Hive Server 2 Research, installation and deployment


Background

We have been using HiveServer1 for a long time: ad-hoc queries, hive-web, wormhole, operations tools, and so on all submit statements through it. But HiveServer1 is extremely unstable and often dies inexplicably, leaving every client connection blocked. To cope with this we had to configure a crontab check script that continually executes a "show tables" statement to detect whether the server has hung; if it has, the only remedy is to kill the daemon process and restart it. HiveServer1's concurrency support is also poor. If a user sets some environment variables on a connection, they become bound to a Thrift worker thread; when that user disconnects and another user creates a connection, the new connection may be assigned the same worker thread and will reuse the previous configuration. This is because Thrift cannot detect whether a client has disconnected, so it cannot purge session state information. And because sessions are bound to worker threads in this way, HA is difficult to implement. HiveServer2 supports sessions properly: each client RPC call carries a SessionID, and the server keeps a map from SessionID to session state, so any worker thread can execute different statements of the same session without being tied to it.
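The session mechanism described above can be sketched as a shared map keyed by SessionID. This is a minimal toy model, not HiveServer2's actual classes; all names here are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch (hypothetical names) of the HiveServer2 idea: per-session state
// lives in a shared map keyed by SessionID, so any worker thread can serve any
// session, and closing the session purges the state instead of leaking it to
// whichever client is assigned the worker thread next.
public class SessionStateSketch {
    // SessionID -> per-session configuration overrides
    static final Map<UUID, Map<String, String>> SESSIONS = new ConcurrentHashMap<>();

    static UUID openSession() {
        UUID id = UUID.randomUUID();
        SESSIONS.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Every RPC carries the SessionID, so state is looked up, not thread-bound.
    static void set(UUID session, String key, String value) {
        SESSIONS.get(session).put(key, value);
    }

    static String get(UUID session, String key) {
        return SESSIONS.get(session).get(key);
    }

    static boolean exists(UUID session) {
        return SESSIONS.containsKey(session);
    }

    static void closeSession(UUID session) {
        SESSIONS.remove(session); // state is purged explicitly
    }
}
```

Contrast this with HiveServer1, where the equivalent state lived implicitly in the worker thread itself and survived the client that created it.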

Hive 0.11 ships both HiveServer1 and HiveServer2; HiveServer1 is retained for backward compatibility. In the long run, HiveServer2 will be the preferred choice.


Configuration

1. Configure the HiveServer2 listening port and host

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>test84.hadoop</value>
</property>

2. Configure Kerberos authentication, so that both the Thrift client's interaction with HiveServer2 and HiveServer2's interaction with HDFS are authenticated via Kerberos
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
  <description>
    Client authentication types.
      NONE: no authentication check
      LDAP: LDAP/AD based authentication
      KERBEROS: Kerberos/GSSAPI authentication
      CUSTOM: custom authentication provider (use with property
              hive.server2.custom.authentication.class)
  </description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hadoop/_HOST@DIANPING.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/hadoop.keytab</value>
</property>

3. Enable impersonation, so that HiveServer2 executes statements as the submitting user. If set to false, statements are executed as the admin user running the HiveServer2 daemon.
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

Running the command $HIVE_HOME/bin/hive --service hiveserver2 or $HIVE_HOME/bin/hiveserver2 invokes the main method of org.apache.hive.service.server.HiveServer2 to start the server.
The output in the Hive log looks like this:
2013-09-17 14:59:21,081 INFO server.HiveServer2 (HiveStringUtils.java:startupShutdownMessage(604)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting HiveServer2
STARTUP_MSG:   host = test84.hadoop/10.1.77.84
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.11.0
STARTUP_MSG:   classpath = ... (omitted)
************************************************************/
2013-09-17 14:59:21,957 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(633)) - Login successful for user hadoop/test84.hadoop@DIANPING.COM using keytab file /etc/hadoop.keytab
2013-09-17 14:59:21,958 INFO service.AbstractService (AbstractService.java:init) - Service:OperationManager is inited.
2013-09-17 14:59:21,958 INFO service.AbstractService (AbstractService.java:init) - Service:SessionManager is inited.
2013-09-17 14:59:21,958 INFO service.AbstractService (AbstractService.java:init) - Service:CLIService is inited.
2013-09-17 14:59:21,959 INFO service.AbstractService (AbstractService.java:init) - Service:ThriftCLIService is inited.
2013-09-17 14:59:21,959 INFO service.AbstractService (AbstractService.java:init) - Service:HiveServer2 is inited.
2013-09-17 14:59:21,959 INFO service.AbstractService (AbstractService.java:start) - Service:OperationManager is started.
2013-09-17 14:59:21,960 INFO service.AbstractService (AbstractService.java:start) - Service:SessionManager is started.
2013-09-17 14:59:21,960 INFO service.AbstractService (AbstractService.java:start) - Service:CLIService is started.
2013-09-17 14:59:22,007 INFO metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(409)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2013-09-17 14:59:22,032 INFO metastore.ObjectStore (ObjectStore.java:initialize(222)) - ObjectStore, initialize called
2013-09-17 14:59:22,955 INFO metastore.ObjectStore (ObjectStore.java:getPMF(267)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2013-09-17 14:59:23,000 INFO metastore.ObjectStore (ObjectStore.java:setConf(205)) - Initialized ObjectStore
2013-09-17 14:59:23,909 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(452)) - 0: get_databases: default
2013-09-17 14:59:23,912 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(238)) - ugi=hadoop/test84.hadoop@DIANPING.COM ip=unknown-ip-addr cmd=get_databases: default
2013-09-17 14:59:23,933 INFO service.AbstractService (AbstractService.java:start) - Service:ThriftCLIService is started.
2013-09-17 14:59:23,948 INFO service.AbstractService (AbstractService.java:start) - Service:HiveServer2 is started.
2013-09-17 14:59:24,025 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(633)) - Login successful for user hadoop/test84.hadoop@DIANPING.COM using keytab file /etc/hadoop.keytab
2013-09-17 14:59:24,047 INFO thrift.ThriftCLIService (ThriftCLIService.java:run(435)) - ThriftCLIService listening on test84.hadoop/10.1.77.84:10000
As you can see, HiveServer2 has become a composite service containing a set of services: OperationManager, SessionManager, CLIService, and ThriftCLIService. The connection to the HiveMetaStore is established at initialization time, and a get_databases command is invoked as a test. Finally, the Thrift server (actually a thread-pool server) is started, listening on test84.hadoop/10.1.77.84:10000.
In addition, the Hadoop FileSystem cache keys filesystem objects by the combination of URI scheme, authority, UGI (current user), and a uniqueness field. Under impersonation this causes a HiveServer2 memory leak: "jmap -histo pid" shows a very large number of FileSystem objects. You therefore need to disable the filesystem cache with extra parameters when starting HiveServer2:
$HIVE_HOME/bin/hive --service hiveserver2 --hiveconf fs.hdfs.impl.disable.cache=true --hiveconf fs.file.impl.disable.cache=true
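The leak can be illustrated with a toy model of a cache keyed the same way. This is a hedged sketch with hypothetical names, not Hadoop's actual FileSystem.Cache code: it only shows why the same URI accessed as different users produces distinct cache entries, so entries accumulate under doAs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model (hypothetical names) of a filesystem cache whose key combines URI
// scheme, authority, and the current user. With impersonation enabled, every
// distinct user adds a new entry, so the cache grows without bound.
public class FsCacheSketch {
    static final class Key {
        final String scheme, authority, user;
        Key(String scheme, String authority, String user) {
            this.scheme = scheme; this.authority = authority; this.user = user;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority)
                    && user.equals(k.user);
        }
        @Override public int hashCode() { return Objects.hash(scheme, authority, user); }
    }

    static final Map<Key, Object> CACHE = new HashMap<>();

    // Same URI + same user -> cached object; same URI + different user -> new entry.
    static Object getFileSystem(String scheme, String authority, String user) {
        return CACHE.computeIfAbsent(new Key(scheme, authority, user), k -> new Object());
    }

    static int size() { return CACHE.size(); }
}
```

Disabling the cache (as in the command above) trades the cached objects away so each filesystem instance can be released when its statement finishes.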

1. Accessing HiveServer2 with Beeline. Beeline is the new interactive CLI introduced in Hive 0.11. It is based on SQLLine and accesses HiveServer2 as a Hive JDBC client; each Beeline startup maintains one session. Because Kerberos authentication is used, you need a valid Kerberos ticket locally and must specify the HiveServer2 service principal in the connection URL, e.g. principal=hadoop/test84.hadoop@DIANPING.COM. The username and password can be left blank; subsequent statements execute under the identity of the principal in the current ticket cache.
-dpsh-3.2$ bin/beeline
Beeline version 0.11.0 by Apache Hive
beeline> !connect jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM
scan complete in 2ms
Connecting to jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM
Enter username for jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM:
Enter password for jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM:
Connected to: Hive (version 0.11.0)
Driver: Hive (version 0.11.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://test84.hadoop:10000/default> select count(1) from abc;
+------+
| _c0  |
+------+
| 0    |
+------+
1 row selected (29.277 seconds)
0: jdbc:hive2://test84.hadoop:10000/default> !q
Closing: org.apache.hive.jdbc.HiveConnection
The Thrift client and server create a SessionHandle with a unique HandleIdentifier (the SessionID), which is managed by the SessionManager inside CLIService; the SessionManager maintains the mapping between SessionHandle and HiveSession. A HiveSession maintains SessionConf and HiveConf information, and a new Driver is created for each statement the user executes; passing the HiveConf in at execution time is how HiveServer2 supports concurrency. Each operation (there are different operation types, such as EXECUTE_STATEMENT) generates an independent OperationHandle, which also has its own HandleIdentifier. When the user enters "!q" in Beeline, the session is destroyed along with its resources.
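The handle hierarchy above can be sketched as sessions that own operations. This is a minimal illustration with hypothetical names, not Hive's actual SessionManager/OperationManager classes.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch (hypothetical names) of the handle hierarchy: a session owns
// many operations, each with its own HandleIdentifier, and closing the session
// (Beeline's "!q") releases every operation it owns.
public class HandleSketch {
    static final Map<UUID, Set<UUID>> OPS_BY_SESSION = new ConcurrentHashMap<>();

    static UUID openSession() {
        UUID s = UUID.randomUUID();
        OPS_BY_SESSION.put(s, ConcurrentHashMap.newKeySet());
        return s;
    }

    // e.g. an EXECUTE_STATEMENT: every statement gets a fresh operation handle
    static UUID newOperation(UUID session) {
        UUID op = UUID.randomUUID();
        OPS_BY_SESSION.get(session).add(op);
        return op;
    }

    static int liveOperations(UUID session) {
        Set<UUID> ops = OPS_BY_SESSION.get(session);
        return ops == null ? 0 : ops.size();
    }

    static void closeSession(UUID session) {
        OPS_BY_SESSION.remove(session); // operations go away with their session
    }
}
```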
PS: One slightly annoying aspect is that no progress information is printed while MapReduce jobs execute; for a long-running statement you wait a long time without any feedback.
2. JDBC mode. The driver class name for HiveServer1 is org.apache.hadoop.hive.jdbc.HiveDriver, while for HiveServer2 it is org.apache.hive.jdbc.HiveDriver; the two are easy to confuse. In addition, hiveconf params and variables can be specified in the connection URL: the params are split on ';', and params and variables are separated by '#'. These are session-level, and Hive executes the corresponding SET key=value statements right after the session is established. For example:
1. With hiveconf and variables: jdbc:hive2://test84.hadoop:10000/default?hive.cli.conf.printheader=true#stab=salestable;icol=customerid
2. With variables: jdbc:hive2://test84.hadoop:10000/default;user=foo;password=bar
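The URL decomposition described above can be sketched as simple string parsing: the segment after '?' carries hiveconf parameters and the segment after '#' carries variables, each list split on ';'. This parsing logic is illustrative only, not the actual driver's.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of how a HiveServer2 JDBC URL decomposes into hiveconf params
// (after '?') and variables (after '#'). Class and method names are invented.
public class Hive2UrlSketch {
    static Map<String, String> splitKv(String s) {
        Map<String, String> out = new LinkedHashMap<>();
        if (s == null || s.isEmpty()) return out;
        for (String pair : s.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) out.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }

    static Map<String, String> hiveconf(String url) {
        int q = url.indexOf('?');
        if (q < 0) return splitKv(null);
        int h = url.indexOf('#');
        return splitKv(h > q ? url.substring(q + 1, h) : url.substring(q + 1));
    }

    static Map<String, String> variables(String url) {
        int h = url.indexOf('#');
        return splitKv(h < 0 ? null : url.substring(h + 1));
    }
}
```

Applied to example 1 above, hiveconf() yields {hive.cli.conf.printheader=true} and variables() yields {stab=salestable, icol=customerid}.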
Sample code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveTest {
    public static void main(String[] args) throws SQLException {
        try {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM",
                "", "");
        Statement stmt = conn.createStatement();
        String sql = "SELECT * FROM ABC";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        ResultSetMetaData rsmd = res.getMetaData();
        int columnCount = rsmd.getColumnCount();
        for (int i = 1; i <= columnCount; i++) {
            System.out.println(rsmd.getColumnTypeName(i) + ":" + rsmd.getColumnName(i));
        }
        while (res.next()) {
            System.out.println(res.getInt(1) + "\t" + res.getString(2));
        }
    }
}
HiveStatement now supports cancellation: calling Statement.cancel() terminates and destroys the Driver of the statement in progress.
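The cancellation pattern behind this can be illustrated with a toy cooperative-cancel loop. This is a hedged sketch of the general pattern, with invented names, not Hive's actual Driver or HiveStatement implementation.

```java
// Hedged illustration of cooperative cancellation: a long-running "driver"
// loop that periodically checks a cancel flag, the way Statement.cancel()
// asks the server to stop work in progress. All names here are hypothetical.
public class CancelSketch {
    private volatile boolean cancelled = false;

    // Corresponds to the client-side cancel request.
    public void cancel() { cancelled = true; }

    // Processes up to maxRows, stopping early once cancel() has been called.
    // cancelAfter simulates another thread issuing the cancel mid-run.
    public int run(int maxRows, int cancelAfter) {
        int processed = 0;
        for (int i = 0; i < maxRows; i++) {
            if (cancelled) break;                    // cooperative cancellation point
            processed++;
            if (processed == cancelAfter) cancel();  // simulated external cancel
        }
        return processed;
    }
}
```

In real use the cancel call arrives from a different thread than the one running the statement, which is why the flag is volatile.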

Note: if Kerberos authentication is problematic, the JVM option "-Dsun.security.krb5.debug=true" can be added on the client side to see more details.

Reference links:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API
https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2
https://github.com/apache/hive/blob/trunk/service/if/TCliService.thrift
http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/

Original article: http://blog.csdn.net/lalaguozhe/article/details/11776055; please credit the source when reprinting.
