Configure HiveServer2 Based on Version 0.14.0

In this project, Hive needs to be accessed as a heterogeneous data source for mondrian in order to execute MDX queries. Normally I use Hive either directly from the command line or by accessing the Hadoop cluster through the Hive jar in a program; in both cases the cluster being accessed is the company cluster. During earlier Hive tests I remember modifying the Hive JDBC source code: the main change was implementing some JDBC APIs that simply threw "not implemented" exceptions, because mondrian calls those interfaces and the exceptions broke the subsequent workflow. The changes themselves were fairly simple. The other problem is that back then Hive used no authentication mechanism at all, and neither did Hadoop. Jobs running on the company's Hadoop cluster now require Kerberos authentication. I am not deeply familiar with Kerberos, but I do know how to use it, so this article also covers the Kerberos side of the setup.

The following describes my understanding of the ways Hive can be used. First, the Hive metastore comes in three forms. The first is the embedded Derby database; in this mode Derby creates a directory under the current working directory, so only a single Hive instance can run at a time. The second is a remote database, i.e. a relational database system such as MySQL (only MySQL has been tested here); Hive connects to MySQL through JDBC to obtain metadata. The third is Hive's standalone metastore server, a dedicated service for metadata access that sits in front of the real metadata store.

There are two main ways to use Hive in practice. The first is to use Hive as an engine for SQL queries over files: you work directly from the Hive command line, or start a program using the APIs Hive provides. In this case you only need to configure the Hive metastore (which tells Hive which databases and tables exist and what their properties are) and the Hive data warehouse directory (usually an HDFS directory). In my tests the warehouse directory only matters when a database is created; when a table is created, the table's directory is created under the directory of the database it belongs to. You must also point Hive at the Hadoop configuration files and jars, since Hive depends on Hadoop to execute its tasks. The second way is to use Hive as a database that exposes a SQL interface, accessed through JDBC much like MySQL. This article describes how to configure the server and how to connect both with the client shipped with Hive and through JDBC.
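For completeness: when the standalone metastore server mode mentioned above is used (it is not used in the setup below), clients locate the service through the hive.metastore.uris property. A minimal sketch, where the host is a placeholder and 9083 is the conventional default port:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
  <description>Thrift URI for clients to contact the remote metastore server</description>
</property>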
The next step is to configure the Hive environment. I generally use a remote MySQL database as the metastore, without the metastore server that ships with Hive; the latter apparently supports higher concurrency, but there is no need for that here, and this way is simpler. Besides the metastore database, the other point of attention is the data warehouse location, which I set to my personal user directory /user/intern. The configuration is as follows:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/intern/</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://127.0.0.1:3306/HIVE</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
  <description>password to use against metastore database</description>
</property>

You also need to pay attention to how the Hive metastore database is created. We usually make utf8 the default character set of a database (to support Chinese characters), but if the Hive metastore database uses utf8 you will run into many baffling errors. Therefore, when creating the Hive metastore database, specify latin1 as the character set. Alternatively, Hive can create the database for you automatically (I have not tried this and do not know whether it works).
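For example, creating the metastore database by hand in MySQL would look like this (the database name HIVE matches the connection URL above):

CREATE DATABASE HIVE DEFAULT CHARACTER SET latin1;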
Next, you need to configure Kerberos authentication. The specific configuration is as follows:
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
  <description>
    Client authentication types.
      NONE: no authentication check
      LDAP: LDAP/AD based authentication
      KERBEROS: Kerberos/GSSAPI authentication
      CUSTOM: Custom authentication provider
              (Use with property hive.server2.custom.authentication.class)
      PAM: Pluggable authentication module.
  </description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/xxx@HADOOP.XXX.COM</value>
  <description>Kerberos server principal</description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/home/hzfengyu/hive.keytab</value>
  <description>Kerberos keytab file for server principal</description>
</property>

These three items configure hiveserver2's authentication method; if they are missing, many problems will surface on the client side. The default value of hive.server2.authentication is NONE; here we set it to KERBEROS and then configure the keytab file and principal that Kerberos authentication requires. The principal is essentially what we would pass to kinit, except that the full principal must be given here, not just the part before the @ symbol (kinit accepts just the part before the @). Also note that the user corresponding to the keytab must have proxy (impersonation) permission on Hadoop, which hiveserver2 requires: hiveserver2 is only a server acting through a designated proxy user, different users connect to it through JDBC, and the actual operations are performed on behalf of whichever keytab user each client authenticated as. If the server's user lacks proxy permission, an authentication error occurs when the JDBC client establishes its connection to hiveserver2. The error stack is:
15/05/01 17:32:33 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
    at java.security.AccessController.doPrivileged(Native Method)
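For reference, proxy permission of this kind is normally granted in Hadoop's core-site.xml through the standard hadoop.proxyuser settings. A minimal sketch for a proxy user named hive; the wildcard values are placeholders that a real deployment should narrow down:

<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
  <description>hosts from which the hive user is allowed to impersonate other users</description>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
  <description>groups whose users the hive user is allowed to impersonate</description>
</property>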

Here I use a hive user that has proxy permission, while the user on the machine making the JDBC connection is intern. First, start hiveserver2 on the Hive machine:
./bin/hive --service hiveserver2
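Before connecting from the client machine, the connecting user needs a valid Kerberos ticket. A sketch, assuming the intern/bigdata principal used later in this article; the keytab path and realm are hypothetical:

kinit -kt /home/intern/intern.keytab intern/bigdata@HADOOP.XXX.COM

klist should then show the ticket that beeline and JDBC will use.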

Then start the beeline client that ships with Hive:
./bin/beeline

Then use the connect command to connect to hiveserver2:
beeline> !connect jdbc:hive2://hiveserver2-ip:10000/foodmart;principal=hive/xxx@HADOOP.XXX.COM;
scan complete in 34ms
Connecting to jdbc:hive2://bitest0.server.163.org:10000/foodmart;principal=hive/app-20.photo.163.org@HADOOP.HZ.NETEASE.COM;
Enter username for jdbc:hive2://bitest0.server.163.org:10000/foodmart;principal=hive/app-20.photo.163.org@HADOOP.HZ.NETEASE.COM;:
Enter password for jdbc:hive2://bitest0.server.163.org:10000/foodmart;principal=hive/app-20.photo.163.org@HADOOP.HZ.NETEASE.COM;:
Connected to: Apache Hive (version 0.14.0)
Driver: Hive JDBC (version 0.14.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://bitest0.server.163.org:10000/>

When connecting, you must specify the JDBC URL (the default port is 10000, which can also be changed in the hiveserver2 configuration file) as well as the server's principal, i.e. the hive.server2.authentication.kerberos.principal configured above. The client identity is the current Kerberos user on the client machine, which you can inspect with klist.
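For example, changing the port would be a hive-site.xml entry like the following (hive.server2.thrift.port is the standard property, and 10000 is its default):

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number on which HiveServer2's Thrift interface listens</description>
</property>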
Besides the built-in beeline client, you can also connect from a program through JDBC. The test code is as follows:
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class TestHive {
  public static void main(String[] args) throws SQLException {
    // Register the Hive JDBC driver.
    try {
      Class.forName("org.apache.hive.jdbc.HiveDriver");
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
    }

    // Switch Hadoop security to Kerberos and log in from the client keytab,
    // so the SASL/GSSAPI handshake with hiveserver2 has valid credentials.
    Configuration conf = new Configuration();
    conf.setBoolean("hadoop.security.authorization", true);
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    try {
      UserGroupInformation.loginUserFromKeytab("intern/bigdata",
          "C:\\Users\\Administrator\\Desktop\\intern.keytab");
    } catch (IOException e) {
      e.printStackTrace();
    }

    // The URL carries the *server* principal; the client identity comes from
    // the keytab login above, so username and password stay empty.
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://hiveserver2-ip:10000/foodmart;principal=hive/xxx@HADOOP.XXX.COM;User=;Password=;",
        "", "");
    Statement stmt = conn.createStatement();
    String sql = "select * from account limit 10";
    System.out.println("Running: " + sql);
    ResultSet res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
    }
  }
}

Now we have finished setting up hiveserver2 with Kerberos authentication. The next article will describe how to use Hive as a data source for mondrian to execute MDX queries.
Finally, the biggest problem I ran into: when I first configured Kerberos authentication I was using Hive 0.13.1, and with the configuration above the following error occurred:
2015-04-30 17:02:22,602 ERROR [Thread-6]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(93)) - Error:
java.lang.NoSuchFieldError: SASL_PROPS
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
        at org.apache.hive.service.auth.HiveAuthFactory.getSaslProperties(HiveAuthFactory.java:116)
        at org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:133)
        at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:43)
        at java.lang.Thread.run(Thread.java:701)
2015-04-30 17:02:22,605 INFO  [Thread[Thread-7,5,main]]: delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(222)) - Updating the current master key for generating delegation tokens
2015-04-30 17:02:22,612 INFO  [Thread-3]: server.HiveServer2 (HiveStringUtils.java:run(623)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HiveServer2 at bitest0.server.163.org/10.120.36.85
************************************************************/

Searching Google for this error eventually turned up a related HIVE bug report, which on its own was of little help. So I changed the Hive version: as mentioned above, the issue is fixed after 0.14.0, and once I switched to the new version the problem no longer occurred, although I cannot say for certain that this bug was the cause.
