Import Hive statistical analysis results into a MySQL database table (III) -- using Hive UDF or GenericUDF


I have already described two ways to import Hive analysis results into MySQL tables: importing with Sqoop, and using Hive with the MySQL JDBC driver. Now I will introduce a third, widely used approach: using a Hive custom function (UDF or GenericUDF) to insert each record into a database table.
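Throughout this article the target MySQL table is jtxx2, which holds a vehicle record ID (cllxbh) and a license plate number (hphm). The original post never shows its definition; a plausible one, with the column widths being my assumption, is:

mysql> create table jtxx2 (
    ->   cllxbh varchar(64),
    ->   hphm   varchar(32)
    -> );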

First, the UDF approach

The UDF approach is simple to implement: you only need to extend the UDF class and write an evaluate method. Hive locates evaluate by reflection, so its parameter and return types define the function's signature.

1. Write the implementation class

package com.gxnzx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

import com.gxnzx.hive.util.DBSqlHelper;

public class AnalyzeStatistics extends UDF {

    public String evaluate(String clxxbh, String hphm) {
        // jtxx2 is the target database table
        String sql = "insert into jtxx2 values(?,?)";
        // insert the record into the database
        if (DBSqlHelper.addBatch(sql, clxxbh, hphm)) {
            return clxxbh + "  SUCCESS  " + hphm;
        } else {
            return clxxbh + "  FAILED  " + hphm;
        }
    }
}
2. Write the database operation method

public static boolean addBatch(String sql, String clxxbh, String hphm) {
    boolean flag = false;
    try {
        conn = DBSqlHelper.getConn();   // open a database connection
        ps = (PreparedStatement) conn.prepareStatement(sql);
        ps.setString(1, clxxbh);
        ps.setString(2, hphm);
        System.out.println(ps.toString());
        ps.execute();
        flag = true;
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            ps.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
    return flag;
}
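The article never shows the rest of DBSqlHelper, in particular the conn and ps fields and the getConn() method used above. A minimal sketch of what the helper might look like, assuming the same MySQL instance and credentials that appear in the GenericUDF test later in this article:

package com.gxnzx.hive.util;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DBSqlHelper {

    // shared by the addBatch() method shown in step 2, which also lives
    // in this class; package-private here for brevity
    static Connection conn;
    static PreparedStatement ps;

    // Lazily open and cache one connection per JVM. The URL, user and
    // password are assumptions taken from the analyzedboutput test below.
    public static synchronized Connection getConn() throws Exception {
        if (conn == null || conn.isClosed()) {
            Class.forName("com.mysql.jdbc.Driver");
            conn = DriverManager.getConnection(
                    "jdbc:mysql://192.168.2.133:3306/transport", "hive", "hive");
        }
        return conn;
    }
}

Caching the connection in a static field matters here: the UDF calls addBatch once per row, and opening a fresh connection for every row would be very slow.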
3. Export the project from Eclipse as a jar package and add it to the Hive classpath

hive> add jar hiveudf2.jar;
4. Add the MySQL JDBC driver package to the Hive classpath

hive> add jar /home/hadoopuser/cloud/hive/apache-hive-0.13.1-bin/lib/mysql-connector-java-5.1.18-bin.jar;
5. Create a temporary Hive function

hive> create temporary function analyze as 'com.gxnzx.hive.udf.AnalyzeStatistics';
6. Testing

hive> select analyze(clxxbh, hphm) from transjtxx_hbase limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1428394594787_0034, Tracking URL = http://secondmgt:8088/proxy/application_1428394594787_0034/
Kill Command = /home/hadoopuser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job -kill job_1428394594787_0034
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-23 10:15:34,355 Stage-1 map = 0%, reduce = 0%
2015-04-23 10:15:51,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.14 sec
MapReduce Total cumulative CPU time: 7 seconds 140 msec
Ended Job = job_1428394594787_0034
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 7.14 sec   HDFS Read: 256 HDFS Write: 532 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 140 msec
OK
32100017000000000220140317000015  SUCCESS  Lu Q58182
32100017000000000220140317000016  SUCCESS  Lu QV4662
32100017000000000220140317000019  SUCCESS  Su LL8128
32100017000000000220140317000020  SUCCESS  Su CAH367
32100017000000000220140317000023  SUCCESS  Lu Q7899W
32100017000000000220140317000029  SUCCESS  Su HN3819
32100017000000000220140317000038  SUCCESS  Lu C01576
32100017000000000220140317000044  SUCCESS  Su DT9178
32100017000000000220140317000049  SUCCESS  Su LZ1112
32100017000000000220140317000052  SUCCESS  Su K9795 police
Time taken: 35.815 seconds, Fetched: 10 row(s)
7. View data in MySQL table

mysql> select * from jtxx2;
+----------------------------------+-----------------+
| cllxbh                           | hphm            |
+----------------------------------+-----------------+
| 32100017000000000220140317000015 | Lu Q58182       |
| 32100017000000000220140317000016 | Lu QV4662       |
| 32100017000000000220140317000019 | Su LL8128       |
| 32100017000000000220140317000020 | Su CAH367       |
| 32100017000000000220140317000023 | Lu Q7899W       |
| 32100017000000000220140317000029 | Su HN3819       |
| 32100017000000000220140317000038 | Lu C01576       |
| 32100017000000000220140317000044 | Su DT9178       |
| 32100017000000000220140317000049 | Su LZ1112       |
| 32100017000000000220140317000052 | Su K9795 police |
+----------------------------------+-----------------+
10 rows in set (0.00 sec)
Second, the GenericUDF approach

The GenericUDF approach is more complex to implement: initialize() validates the argument types and declares the return type, evaluate() is invoked once per row, and getDisplayString() supplies the text shown by EXPLAIN. I adapted someone else's code, as follows:

1. Write the implementation class

package com.gxnzx.hive.main;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde.Constants;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.IntWritable;

/**
 * AnalyzeGenericUDFDBOutput is designed to output data directly from Hive to a
 * JDBC datastore. This UDF is useful for exporting small to medium summaries
 * that have a unique key.
 *
 * Due to the nature of Hadoop, individual mappers, reducers or entire jobs can
 * fail. If a failure occurs, a mapper or reducer may be retried. This UDF has
 * no way of detecting failures or rolling back a transaction. Consequently,
 * you should only use this to export to a table with a unique key. The unique
 * key should safeguard against duplicate data.
 *
 * Four steps should be followed: first, package the UDF into a jar file;
 * secondly, use Hive's "add jar" feature to add the UDF jar to the current
 * class path; thirdly, use Hive's "add jar" feature to add the JDBC driver jar
 * to the current class path; fourthly, use Hive's "create temporary function"
 * feature to create a temporary function for the UDF class.
 *
 * Example for MySQL:
 *   hive> add jar udf.jar;
 *   hive> add jar mysql-connector-java-5.1.18-bin.jar;
 *   hive> create temporary function analyzedboutput
 *         as 'com.gxnzx.hive.main.AnalyzeGenericUDFDBOutput';
 */
@Description(name = "analyzedboutput",
        value = "_FUNC_(jdbcstring,username,password,preparedstatement,[arguments])"
                + " - sends data to a JDBC driver",
        extended = "argument 0 is the JDBC connection string\n"
                + "argument 1 is the database user name\n"
                + "argument 2 is the database user's password\n"
                + "argument 3 is an SQL query to be used in the PreparedStatement\n"
                + "argument (4-n) The remaining arguments must be primitive and are "
                + "passed to the PreparedStatement object\n")
@UDFType(deterministic = false)
public class AnalyzeGenericUDFDBOutput extends GenericUDF {

    private static final Log LOG = LogFactory
            .getLog(AnalyzeGenericUDFDBOutput.class.getName());

    private transient ObjectInspector[] argumentOI;
    private transient Connection connection = null;
    private String url;
    private String user;
    private String pass;
    private final IntWritable result = new IntWritable(-1);

    /**
     * @param arguments
     *            argument 0 is the JDBC connection string, argument 1 is the
     *            user name, argument 2 is the password, argument 3 is an SQL
     *            query to be used in the PreparedStatement, and arguments 4-n
     *            must be primitive and are passed to the PreparedStatement
     */
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentTypeException {
        argumentOI = arguments;

        // the first four arguments must be strings:
        // connection url, username, password, query
        for (int i = 0; i < 4; i++) {
            if (arguments[i].getCategory() == ObjectInspector.Category.PRIMITIVE) {
                PrimitiveObjectInspector poi = (PrimitiveObjectInspector) arguments[i];
                if (!(poi.getPrimitiveCategory() == PrimitiveObjectInspector.PrimitiveCategory.STRING)) {
                    throw new UDFArgumentTypeException(i,
                            "The argument of function should be \""
                                    + Constants.STRING_TYPE_NAME + "\", but \""
                                    + arguments[i].getTypeName() + "\" is found");
                }
            }
        }
        // the remaining arguments (the column values) must be primitive
        for (int i = 4; i < arguments.length; i++) {
            if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                throw new UDFArgumentTypeException(i,
                        "The argument of function should be primitive, but \""
                                + arguments[i].getTypeName() + "\" is found");
            }
        }

        return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
    }

    /**
     * @return 0 on success, otherwise a non-zero failure code
     */
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        url = ((StringObjectInspector) argumentOI[0])
                .getPrimitiveJavaObject(arguments[0].get());
        user = ((StringObjectInspector) argumentOI[1])
                .getPrimitiveJavaObject(arguments[1].get());
        pass = ((StringObjectInspector) argumentOI[2])
                .getPrimitiveJavaObject(arguments[2].get());

        try {
            connection = DriverManager.getConnection(url, user, pass);
        } catch (SQLException ex) {
            LOG.error("Driver loading or connection issue", ex);
            result.set(2);
        }

        if (connection != null) {
            try {
                PreparedStatement ps = connection
                        .prepareStatement(((StringObjectInspector) argumentOI[3])
                                .getPrimitiveJavaObject(arguments[3].get()));
                // bind the remaining arguments to the statement placeholders
                for (int i = 4; i < arguments.length; ++i) {
                    PrimitiveObjectInspector poi = (PrimitiveObjectInspector) argumentOI[i];
                    ps.setObject(i - 3, poi.getPrimitiveJavaObject(arguments[i].get()));
                }
                ps.execute();
                ps.close();
                result.set(0);
            } catch (SQLException e) {
                LOG.error("Underlying SQL exception", e);
                result.set(1);
            } finally {
                try {
                    connection.close();
                } catch (Exception ex) {
                    LOG.error("Underlying SQL exception during close", ex);
                }
            }
        }

        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        StringBuilder sb = new StringBuilder();
        sb.append("dboutput(");
        if (children.length > 0) {
            sb.append(children[0]);
            for (int i = 1; i < children.length; i++) {
                sb.append(",");
                sb.append(children[i]);
            }
        }
        sb.append(")");
        return sb.toString();
    }
}
2. Package the program as a jar and add it to the Hive classpath

hive> add jar hivegenericudf.jar;
3. Add MySQL JDBC driver jar file

hive> add jar /home/hadoopuser/cloud/hive/apache-hive-0.13.1-bin/lib/mysql-connector-java-5.1.18-bin.jar;
4. Create a temporary function

hive> create temporary function analyzedboutput as 'com.gxnzx.hive.main.AnalyzeGenericUDFDBOutput';
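Optionally, you can confirm that the function is registered and view its @Description annotation text with Hive's describe function command (this check is my addition, not part of the original walkthrough):

hive> describe function extended analyzedboutput;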
5. Testing

hive> select analyzedboutput('jdbc:mysql://192.168.2.133:3306/transport', 'hive', 'hive', 'insert into jtxx2 values(?,?)', clxxbh, hphm) from transjtxx_hbase limit 5;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1428394594787_0043, Tracking URL = http://secondmgt:8088/proxy/application_1428394594787_0043/
Kill Command = /home/hadoopuser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job -kill job_1428394594787_0043
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-23 22:01:46,205 Stage-1 map = 0%, reduce = 0%
2015-04-23 22:02:01,985 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.37 sec
MapReduce Total cumulative CPU time: 9 seconds 370 msec
Ended Job = job_1428394594787_0043
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 9.37 sec   HDFS Read: 256 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 370 msec
OK
0
0
0
0
0
Time taken: 32.118 seconds, Fetched: 5 row(s)
The six parameters of analyzedboutput are: the MySQL JDBC connection string, the MySQL user name, the password, the SQL insert statement, and the two fields clxxbh and hphm queried from the Hive table. Arguments from position 4 onward are bound in order to the PreparedStatement placeholders (argument i maps to placeholder i-3). Each 0 in the output above is the function's return value for one row, indicating a successful insert; 1 would signal an SQL error and 2 a connection failure.

6. View MySQL database table data

mysql> select * from jtxx2;
Empty set (0.00 sec)

mysql> select * from jtxx2;
+----------------------------------+-----------+
| cllxbh                           | hphm      |
+----------------------------------+-----------+
| 32100017000000000220140317000015 | Lu Q58182 |
| 32100017000000000220140317000016 | Lu QV4662 |
| 32100017000000000220140317000019 | Su LL8128 |
| 32100017000000000220140317000020 | Su CAH367 |
| 32100017000000000220140317000023 | Lu Q7899W |
+----------------------------------+-----------+
5 rows in set (0.00 sec)
That's all for this article.





