esProc Assists Java in Processing Diverse Data Sources: Hive

Last Update: 2016-01-15 Source: Internet

Author: User


Java can connect to Hive easily through JDBC. However, Hive's computational capabilities are weaker than those of other databases, and unconventional computations are cumbersome: the data must be retrieved into Java and processed further there.

Using esProc together with Java reduces the complexity of the computations Java has to perform on Hive data. Consider the following example. The Orders table in Hive holds order details, and we need to calculate the year-over-year and month-over-month ratios of sales. The data are as follows:

    ORDERID  CLIENT  SELLERID  AMOUNT  ORDERDATE
    1        UJRNP   17        392     2008/11/2 15:28
    2        SJCH    6         4802    2008/11/9 15:28
    3        UJRNP   16        13500   2008/11/5 15:28
    4        PWQ     9         26100   2008/11/8 15:28
    5        PWQ     11        4410    2008/11/12 15:28
    6        HANAR             6174    2008/11/7 15:28
    7        EGU     2         17800   2008/11/6 15:28
    8        VILJX   7         2156    2008/11/9 15:28
    9        JAYB              17400   2008/11/12 15:28
    10       JAXE              19200   2008/11/12 15:28
    11       SJCH    7         13700   2008/11/10 15:28
    12       QUICK             21200   2008/11/13 15:28
    13       HL      12        21400   2008/11/21 15:28
    14       JAYB    1         7644    2008/11/16 15:28
    15       MIP     16        3234    2008/11/19 15:28
    ...

The month-over-month ratio (the "previous period" ratio) compares the current period's data with the previous period's data, using the month as the interval. For example, April sales divided by March sales is April's month-over-month ratio. The year-over-year ratio (the "same period" ratio) compares the current period's data with the same period of the previous year, for example April 2014 sales divided by April 2013 sales. Hive did not support window functions before version 0.11, so this calculation is hard to express there. It requires nested SQL subqueries, and because Hive's subquery support is incomplete, the work often has to be done outside Hive. In esProc it takes only a short script, whose cells work as follows:

A1: Connects to the database through JDBC, using a predefined Hive data source.

A2: Queries the data for a time period. begin and end are external parameters, for example begin="2011-01-01 00:00:00" and end="2014-07-08 00:00:00". Here end is the current date, which can be obtained with the now() function.
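For comparison, here is what the A2 filter amounts to in plain Java. This is a minimal sketch, not the esProc script. The class `PeriodFilter`, the `Order` record and the method `inPeriod` are hypothetical names. The sketch assumes a half-open interval (begin inclusive, end exclusive), because the article does not show the exact condition A2 uses. It also uses Java records (Java 16+).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java equivalent of step A2: keep orders dated within [begin, end).
// Assumption: the interval is half-open; the esProc script's actual condition is not shown.
class PeriodFilter {

    // Same textual format as the begin/end parameters, e.g. "2011-01-01 00:00:00"
    static final DateTimeFormatter PARAM_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Minimal order row: only the fields this step needs
    record Order(int orderId, double amount, LocalDateTime orderDate) {}

    static List<Order> inPeriod(List<Order> orders, String begin, String end) {
        LocalDateTime from = LocalDateTime.parse(begin, PARAM_FORMAT);
        LocalDateTime to = LocalDateTime.parse(end, PARAM_FORMAT);
        return orders.stream()
                // begin is inclusive, end is exclusive
                .filter(o -> !o.orderDate().isBefore(from) && o.orderDate().isBefore(to))
                .collect(Collectors.toList());
    }
}
```

To mirror "end is the current date", the end parameter could be produced with `LocalDateTime.now().format(PeriodFilter.PARAM_FORMAT)`.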

A3: Groups the orders by year and month and sums the sales for each month.
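The grouping in A3 corresponds roughly to the following plain-Java sketch. The class and method names are hypothetical, and `java.time.YearMonth` stands in for the separate year and month fields used in esProc. A `TreeMap` keeps the months in chronological order, which the following steps rely on.

```java
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java equivalent of step A3: group by year and month, sum sales.
class MonthlyTotals {

    // Minimal order row: only the fields this step needs
    record Order(double amount, LocalDateTime orderDate) {}

    static Map<YearMonth, Double> byMonth(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                o -> YearMonth.from(o.orderDate()),      // grouping key: year and month
                TreeMap::new,                            // sorted by year, then month
                Collectors.summingDouble(Order::amount)));
    }
}
```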

A4: Adds a new field, LRR, the month-over-month ratio, computed as mamount/mamount[-1]. Here mamount is the current month's sales and mamount[-1] is the previous month's sales. Note that LRR is empty for the first month (January 2011), which has no previous month.
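In plain Java, the relative reference mamount[-1] becomes "the previous element of a list that is already in chronological order". The sketch below uses hypothetical names. Like the esProc expression, it works by position, so it assumes no month is missing from the sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java equivalent of step A4: LRR = mamount / mamount[-1].
// Input rows must already be in chronological order (year, then month).
class MonthOverMonth {

    record Month(int y, int m, double mamount) {}

    static List<Double> lrr(List<Month> months) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < months.size(); i++) {
            Double ratio = null; // the first row has no previous row, so LRR stays empty
            if (i > 0) {
                ratio = months.get(i).mamount() / months.get(i - 1).mamount();
            }
            result.add(ratio);
        }
        return result;
    }
}
```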

A5: Sorts A4 by month and year so that the year-over-year ratio can be calculated. The full expression would be =A4.sort(m,y). Because A4 is already ordered by year, sorting by month alone, =A4.sort(m), gives the same order, as long as rows with the same month keep their existing year order. It also performs better.

A6: Adds a new field, YoY, the year-over-year ratio of monthly sales, computed as if(m==m[-1],mamount/mamount[-1],null). The ratio is calculated only when the previous row has the same month, that is, the same month of the previous year. Note that YoY is empty for every month of the first year (2011).
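Steps A5 and A6 can be mirrored in plain Java as sketched below. The names are hypothetical. The shortcut of sorting by month only is safe here because Java's `List.sort` is specified to be stable: rows that arrive in year order keep that order within each month. As with the esProc expression, if a month is missing in some year, the ratio is taken against the nearest earlier year that has that month.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java equivalent of steps A5 (sort by month) and A6 (YoY).
class YearOverYear {

    record Month(int y, int m, double mamount) {}

    record Row(int y, int m, double mamount, Double yoy) {}

    // Input must be in chronological order (year, then month), as produced by A3.
    static List<Row> yoy(List<Month> chronological) {
        List<Month> sorted = new ArrayList<>(chronological);
        // A5: stable sort by month only; years stay ascending within each month
        sorted.sort(Comparator.comparingInt(Month::m));
        List<Row> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Month cur = sorted.get(i);
            Double yoy = null;
            // A6: if(m==m[-1], mamount/mamount[-1], null)
            if (i > 0 && sorted.get(i - 1).m() == cur.m()) {
                yoy = cur.mamount() / sorted.get(i - 1).mamount();
            }
            result.add(new Row(cur.y(), cur.m(), cur.mamount(), yoy));
        }
        return result;
    }
}
```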

A7: Sorts A6 by year and month in reverse order. Note that the data only go up to July 2014. The results are as follows:

A8: Closes the Hive database connection.

A9: Returns the result.

The Java code that calls this program through esProc JDBC is as follows (the esProc program above is saved as test.dfx):
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    // Establish an esProc JDBC connection
    Class.forName("com.esproc.jdbc.InternalDriver");
    Connection con = DriverManager.getConnection("jdbc:esproc:local://");
    // Call the esProc program as a stored procedure; test is the name of test.dfx
    com.esproc.jdbc.InternalCStatement st =
        (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?,?)");
    // Set the parameters (JDBC parameter indexes start at 1)
    st.setObject(1, "2011-01-01 00:00:00"); // begin
    st.setObject(2, "2014-07-08 00:00:00"); // end
    // Execute the esProc stored procedure
    st.execute();
    // Get the result set
    ResultSet set = st.getResultSet();
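The article stops once it has obtained the ResultSet. Reading the rows needs only standard `java.sql` calls. The article does not list the result's column names, so this minimal sketch reads the labels from `ResultSetMetaData` instead of hard-coding them. The class `ResultPrinter` and its method `rows` are hypothetical names.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns each row of any JDBC ResultSet into "label=value, ..." text.
class ResultPrinter {

    static List<String> rows(ResultSet set) throws SQLException {
        ResultSetMetaData meta = set.getMetaData();
        List<String> out = new ArrayList<>();
        while (set.next()) {
            StringBuilder sb = new StringBuilder();
            // JDBC column indexes start at 1
            for (int c = 1; c <= meta.getColumnCount(); c++) {
                if (c > 1) {
                    sb.append(", ");
                }
                sb.append(meta.getColumnLabel(c)).append('=').append(set.getObject(c));
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

With the code above, `ResultPrinter.rows(set)` would return one string per result row. Closing `set`, `st` and `con` afterwards (or opening them in try-with-resources) releases the JDBC resources.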

esProc accesses Hive the same way it accesses an ordinary database, through Hive's JDBC driver, so that setup is not covered here.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
