esProc Assists Java in Processing Diverse Data Sources: Hive

Source: Internet
Author: User

Connecting to Hive from Java through JDBC is simple, but Hive's computational capabilities are weaker than those of other databases. Unconventional computations are therefore cumbersome: the data has to be fetched into Java and processed further there.

Using esProc together with Java reduces the complexity of computing on Hive data from Java. Consider the following example: the orders table in Hive holds order details, and we need to calculate the year-over-year ratio and the link relative ratio (the ratio to the previous period) of sales. The data is as follows:

    ORDERID CLIENT SELLERID AMOUNT ORDERDATE
    1 UJRNP 17 392 2008/11/2 15:28
    2 sjch 6 4802 2008/11/9 15:28
    3 UJRNP 16 13500 2008/11/5 15:
    4 pwq 9 26100 2008/11/8 15:28
    5 pwq 11 4410 2008/11/12 15:28
    6 hanar 6174 2008/11/7 15:28
    7 EGU 2 17800 2008/11/6 15:28
    8 VILJX 7 2156 2008/11/9 15:28
    9 jayb + 17400 2008/11/12 15:28
    10 jaxe 19200 2008/11/12 15:28
    11 SJCH 7 13700 2008/11/10 15:28
    12 QUICK 21200 2008/11/13 15:28
    13 HL 12 21400 2008/11/21 15:28
    14 Jayb 1 7644 2008/11/16 15:28
    15 MIP 16 3234 2008/11/19 15:28
    ...

The link relative ratio compares the current period with the immediately preceding period, using the month as the interval. For example, April sales divided by March sales is April's link relative ratio. The year-over-year ratio compares the current period with the same period of the previous year, for example April 2014 sales divided by April 2013 sales. Because the Hive version used here has no window functions, this requirement has to be written as nested SQL subqueries, and Hive's subquery support is incomplete, so the computation often has to be done outside Hive. esProc handles it easily; its script is explained cell by cell below:

A1: Connects to the database through JDBC using a predefined Hive data source.

A2: Queries data from the database for a given time period. begin and end are external parameters, for example begin="2011-01-01 00:00:00" and end="2014-07-08 00:00:00" (the current date, which can be obtained with the now() function).

A3: Groups orders by year and month, and sums the sales for each month.
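For comparison, the grouping step in A3 could look roughly like this in plain Java once the rows have been fetched. This is only a sketch: the class and method names (MonthlySalesSketch, Order, sumByYearMonth) are hypothetical, and it assumes the year, month, and amount have already been extracted from each order row.

```java
import java.util.List;
import java.util.TreeMap;

class MonthlySalesSketch {
    /** One order row reduced to the fields the summary needs (illustrative names). */
    static final class Order {
        final int year;
        final int month;
        final double amount;

        Order(int year, int month, double amount) {
            this.year = year;
            this.month = month;
            this.amount = amount;
        }
    }

    /**
     * Sums order amounts per (year, month).
     * The key year * 100 + month (e.g. 201101) makes the TreeMap iterate
     * in chronological order, which the later ratio steps rely on.
     */
    static TreeMap<Integer, Double> sumByYearMonth(List<Order> orders) {
        TreeMap<Integer, Double> totals = new TreeMap<>();
        for (Order o : orders) {
            totals.merge(o.year * 100 + o.month, o.amount, Double::sum);
        }
        return totals;
    }
}
```

A sorted map is used here so that the monthly totals come out ordered by year and then month without a separate sort.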

A4: Adds a new field LRR, the link relative ratio of monthly sales, whose expression is mamount/mamount[-1]. In the code, mamount is the current month's sales and mamount[-1] is the previous month's sales. Note that the ratio for the first month (January 2011) is empty, because it has no previous period.
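The previous-row expression in A4 can be approximated in plain Java with an index loop. The sketch below is an illustration under assumptions, not the esProc implementation: the class and method names are hypothetical, and the input is assumed to be the monthly totals already ordered chronologically.

```java
import java.util.ArrayList;
import java.util.List;

class LinkRelativeSketch {
    /**
     * Returns, for each month, its sales divided by the previous month's sales.
     * The first entry has no predecessor, so its ratio is null.
     */
    static List<Double> linkRelativeRatios(List<Double> monthlyAmounts) {
        List<Double> ratios = new ArrayList<>();
        for (int i = 0; i < monthlyAmounts.size(); i++) {
            if (i == 0) {
                ratios.add(null); // no previous period for the first month
            } else {
                ratios.add(monthlyAmounts.get(i) / monthlyAmounts.get(i - 1));
            }
        }
        return ratios;
    }
}
```

For monthly totals of 100, 150, and 75, the ratios are null, 1.5, and 0.5.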

A5: Sorts A4 by month and then by year so that the year-over-year ratio can be calculated. The complete code would be =A4.sort(m,y), but because A4 is already sorted by year, sorting by month alone, A4.sort(m), achieves the same result and performs better.

A6: Adds a new field YoY, the year-over-year ratio of monthly sales, whose expression is if(m==m[-1],mamount/mamount[-1],null). This means the ratio is calculated only when the previous row belongs to the same month. Note that the ratio is empty for every month of the first year (2011).
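The idea behind A5 and A6, a stable sort by month followed by a comparison with the previous row, can be sketched in plain Java as follows. The names (YearOverYearSketch, MonthTotal, withYearOverYear) are hypothetical, and the input is assumed to be ordered by year and then month. Like the expression above, the sketch only looks at the previous row, so it assumes no month is missing between consecutive years.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class YearOverYearSketch {
    /** One monthly total plus its year-over-year ratio (illustrative names). */
    static final class MonthTotal {
        final int year;
        final int month;
        final double amount;
        Double yoy; // stays null when the previous row is a different month

        MonthTotal(int year, int month, double amount) {
            this.year = year;
            this.month = month;
            this.amount = amount;
        }
    }

    /**
     * Expects rows ordered by year, then month. Returns a new list ordered by
     * month, then year, and sets yoy on the row objects themselves.
     */
    static List<MonthTotal> withYearOverYear(List<MonthTotal> byYearMonth) {
        List<MonthTotal> rows = new ArrayList<>(byYearMonth);
        // The sort is stable, so rows with the same month keep their year order.
        rows.sort(Comparator.comparingInt((MonthTotal r) -> r.month));
        for (int i = 1; i < rows.size(); i++) {
            MonthTotal prev = rows.get(i - 1);
            MonthTotal cur = rows.get(i);
            if (cur.month == prev.month) {
                cur.yoy = cur.amount / prev.amount;
            }
        }
        return rows;
    }
}
```

Sorting by month alone is enough here for the same reason given for A4.sort(m): the input is already in year order, and a stable sort preserves that order within each month.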

A7: Sorts A6 by year and month in descending order. Note that the data runs only through July 2014.

A8: Closes the Hive database connection.

A9: Returns the result.

The Java code that calls this script through esProc JDBC is as follows (the esProc script above is saved as test.dfx):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    // Establish an esProc JDBC connection
    Class.forName("com.esproc.jdbc.InternalDriver");
    Connection con = DriverManager.getConnection("jdbc:esproc:local://");
    // Call the esProc script like a stored procedure; test is the DFX file name
    com.esproc.jdbc.InternalCStatement st =
        (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?,?)");
    // Set the parameters
    st.setObject(1, "2011-01-01 00:00:00"); // begin
    st.setObject(2, "2014-07-08 00:00:00"); // end
    // Execute the esProc script
    st.execute();
    // Get the result set
    ResultSet set = st.getResultSet();

esProc accesses Hive in the same way it accesses an ordinary database, through Hive's JDBC driver; the configuration is not covered here.
