esProc helps Java process diverse data sources: HDFS

Source: Internet
Author: User

Accessing HDFS from Java through the API provided by Hadoop is not difficult, but computing on the files stored there is more troublesome. Calculations such as grouping, filtering, and sorting are complex to implement in Java. esProc (the "collector") handles these computations well for Java and also encapsulates HDFS access. With esProc, Java gains stronger computing power over HDFS files, and calculations on structured and semi-structured data become easy. Let's look at a concrete example.

Employee data is saved in the text file employee.gz on HDFS. We want to read the employee information and find the female employees born on or after January 1, 1981. The text file is compressed on HDFS and cannot be loaded into memory all at once.
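For comparison, here is a minimal sketch of the same filter written by hand in plain Java. It streams a gzip-compressed text file line by line, so the whole file never has to fit in memory. The column names (EID, NAME, SURNAME, GENDER, BIRTHDAY), the tab delimiter, and the yyyy-MM-dd date format are assumptions made for illustration, and the sample data is invented. The demo reads from an in-memory stream; against a real cluster the InputStream would typically come from Hadoop's FileSystem.open(Path).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipEmployeeFilter {

    // Streams a gzip-compressed, tab-separated text file line by line and keeps
    // only the rows with gender "F" and a birthday on or after the given date.
    // Column names and layout are assumptions for this sketch.
    static List<String[]> femalesBornOnOrAfter(InputStream gzipped, LocalDate from) {
        List<String[]> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return hits; // empty file
            }
            List<String> header = Arrays.asList(headerLine.split("\t"));
            int genderCol = header.indexOf("GENDER");
            int birthdayCol = header.indexOf("BIRTHDAY");
            if (genderCol < 0 || birthdayCol < 0) {
                throw new IllegalArgumentException("GENDER or BIRTHDAY column is missing");
            }
            String line;
            while ((line = reader.readLine()) != null) {
                String[] row = line.split("\t");
                LocalDate birthday = LocalDate.parse(row[birthdayCol]); // expects yyyy-MM-dd
                if ("F".equals(row[genderCol]) && !birthday.isBefore(from)) {
                    hits.add(row);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    // Demo helper: gzip a string in memory so the sketch runs without HDFS.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String sample = "EID\tNAME\tSURNAME\tGENDER\tBIRTHDAY\n"
                + "1\tRebecca\tMoore\tF\t1974-11-20\n"
                + "2\tAshley\tWilson\tF\t1981-01-01\n"
                + "3\tMatthew\tJohnson\tM\t1984-07-07\n";
        List<String[]> hits = femalesBornOnOrAfter(
                new ByteArrayInputStream(gzip(sample)), LocalDate.of(1981, 1, 1));
        for (String[] row : hits) {
            System.out.println(row[1] + " " + row[2]); // prints: Ashley Wilson
        }
    }
}
```

Even this single fixed filter needs header handling, parsing, and a hard-coded condition. Changing the condition means changing and recompiling the Java code, which is the effort the esProc approach described here aims to remove.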

The data in the text file employee.gz is as follows:

[Image: 3.png, https://s4.51cto.com/wyfs02/M00/8E/B8/wKioL1jKNITBcscmAABq9v_JWmA705.png]

The idea is as follows: the Java program calls the esProc script, which reads and computes the data and then returns the result to the Java program as a ResultSet.

First, to write and debug the program in the esProc integrated development environment, copy the Hadoop core package and configuration packages to "esProc installation directory\esproc\lib", for example: commons-configuration-1.6.jar, commons-lang-2.4.jar, hadoop-core-1.0.4.jar (Hadoop 1.0.4).

Because esProc supports dynamic expression parsing and evaluation, the Java program can filter the data in HDFS files as flexibly as it would with SQL. For example, to query the female employees born on or after January 1, 1981, the esProc program can take an input parameter "where" from outside as the condition, such as:

[Image: 1.png, https://s3.51cto.com/wyfs02/M00/8E/B8/wKioL1jKNKrSbU9iAABto1arIUk845.png]

where is a string whose value is: birthday>=date(1981,1,1) && gender=="F".

The esProc code is as follows:

[Image: 2.png, https://s2.51cto.com/wyfs02/M02/8E/BA/wKiom1jKNOGQEqt6AAAWn-w5bLA644.png]

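A1: Defines a cursor over the HDFS file object. The first row is the header, and the field delimiter is left at its default. The file is in gzip format; esProc also supports other compression methods. The charset is UTF-8 (presumably falling back to the JVM default when it is omitted).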

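A2: Filters the cursor by the condition. A macro is used here to parse the expression dynamically, and where is the incoming parameter. esProc first evaluates the expression inside ${ }, substitutes the result for ${ } as text, and then executes the resulting statement. In this example, the statement finally executed is: =A1.select(birthday>=date(1981,1,1) && gender=="F")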

A3: Returns a cursor.

To change the filter condition, only the where parameter needs to change, not the code. For example, suppose the condition becomes: query the female employees born on or after January 1, 1981, or the employees whose name+surname equals "Rebeccamoore". The where parameter value can then be written as: birthday>=date(1981,1,1) && gender=="F" || name+surname=="Rebeccamoore".
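The effect of the macro can be pictured as plain text substitution. The sketch below is only an illustration of that idea in Java, not esProc's actual implementation: the class MacroSketch and its expand method are hypothetical names, and it handles only the simple case where ${ } contains a parameter name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MacroSketch {

    // Matches ${name}, where name is a simple parameter name.
    private static final Pattern MACRO = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces every ${name} in the template with the text of the matching
    // parameter. Illustration only; esProc's own macro handling is more general.
    static String expand(String template, Map<String, String> params) {
        Matcher m = MACRO.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as regex replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String where = "birthday>=date(1981,1,1) && gender==\"F\"";
        System.out.println(expand("=A1.select(${where})", Map.of("where", where)));
        // prints: =A1.select(birthday>=date(1981,1,1) && gender=="F")
    }
}
```

Because the condition is spliced into the expression as text, it behaves like dynamic SQL: the caller controls the whole filter, so the parameter should come only from trusted code.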

The code on the Java side is as follows. The esProc program is saved as test.dfx, and the jar packages required for HDFS are placed on the Java classpath.

Establish an esProc JDBC connection:

Class.forName("com.esproc.jdbc.InternalDriver");

Connection con = DriverManager.getConnection("jdbc:esproc:local://");

Call the esProc program (as a stored procedure), where test is the file name of the dfx:

com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");

Set the parameter:

st.setObject(1, "birthday>=date(1981,1,1) && gender==\"F\" || name+surname==\"Rebeccamoore\""); // the parameter is the dynamic filter condition

Execute the esProc stored procedure:

st.execute();

Get the result set, which is the collection of eligible employees:

ResultSet set = st.getResultSet();
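The steps above can be put together as one self-contained sketch. It uses the driver class, JDBC URL, and call syntax given in this article, but makes a few assumptions of its own: it works through the standard java.sql.CallableStatement interface instead of casting to the esProc class, it prints each row as tab-separated values, and the class EsprocCallSketch and the helper bornOnOrAfter are hypothetical names. Running printMatches requires the esProc and Hadoop jars on the classpath and test.dfx in place.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.time.LocalDate;

class EsprocCallSketch {

    // Builds the filter text for the "where" parameter from Java values.
    // Hypothetical helper; the field names come from the article's example.
    static String bornOnOrAfter(LocalDate date, String gender) {
        return "birthday>=date(" + date.getYear() + "," + date.getMonthValue() + ","
                + date.getDayOfMonth() + ") && gender==\"" + gender + "\"";
    }

    // Calls test.dfx with the given filter and prints every returned row.
    static void printMatches(String where) throws ClassNotFoundException, SQLException {
        Class.forName("com.esproc.jdbc.InternalDriver");
        try (Connection con = DriverManager.getConnection("jdbc:esproc:local://");
             CallableStatement st = con.prepareCall("call test(?)")) {
            st.setObject(1, where); // the dynamic filter condition
            st.execute();
            try (ResultSet rs = st.getResultSet()) {
                if (rs == null) {
                    return; // the call produced no result set
                }
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        if (i > 1) {
                            row.append('\t');
                        }
                        row.append(rs.getObject(i));
                    }
                    System.out.println(row);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String where = bornOnOrAfter(LocalDate.of(1981, 1, 1), "F")
                + " || name+surname==\"Rebeccamoore\"";
        printMatches(where);
    }
}
```

The try-with-resources blocks close the result set, statement, and connection even when the call fails, which the step-by-step listing leaves to the caller.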


This article is from the "12691508" blog; reprinting is declined.
