Accessing HDFS from Java through the API provided by Hadoop is not difficult, but computing on the files stored there is more troublesome. Calculations such as grouping, filtering and sorting are complex to implement in Java. esProc helps Java solve these computational problems and also encapsulates HDFS access. With esProc, Java gains stronger computing power over HDFS files, and calculations on structured and semi-structured data can be done easily. Let's look at a concrete example.
The employee data is saved in HDFS in the text file employee.gz. We want to read the employee information and find the female employees born on or after January 1, 1981. The text file is compressed in HDFS and cannot be loaded into memory all at once.
The data in the text file employee.gz is as follows:
[Figure 3.png: sample data from employee.gz (image at https://s4.51cto.com/wyfs02/M00/8E/B8/wKioL1jKNITBcscmAABq9v_JWmA705.png)]
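For contrast, here is a minimal sketch of what the same filter looks like when written by hand in plain Java. It is not part of the esProc solution. It assumes a tab-separated layout with a header row containing columns named birthday and gender, and birthdays written as yyyy-MM-dd; none of these details are confirmed by the article. The class and method names are hypothetical. The input is any InputStream; with HDFS it would typically be the stream obtained from Hadoop's FileSystem.open.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class GzipEmployeeFilter {

    /**
     * Streams a gzip-compressed, tab-separated text line by line and returns
     * the rows of female employees born on or after the cutoff date.
     * Assumed layout (not confirmed by the article): header row with
     * "birthday" and "gender" columns, birthdays in ISO yyyy-MM-dd form.
     */
    public static List<String[]> femalesBornOnOrAfter(InputStream gz, LocalDate cutoff)
            throws IOException {
        List<String[]> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gz), StandardCharsets.UTF_8))) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return matches; // empty file
            }
            List<String> header = Arrays.asList(headerLine.split("\t", -1));
            int birthdayCol = header.indexOf("birthday");
            int genderCol = header.indexOf("gender");
            if (birthdayCol < 0 || genderCol < 0) {
                throw new IOException("missing birthday or gender column");
            }
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", -1);
                if (fields.length <= Math.max(birthdayCol, genderCol)) {
                    continue; // skip rows that are too short
                }
                LocalDate birthday = LocalDate.parse(fields[birthdayCol]);
                // "on or after the cutoff" is the same as "not before the cutoff"
                if (!birthday.isBefore(cutoff) && "F".equals(fields[genderCol])) {
                    matches.add(fields);
                }
            }
        }
        return matches;
    }
}
```

Only one line is held in memory at a time, which is what makes a file of this size workable, but the condition is hard-coded: changing it means changing and recompiling the Java code.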
The idea of the implementation is: the Java program calls the esProc script, which reads and computes the data, and then returns the result to the Java program as a ResultSet.
First, write and debug the program in the esProc integrated development environment. As preparation, copy the Hadoop core package and configuration packages to "esProc installation directory\esproc\lib", for example: commons-configuration-1.6.jar, commons-lang-2.4.jar, hadoop-core-1.0.4.jar (Hadoop 1.0.4).
Because esProc supports dynamic expression parsing and evaluation, the Java program can filter the data in HDFS files as flexibly as it would with SQL. For example, we need to query the female employees born on or after January 1, 1981. The esProc program can receive an input parameter "where" from outside as the condition, such as:
[Figure 1.png: definition of the esProc input parameter where (image at https://s3.51cto.com/wyfs02/M00/8E/B8/wKioL1jKNKrSbU9iAABto1arIUk845.png)]
where is a string whose value is: birthday>=date(1981,1,1) && gender=="F".
The esProc code is as follows:
[Figure 2.png: the esProc script, cells A1 to A3 (image at https://s2.51cto.com/wyfs02/M02/8E/BA/wKiom1jKNOGQEqt6AAAWn-w5bLA644.png)]
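A1: Defines a cursor over the HDFS file object. The first row is the header and the default field delimiter is used. The file is in gzip format; esProc also supports other compression methods. utf-8 is the character set, given as a JVM charset name.
A2: Filters the cursor by the condition. Here a macro is used to parse the expression dynamically, where where is the incoming parameter. esProc first evaluates the expression inside ${ }, then replaces ${ } with the result and executes the resulting statement. In this example the statement finally executed is: =A1.select(birthday>=date(1981,1,1) && gender=="F")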
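The substitution step described for A2 can be illustrated with a short Java sketch. This is only an illustration of the idea of macro replacement, not esProc's actual implementation: it handles only plain parameter names inside ${ }, whereas esProc evaluates an expression there. The class name MacroSketch and the method expand are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MacroSketch {

    // Matches ${name}, where name is a simple identifier
    private static final Pattern MACRO = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} in the template with the text of the parameter called name. */
    public static String expand(String template, Map<String, String> params) {
        Matcher m = MACRO.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unknown parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling expand with the template =A1.select(${where}) and the where value given earlier returns =A1.select(birthday>=date(1981,1,1) && gender=="F"), the same statement as in the A2 description.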
A3: Returns a cursor.
To change the filter condition, there is no need to change the code; just change the where parameter. For example, suppose the condition becomes: query the female employees born on or after January 1, 1981, or the employees whose name+surname equals "Rebeccamoore". The value of the where parameter can then be written as: birthday>=date(1981,1,1) && gender=="F" || name+surname=="Rebeccamoore".
The Java code that calls the esProc program follows. The esProc program is saved as test.dfx, and the jar packages required for HDFS must be placed on the Java classpath.
// Establish an esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the esProc program (as a stored procedure); test is the file name of the dfx
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
// Set the parameter
st.setObject(1, "birthday>=date(1981,1,1) && gender==\"F\" || name+surname==\"Rebeccamoore\""); // the parameter is the dynamic filter condition
// Execute the esProc stored procedure
st.execute();
// Get the result set: the collection of eligible employees
ResultSet set = st.getResultSet();
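Put together as a complete class, the steps above might look like the following sketch. The driver class name, the JDBC URL and the call to test.dfx come from the article. Everything else is an assumption: the class and method names are hypothetical, the statement is used through the standard java.sql.CallableStatement interface instead of the esProc-specific cast, and the esProc result set is assumed to support getMetaData. Running it requires the esProc and Hadoop jars on the classpath.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class EsprocHdfsQuery {

    /** Builds the filter text passed to test.dfx as its "where" parameter (hypothetical helper). */
    public static String femaleBornOnOrAfter(int year, int month, int day) {
        return "birthday>=date(" + year + "," + month + "," + day + ") && gender==\"F\"";
    }

    /** Calls test.dfx with the given condition, prints each returned row, and returns the row count. */
    public static int printMatches(String where) throws ClassNotFoundException, SQLException {
        // Driver class and URL as given in the article
        Class.forName("com.esproc.jdbc.InternalDriver");
        int rows = 0;
        try (Connection con = DriverManager.getConnection("jdbc:esproc:local://");
             CallableStatement st = con.prepareCall("call test(?)")) {
            st.setObject(1, where); // the dynamic filter condition
            st.execute();
            try (ResultSet rs = st.getResultSet()) {
                if (rs == null) {
                    return 0; // the script returned no result set
                }
                // Assumption: the esProc result set exposes column metadata
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder line = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        if (i > 1) {
                            line.append('\t');
                        }
                        line.append(rs.getObject(i));
                    }
                    System.out.println(line);
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

A caller would run EsprocHdfsQuery.printMatches(EsprocHdfsQuery.femaleBornOnOrAfter(1981, 1, 1)). The try-with-resources blocks close the result set, statement and connection even if the query fails, which the fragment above leaves to the reader.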
This article is from the "12691508" blog; reprinting is declined.
esProc assists Java in processing diverse data sources: HDFS