esProc assists Java with conditional filtering of structured text
Filtering data in text files by condition in Java runs into the following difficulties:
1. Files are not databases and cannot be queried with SQL. When the filter condition changes, the code must be rewritten. To get SQL-like flexible filtering, you have to implement dynamic expression parsing and evaluation yourself, which is a very large programming effort.
2. When a file is too large, it cannot be loaded into memory in one pass. Reading it piece by piece involves complex programming such as file buffer management and handling rows that are split across buffers.
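As a rough illustration of these difficulties, the sketch below hand-codes one filter (female employees born on or after January 1, 1981) over a tab-separated employee file in plain Java. The class name, method names and column positions are assumptions made for illustration; the positions assume the header EID, NAME, SURNAME, GENDER, STATE, BIRTHDAY, HIREDATE, DEPT, SALARY. Rows are read lazily, but the condition is fixed at compile time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

class HardCodedFilter {
    // Zero-based column positions, assuming the header
    // EID NAME SURNAME GENDER STATE BIRTHDAY HIREDATE DEPT SALARY
    static final int GENDER = 3;
    static final int BIRTHDAY = 5;

    // The condition lives in code: changing it means editing and recompiling.
    static boolean matches(String line, LocalDate cutoff) {
        String[] f = line.split("\t", -1); // -1 keeps trailing empty fields
        if (f.length <= BIRTHDAY || f[BIRTHDAY].isEmpty()) {
            return false; // skip rows without a birthday
        }
        return f[GENDER].equals("F")
                && !LocalDate.parse(f[BIRTHDAY]).isBefore(cutoff);
    }

    // Reads rows lazily, skips the header row, and collects the matching rows.
    static List<String> filter(BufferedReader in) {
        LocalDate cutoff = LocalDate.of(1981, 1, 1);
        return in.lines()
                .skip(1)
                .filter(line -> matches(line, cutoff))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the tab-separated employee file
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            filter(in).forEach(System.out::println);
        }
    }
}
```

Supporting a second condition, such as matching on NAME + SURNAME, would mean another code change here, which is exactly the cost that a dynamically evaluated expression avoids.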
With esProc assisting the Java program, these problems can be solved without writing such code yourself. The following example shows how.
Employee data is saved in the file employee.txt. We need to read the employee information and find the female employees born on or after January 1, 1981.
The format of the example file employee.txt is as follows:
EID NAME SURNAME GENDER STATE BIRTHDAY HIREDATE DEPT SALARY
1 Rebecca Moore F California 1974-11-20 2005-03-11 R&D 7000
2 Ashley Wilson F New York 1980-07-19 2008-03-16 Finance 11000
3 Rachel Johnson F New Mexico 1970-12-17 Sales 9000
4 Emily Smith F Texas 1985-03-07 HR 7000
5 Ashley Smith F Texas 1975-05-13 2004-07-30 R&D 16000
6 Matthew Johnson M California 1984-07-07 Sales 11000
7 Alexis Smith F Illinois 1972-08-16 2002-08-16 Sales 9000
8 Megan Wilson F California 1979-04-19 1984-04-19 Marketing 11000
9 Victoria Davis F Texas 1983-12-07 2009-12-07 HR 3000
10 Ryan Johnson M Pennsylvania 1976-03-12 2006-03-12 R&D 13000
11 Jacob Moore M Texas 1974-12-16 2004-12-16 Sales 12000
12 Jessica Davis F New York Sales 7000
13 Daniel Davis M Florida 1982-05-14 2010-05-14 Finance 10000
...
The approach is as follows: a Java program calls the esProc script, which reads and computes the data and then returns the result to the Java program as a ResultSet. Because esProc supports dynamic expression parsing and evaluation, the Java program can filter data in text files as flexibly as if it were using SQL.
For example, to query the female employees born on or after January 1, 1981, the esProc program can receive an input parameter "where" from outside as the condition:
where is a string whose value is: BIRTHDAY>=date(1981,1,1) && GENDER=="F".
The esProc code is as follows:
A1: Defines a file object and imports the data. The first line is the title row, and the field separator is a tab by default. esProc's integrated development environment displays the imported data intuitively, in the panel on the right.
A2: Filters by the condition. Here a macro, ${where}, is used to parse the expression dynamically, where "where" is the input parameter. esProc first evaluates the expression inside ${…}, substitutes the resulting string for ${…}, and then interprets and executes the statement. In this example, what is finally executed is: =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").
A3: Returns the qualifying result set to the external program.
When the filter condition changes, you do not need to change the code. You only need to change the where parameter. For example, suppose the condition becomes: female employees born on or after January 1, 1981, or employees whose NAME + SURNAME is "RebeccaMoore". The where parameter value can then be written as: BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After execution, the result set in A2 is as follows:
The Java code that calls this program through esProc JDBC and obtains the result is as follows (the esProc program is saved as test.dfx):
// Establish an esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the esProc program like a stored procedure; test is the name of the dfx file
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
// Set the parameter
st.setObject(1, "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\""); // The parameter is the dynamic filter condition
// Execute the esProc stored procedure
st.execute();
// Obtain the result set: the qualifying employees
ResultSet set = st.getResultSet();
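Because only the parameter string changes from one query to the next, the condition can be assembled in Java just before it is passed to setObject. The sketch below shows one possible way to do that. The class WhereBuilder and its methods are hypothetical helpers, not part of esProc, and the operators (==, &&, ||) and the date(y,m,d) function are assumed to follow the esProc expression syntax used in this article.

```java
import java.time.LocalDate;

class WhereBuilder {
    // Builds a condition such as: BIRTHDAY>=date(1981,1,1) && GENDER=="F"
    static String bornOnOrAfter(LocalDate date, String gender) {
        return "BIRTHDAY>=date(" + date.getYear() + ","
                + date.getMonthValue() + ","
                + date.getDayOfMonth() + ") && GENDER==\"" + gender + "\"";
    }

    // Extends a condition with an alternative match on NAME+SURNAME.
    // The parentheses make the grouping explicit instead of relying on
    // operator precedence.
    static String orFullName(String condition, String fullName) {
        return "(" + condition + ") || NAME+SURNAME==\"" + fullName + "\"";
    }

    public static void main(String[] args) {
        String where = orFullName(
                bornOnOrAfter(LocalDate.of(1981, 1, 1), "F"), "RebeccaMoore");
        System.out.println(where);
    }
}
```

The resulting string would be passed as the first parameter, for example st.setObject(1, where). Since the parameter is evaluated as an expression, values that come from user input should be validated before they are concatenated into it.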
For simple scripts, you can also write the script directly in the Java program that calls esProc JDBC, without creating a script file (test.dfx):
st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
ResultSet set = st.executeQuery("=file(\"D:/employee.txt\").import@t().select(BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\")");
This Java code calls an esProc script directly: it retrieves data from the text file and filters it by the specified condition. The result is returned in the ResultSet object set.
The method above reads the whole file into memory, which is fine when the file is small. A large file, however, may not fit into memory, and even when it does, there is no need to occupy that much memory. In this case, you can process the file with a file cursor. The esProc program is adjusted as follows:
A1: Defines a cursor over the file object. The first line is the title row, and the field separator is a tab by default.
A2: Filters the cursor by the condition. Here a macro, ${where}, is used to parse the expression dynamically, where "where" is the input parameter. esProc first evaluates the expression inside ${…}, substitutes the resulting string for ${…}, and then interprets and executes the statement. In this example, what is finally executed is: =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").
A3: Returns the cursor.
Although esProc now returns a cursor to Java, the calling Java program does not need to be modified. When Java traverses the data through the ResultSet, esProc automatically fetches the corresponding content from the cursor.
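On the Java side, traversal of this kind is ordinary JDBC iteration. The following sketch processes one row at a time instead of collecting the whole result, which is what keeps memory use low when the source is a cursor. RowStreamer and forEachRow are hypothetical names, and the sketch assumes the returned ResultSet supports the standard getMetaData and getObject methods.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

class RowStreamer {
    // Walks the result set row by row, hands each row to the handler,
    // and returns the number of rows seen. Only the current row is held.
    static int forEachRow(ResultSet rs, Consumer<String[]> handler)
            throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        int rows = 0;
        while (rs.next()) {
            String[] row = new String[columns];
            for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
                Object value = rs.getObject(i);
                row[i - 1] = (value == null) ? null : value.toString();
            }
            handler.accept(row);
            rows++;
        }
        return rows;
    }
}
```

A call such as RowStreamer.forEachRow(set, row -> System.out.println(String.join("\t", row))) would print each qualifying employee as it is fetched, provided no field is null.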
If you need to write the filtered data to another file instead of returning it to the main program, you only need to change the expression in cell A3 to: =file("D:/employee_group.txt").export@t(A2). esProc then writes the cursor data to the file.