Java itself provides only the most basic functions for reading data, such as splitting on a specified delimiter. Other common functions have to be implemented from scratch: reading specified columns by column name, specifying the column order, specifying data types, handling files without delimiters, and so on. None of this is difficult to implement in Java, but the code is cumbersome and error-prone.
With the collector (esProc) assisting Java programming, you do not need to write code for these problems yourself. Let's look at some concrete examples.
The text file data.txt is tab-separated and has 30 columns; its first row holds business-meaningful column names. We need to read only these columns by name: ID, X1shift, X2shift, Radio, and then calculate a new column with the business formula "((X1shift+X2shift)/2)*Radio". The first few lines of the file are listed below:
In plain Java, we have to split all 30 columns and then refer to specific columns by subscript in the calculation. If there are many formulas and the calculations are complex, mistakes are very likely. To reduce such errors, the only option is to store each record in an object, give each field a business name, and then write the formulas in terms of those names.
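To make the effort concrete, here is a minimal sketch of the plain-Java approach, assuming the column names ID, X1shift, X2shift, and Radio from the example. The class name ColumnByName, its helper methods, and the sample rows are hypothetical and only stand in for data.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnByName {
    // Returns the 0-based position of a column name in the header row.
    static int indexOf(String[] header, String name) {
        int i = Arrays.asList(header).indexOf(name);
        if (i < 0) throw new IllegalArgumentException("No such column: " + name);
        return i;
    }

    // Computes ((X1shift + X2shift) / 2) * Radio for every data row.
    static List<Double> compute(List<String> lines) {
        String[] header = lines.get(0).split("\t", -1);
        int x1 = indexOf(header, "X1shift");
        int x2 = indexOf(header, "X2shift");
        int radio = indexOf(header, "Radio");
        List<Double> values = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split("\t", -1);
            double v = (Double.parseDouble(f[x1]) + Double.parseDouble(f[x2])) / 2
                    * Double.parseDouble(f[radio]);
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample with one unused column, standing in for data.txt.
        List<String> lines = Arrays.asList(
                "ID\tOther\tX1shift\tX2shift\tRadio",
                "A001\tfoo\t2\t4\t0.5");
        System.out.println(compute(lines));
    }
}
```

Even this small version needs explicit header lookup, index bookkeeping, and type parsing before the formula itself appears.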
The collector helps Java avoid these problems. The code is as follows:
A1: The function import reads the file. It does not read all 30 columns into memory; it reads only the specified columns, selected by column name. The option @t indicates that the first row is read as column names. The result of this step is as follows:
A2: Calculates directly by business name. The results are as follows:
In practice, the calculation results sometimes need to be exported to a file. The following code does this: =file("E:\\result.txt").export@t(A2.new(ID,value)). It writes the two columns ID and value to the file result.txt, with the following contents:
If you need to pass the results back to Java, simply write this code in the collector: result A2.new(ID,value). It returns the ID and value columns to Java through the JDBC interface as a ResultSet. The Java code can then obtain the result by calling the collector script over JDBC, as shown in the code below.
// Establish the esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call esProc, where test is the script file name
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test()");
st.execute(); // execute the esProc stored procedure
ResultSet set = st.getResultSet(); // get the result set
When reading data, it is sometimes necessary to specify the order of the columns so that the data can be handled more intuitively. For example, for the same file data.txt, this time we read the data in the new order X1shift, X2shift, Radio, ID. The collector can specify the order directly with the following code: =file("E:\\data.txt").import@t(xshift,yshift,ratio,id).
The calculation results are as follows:
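For comparison, reordering columns by name in plain Java might be sketched as below. The class ColumnOrder, its select method, and the sample row are hypothetical; the column names are taken from the collector code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnOrder {
    // Reduces every line (header included) to the requested columns, in the requested order.
    static List<String[]> select(List<String> lines, String... names) {
        List<String> header = Arrays.asList(lines.get(0).split("\t", -1));
        int[] pos = new int[names.length];
        for (int i = 0; i < names.length; i++) {
            pos[i] = header.indexOf(names[i]);
            if (pos[i] < 0) throw new IllegalArgumentException("No such column: " + names[i]);
        }
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\t", -1);
            String[] row = new String[pos.length];
            for (int i = 0; i < pos.length; i++) row[i] = f[pos[i]];
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for data.txt.
        List<String> lines = Arrays.asList(
                "id\txshift\tyshift\tratio",
                "A001\t1.5\t2.5\t0.8");
        for (String[] row : select(lines, "xshift", "yshift", "ratio", "id")) {
            System.out.println(String.join("\t", row));
        }
    }
}
```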
In the code above, the collector automatically chooses an appropriate type for the data; Xshift and Yshift, for example, are set to float. But sometimes we need to specify a data type ourselves, for example for an ID that looks like an integer but is actually a string. To extract the first 4 characters of the ID, the collector can use the following code:
A1: Forces a type conversion so that the ID column is read as a string. The result is as follows:
Note: By convention, the collector's IDE left-aligns strings and right-aligns numbers, as shown above.
A2: Takes the first four characters. The result is as follows:
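For comparison, a small Java sketch shows why the ID should stay a string when its prefix matters. The class IdPrefix and the sample ID are hypothetical, and the leading zeros are an assumption made purely for illustration.

```java
class IdPrefix {
    // Returns the first four characters of an ID kept as text (the whole ID if it is shorter).
    static String prefix(String id) {
        return id.substring(0, Math.min(4, id.length()));
    }

    public static void main(String[] args) {
        String id = "0012345"; // hypothetical ID value with leading zeros
        // Kept as text, the prefix preserves the leading zeros.
        System.out.println(prefix(id));
        // After a round trip through int, the leading zeros are gone and the prefix changes.
        System.out.println(prefix(String.valueOf(Integer.parseInt(id))));
    }
}
```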
Sometimes the data to be read has no delimiters. For example, Data2.txt has 20 columns; part of the data is as follows:
As you can see, Data2.txt has no column separators, and it contains useless blank lines. The collector can read the data correctly with the following code:
A1: Reads the data as a single-column table whose column has the default name "_1". The option @s indicates that lines are read as they are, without being split into fields. The results are as follows:
A2: =A1.select(trim(_1)!=""), which keeps only the non-empty lines. The function select can query by field name or by ordinal. The result is as follows:
A3: =A2.new(mid(_1,1,1), mid(_1,2,1), mid(_1,3,1), mid(_1,4,1), mid(_1,5,1), mid(_1,6,1), mid(_1,7,1), mid(_1,8,1), mid(_1,9,1), mid(_1,10,1), mid(_1,11,1), mid(_1,12,1), mid(_1,13,1), mid(_1,14,1), mid(_1,15,1), mid(_1,16,1), mid(_1,17,1), mid(_1,18,1), mid(_1,19,1), mid(_1,20,1))
This long code splits each row of data into 20 fields. The function mid takes three parameters: the name of the field being split, the starting position, and the length to extract. The results of the split are as follows:
A3 is the result of the calculations we need.
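The same steps (skip blank lines, then cut each line into one-character fields) can be sketched in plain Java as follows. The class FixedWidth and the sample lines are hypothetical, and returning an empty field for a line shorter than the column count is a choice made for this sketch, not documented collector behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedWidth {
    // Splits each non-blank line into `columns` fields of one character each.
    static List<String[]> split(List<String> lines, int columns) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip useless blank lines
            String[] row = new String[columns];
            for (int i = 0; i < columns; i++) {
                // Counterpart of mid(_1, i+1, 1); short lines yield empty fields here.
                row[i] = i < line.length() ? line.substring(i, i + 1) : "";
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical three-column sample standing in for Data2.txt.
        List<String> lines = Arrays.asList("abc", "   ", "xyz");
        for (String[] row : split(lines, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```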
The code in A3 is too long to check and maintain easily. It can be simplified with the collector's dynamic code, as follows:
A4: =20.loops(~~+"mid(_1,"+string(~)+",1),")
A5: =exp=left(A4,len(A4)-1)
A6: =eval("A2.new("+A5+")")
In A4, the function loops performs a loop calculation to generate a regular string, i.e. "mid(_1,1,1),mid(_1,2,1),mid(_1,3,1),mid(_1,4,1),mid(_1,5,1),mid(_1,6,1),mid(_1,7,1),mid(_1,8,1),mid(_1,9,1),mid(_1,10,1),mid(_1,11,1),mid(_1,12,1),mid(_1,13,1),mid(_1,14,1),mid(_1,15,1),mid(_1,16,1),mid(_1,17,1),mid(_1,18,1),mid(_1,19,1),mid(_1,20,1),"
The string in A4 has a comma at the end, and the code in A5 removes that comma.
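The string construction in A4 and A5 can be mirrored in Java as a rough sketch. The class ExprBuilder and its build method are hypothetical, and the sketch only produces the expression text; evaluating it remains the collector's job.

```java
class ExprBuilder {
    // Builds "mid(_1,1,1),mid(_1,2,1),...,mid(_1,n,1)" without a trailing comma.
    static String build(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            // Same fragment that A4 appends on each loop step.
            sb.append("mid(_1,").append(i).append(",1),");
        }
        // Same effect as left(A4, len(A4)-1): drop the final comma.
        if (sb.length() > 0) sb.setLength(sb.length() - 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(20));
    }
}
```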
A6: Executes the dynamic script. The function eval dynamically parses a string into an expression; for example, eval("2+3") is equivalent to the expression 2+3, whose value is 5. So the expression executed in A6 is exactly the same as the one in A3, and the result is naturally the same as well:
The collector assists Java in reading structured text data