Helping Java read text files

Source: Internet
Author: User

Java provides only basic file-handling functions. They are enough to read small text files in a simple, unstructured way. With large files that need structuring, with varied formats or special requirements, or with files that do not fit in memory, the code becomes very complicated, and readability and reusability are hard to guarantee.

You can make up for this deficiency with the Set calculator (esProc), which is free to use. The Set calculator encapsulates a rich set of functions for reading, writing, and computing on structured files, and provides a JDBC interface. A Java application can execute a Set calculator script file as if it were a database stored procedure, passing in parameters and receiving the result through JDBC. For more information, see "The Set calculator as a Java computing class library: application structure".

The following describes common cases of reading text from Java and the corresponding Set calculator solutions.

Read specified columns
Read three columns from sOrder.txt by column name: OrderID, Client, and Amount. The source data is as follows:

Code of the Set calculator:

  A
1 =file("D:\sOrder.txt").import@t(OrderID,Client,Amount)

Result:

1. @t means that the first row is read as column names. When the file has no column names, columns can be referenced by ordinal number. For example, to read columns 1, 2, and 4, use this code: file("D:\sOrder.txt").import(#1,#2,#4). The result is as follows:

2. To output a calculated column, for example a new order ID assembled from the year and OrderID, together with Client and Amount, use the following code:

  A
1 =file("D:\sOrder.txt").import@t()
2 =A1.new(string(year(OrderDate))+"_"+string(OrderID):newOrder,Client,Amount)

The import function reads all fields by default. The new function creates a new sequence table. The result is as follows:

3. The default delimiter is the tab character, but other characters can be used. For example, to read a CSV file that uses a comma as the separator, use this code: file("D:\sOrder.txt").import@t(;",").

4. To output only some of the rows, specify them by row number. For example, A1.to(2,100) outputs rows 2 through 100, and A1.to(3,) outputs everything from row 3 onward.

5. Sometimes the data must be read column by column, for example to concatenate OrderID, Client, and Amount vertically into a single output column. After reading the data, this can be done with the following code: create(all).record(A1.(OrderID)|A1.(Client)|A1.(Amount)).
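For comparison, here is a minimal plain-Java sketch of the basic case above: taking the first line as column names and keeping only the named columns. The class name ColumnSelect, the method select, and the sample rows are hypothetical and exist only for illustration; the sketch assumes every data row has at least as many fields as the header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class ColumnSelect {
    /** Returns the requested columns, in the requested order, for every data row. */
    static List<String[]> select(List<String> lines, String delimiter, String... wanted) {
        List<String[]> out = new ArrayList<>();
        if (lines.isEmpty()) return out;
        String regex = Pattern.quote(delimiter);
        // The first line holds the column names (the role of @t in the script).
        List<String> header = Arrays.asList(lines.get(0).split(regex, -1));
        int[] idx = new int[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            idx[i] = header.indexOf(wanted[i]);
            if (idx[i] < 0) throw new IllegalArgumentException("No such column: " + wanted[i]);
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(regex, -1);
            String[] row = new String[idx.length];
            for (int i = 0; i < idx.length; i++) row[i] = fields[idx[i]];
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the real sOrder.txt is not reproduced here.
        List<String> lines = Arrays.asList(
                "OrderID\tClient\tSellerId\tAmount",
                "1\tABC\t17\t392.0");
        for (String[] row : select(lines, "\t", "OrderID", "Client", "Amount")) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Passing "," as the delimiter covers the comma-separated case in item 3.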

Read large files

For large files that exceed the available memory, you can read the file with a Set calculator cursor. Java then accesses the result as a stream through JDBC.

Code of the Set calculator:

  A
1 =file("D:\sOrder.txt").cursor@t(OrderID,Client,Amount)

1. To speed up reading, you can read with multiple threads in parallel. Simply add the @m option: =file("D:\sOrder.txt").cursor@tm(OrderID,Client,Amount). Because the threads read in parallel, this usage does not guarantee the order in which rows are read.

2. Sometimes you need to segment the file manually for parallel computing, which means reading only one segment of the file. The code is: file("D:\sOrder.txt").import@zt(;,2:24)

@z means that the file is divided into 24 segments of roughly equal size in bytes, and only the 2nd segment is read. The Set calculator automatically discards the partial row at the head of the segment and completes the row at its tail, so that only whole rows are retrieved.

If a single segment still does not fit in memory, change the import function to cursor so that the output is a cursor.
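The idea of discarding the head and completing the tail can be sketched in plain Java as follows. This is only an illustration of the technique, not the Set calculator's implementation: the class SegmentReader and its method are hypothetical, the data is held in a byte array for brevity (a real program would seek within the file), rows are assumed to end with "\n", and the header row is not treated specially.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SegmentReader {
    /** Returns the whole rows that start inside segment 'part' (1-based) of 'parts'. */
    static List<String> readSegment(byte[] data, int part, int parts) {
        long len = data.length;
        int start = (int) (len * (part - 1) / parts);
        int end = (int) (len * part / parts);
        int pos = start;
        // Discard the head: unless the segment begins exactly at a row boundary,
        // the bytes up to the next newline belong to a row owned by the previous segment.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step over the newline
        }
        List<String> rows = new ArrayList<>();
        // Complete the tail: a row that starts before 'end' is read in full,
        // even when it runs past the end of the segment.
        while (pos < end) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') eol++;
            rows.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] data = "a1\nbb2\nccc3\ndddd4\n".getBytes(StandardCharsets.UTF_8);
        for (int part = 1; part <= 3; part++) {
            System.out.println(part + ": " + readSegment(data, part, 3));
        }
    }
}
```

Because each row is assigned to the segment that contains its first byte, reading all segments yields every row exactly once, which is what makes the segments safe to process in parallel.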

Read files by column width

The data.txt file has no separator, as shown below:

You need to read the file into a four-column table according to the specified widths and output it to Java. The id column is characters 1-3, the flag column is characters 10-11, the d1 column is characters 14-24, and the d2 column is characters 25-33. For example, the four columns in row 1 are 001, DT, 100000000000, and 3210 XXXX.

Code of the Set calculator:

  A
1 =file("D:\data.txt").import@i()
2 =A1.new(mid(~,1,3):id,mid(~,10,2):flag,mid(~,14,11):d1,mid(~,25,9):d2)

A1: @i means that a sequence (set) is returned when the file has only one column.

A2: Creates a new sequence table based on A1. The mid function extracts a substring, and ~ stands for each row of data.

Result:
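For comparison, the same cut can be sketched in plain Java with substring. The class FixedWidth and the helper cut are hypothetical names; cut mimics mid(s, start, length) with a 1-based start, and the sample line in main is made up to match the stated column positions.

```java
final class FixedWidth {
    /** Cuts one line into id, flag, d1, d2 using the 1-based positions from the text. */
    static String[] parse(String line) {
        return new String[] {
            cut(line, 1, 3),    // id:   characters 1-3
            cut(line, 10, 2),   // flag: characters 10-11
            cut(line, 14, 11),  // d1:   characters 14-24
            cut(line, 25, 9)    // d2:   characters 25-33
        };
    }

    /** Substring with a 1-based start and a length, truncated at the end of the line. */
    static String cut(String s, int start, int length) {
        int from = Math.min(start - 1, s.length());
        int to = Math.min(from + length, s.length());
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        // Hypothetical 33-character line; the real data.txt is not reproduced here.
        String line = "001ABCDEFDTxx123456789013210 XXXX";
        System.out.println(String.join("|", parse(line)));
    }
}
```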

The text contains special characters

The file data.csv contains quotation marks, some of which interfere with normal use of the data. You need to remove the quotation marks and output the data to Java. The source data is as follows:

Code of the Set calculator:

  A
1 =file("d:\\data.csv").import(;",")
2 =A1.new(replace(_1,"\"",""):_1,replace(_2,"\"",""):_2,replace(_3,"\"",""):_3,replace(_4,"\"",""):_4)

Result:
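For comparison, a minimal plain-Java sketch of the same clean-up is shown below. The class QuoteStripper is a hypothetical name and the sample line is made up. Like the script above, it splits on every comma first, so it assumes that no field contains a comma inside quotation marks.

```java
final class QuoteStripper {
    /** Splits a comma-separated line and removes every double quote from each field. */
    static String[] stripQuotes(String line) {
        String[] fields = line.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("\"", "");
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical input line with a stray quote in the second field.
        System.out.println(String.join("|", stripQuotes("\"1\",\"AB\"C\",3"))); // prints 1|ABC|3
    }
}
```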

Text includes mathematical formulas

Each formula in the text needs to be parsed into an expression, evaluated, and output together with its result. The source data is as follows:

Code of the Set calculator:

  A
1 =file("D:\equations.txt").import@i()
2 =A1.new(~:equations,eval(string(~)):result)

The function eval dynamically parses a string into an expression and executes it.

Result:
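Java has no built-in counterpart to eval, so a plain-Java program has to parse the formula itself. The sketch below is a small recursive-descent evaluator under a strong assumption: formulas contain only decimal numbers, the operators + - * /, unary minus, and parentheses. The class name Calc is hypothetical, and the sketch does not attempt to cover everything the Set calculator's eval accepts.

```java
final class Calc {
    private final String src;
    private int pos;

    private Calc(String src) { this.src = src; }

    /** Parses and evaluates an arithmetic expression such as "(1+2)*3". */
    static double eval(String expression) {
        Calc c = new Calc(expression);
        double value = c.expr();
        c.skipSpaces();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + c.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        while (true) {
            if (eat('+')) v += term();
            else if (eat('-')) v -= term();
            else return v;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double v = factor();
        while (true) {
            if (eat('*')) v *= factor();
            else if (eat('/')) v /= factor();
            else return v;
        }
    }

    // factor := '-' factor | '(' expr ')' | number
    private double factor() {
        if (eat('-')) return -factor();
        if (eat('(')) {
            double v = expr();
            if (!eat(')')) throw new IllegalArgumentException("Missing ')' at position " + pos);
            return v;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && (isAsciiDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("Number expected at position " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    private static boolean isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

    /** Consumes the given character if it is next (ignoring spaces). */
    private boolean eat(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(Calc.eval("1+2*3")); // prints 7.0
    }
}
```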

Multi-row record
Every three lines in the following file make up one record. For example, the first record is JFS 3 468.0 39. You need to output the file as a two-dimensional table.

Code of the Set calculator:

  A
1 =file("D:\data.txt").import@si()
2 =A1.group((#-1)\3)
3 =A2.new(~(1):OrderID,(line=~(2).array("\t"))(1):Client,line(2):SellerId,line(3):Amount,~(3):OrderDate)

First, read the file as a sequence of strings; @s means that lines are not split into fields. Then group the lines three at a time: # is the row number and \ is integer division, so rows 1-3 fall into group 0, rows 4-6 into group 1, and so on. Finally, create a new sequence table from the groups: ~(1) is the first member of the current group, and the array function splits a string into a sequence. The result is as follows:
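The grouping step can be sketched in plain Java as follows. The class ThreeLineRecords is a hypothetical name and the sample values are made up. The sketch assumes, as the script does, that the second line of each record holds three tab-separated fields, and it ignores an incomplete trailing group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ThreeLineRecords {
    /** Turns every three consecutive lines into one record of five fields. */
    static List<String[]> toRecords(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        // Lines i, i+1, i+2 (0-based) share the group number i / 3,
        // which is what (#-1)\3 computes for 1-based row numbers.
        for (int i = 0; i + 2 < lines.size(); i += 3) {
            String[] middle = lines.get(i + 1).split("\t", -1);
            records.add(new String[] {
                lines.get(i),      // OrderID
                middle[0],         // Client
                middle[1],         // SellerId
                middle[2],         // Amount
                lines.get(i + 2)   // OrderDate
            });
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real data.txt is not reproduced here.
        List<String> lines = Arrays.asList("26", "JFS\t3\t468.0", "2009-08-13");
        for (String[] r : toRecords(lines)) {
            System.out.println(String.join(",", r));
        }
    }
}
```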

If the file is too large to fit in memory, open it with a cursor and process it in batches. First, create sub.dfx, which reads one batch of data and returns it whenever there is an external request. The code is as follows:

  A                                 B
1 =file("D:\data.txt").cursor@si()
2 for A1,3000                       =A2.group((#-1)\3)
3                                   =B2.new(~(1):OrderID,(line=~(2).array("\t"))(1):Client,line(2):SellerId,line(3):Amount,~(3):OrderDate)
4                                   result B3

A2 loops over the cursor in A1, reading 3000 rows each time and processing them with the algorithm described above.

B4 returns B3 to the main script. The main script (the dfx file called by Java) is as follows:

  A
1 =pcursor("sub.dfx")

The pcursor function requests data from sub.dfx and converts it into cursor output.
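The batch-by-batch pattern itself can be sketched in plain Java. The class BatchReader and the method nextBatch are hypothetical names for illustration. Note that the batch size must be a multiple of three here (3000 is), otherwise a three-line record would be split across two batches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class BatchReader {
    /** Reads up to batchSize lines; an empty list signals the end of the input. */
    static List<String> nextBatch(BufferedReader in, int batchSize) {
        List<String> batch = new ArrayList<>();
        try {
            String line;
            // The size check comes first so that no line is consumed once the batch is full.
            while (batch.size() < batchSize && (line = in.readLine()) != null) {
                batch.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batch;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader on the large file.
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\nd\ne\nf\ng\n"));
        List<String> batch;
        while (!(batch = nextBatch(in, 3)).isEmpty()) {
            System.out.println(batch); // each batch would be grouped into records here
        }
    }
}
```

Only one batch is held in memory at a time, which is the property the cursor provides in the script.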

Variable row record

Each record in data.txt spans a variable number of lines, but every field has a fixed marker: "Object Type:", "left:", "top:", and "Line Color:", with the value running to the end of the line. For example, the first record is Symbol1, 14, 11, RGB(1 0 0). The file needs to be read into a structured two-dimensional table.

Code of the Set calculator:

  A
1 =file("data.txt").read()
2 =A1.array("Object Type:").to(2,)
3 =A2.new(~.array("\r\n")(1):OType,mid(~,s=pos(~,"left:")+len("left:"),pos(~,"\r\n",s)-s):L,mid(~,s=pos(~,"top:")+len("top:"),pos(~,"\r\n",s)-s):T,mid(~,s=pos(~,"Line Color:")+len("Line Color:"),if(r=pos(~,"\r\n",s),r,len(~))-s+1):LColor)

 

The read function reads the whole file as one large string. The string is split on the record marker, and the first member (the empty text before the first marker) is dropped. Finally, a new sequence table is created, using the string functions array, pos, len, and mid to locate the required fields. Note that the last line may not end with a line break, hence the if test. Final result:

String functions are used here to locate the fields. Regular expressions would work as well.

If the file is too large for memory, you can use the pcursor function to read the file in batches.
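As an illustration of the regular-expression alternative, here is a plain-Java sketch. The class TaggedRecords is a hypothetical name, and the exact spelling of the markers (in particular the lower-case "left:" and "top:") is an assumption, so the match is made case-insensitive. Each value is taken up to the end of its line, which also covers a last line without a line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaggedRecords {
    /** Splits the text into records at "Object Type:" and extracts four fields from each. */
    static List<String[]> parse(String text) {
        List<String[]> out = new ArrayList<>();
        String[] blocks = text.split("Object Type:");
        // blocks[0] is whatever precedes the first marker, so start at 1.
        for (int i = 1; i < blocks.length; i++) {
            String block = blocks[i];
            String type = block.split("\\R", 2)[0].trim(); // rest of the marker line
            out.add(new String[] {
                type, field(block, "left:"), field(block, "top:"), field(block, "Line Color:")
            });
        }
        return out;
    }

    /** Returns the text between the marker and the end of its line, or null if absent. */
    static String field(String block, String marker) {
        Pattern p = Pattern.compile(Pattern.quote(marker) + "[ \\t]*([^\\r\\n]*)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(block);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like the record described in the text.
        String text = "Object Type: Symbol1\r\nleft: 14\r\ntop: 11\r\nLine Color: RGB(1 0 0)";
        for (String[] r : parse(text)) {
            System.out.println(String.join(", ", r));
        }
    }
}
```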

Record groups by tag

The data.txt file stores records in groups. Each group starts with a line marked with list that carries the group name (such as ARO, BDR, and BSF). The group name must be combined with the fields of each record in the group for output. The source data is as follows:

Code of the Set calculator:

  A
1 =file("mutiline2.txt").import@si()
2 =A1.group@i(like(~,"list:*"))
3 =A2.conj(~.to(2,).new(mid(A2.~(1),6):Client,(t=~.array("\t"))(1):c1,t(2):c2,t(3):c3,t(4):c4))

First, the file is read as a sequence of strings, and then the lines are grouped by the marker. @i starts a new group whenever the condition is true, and * is a wildcard. A2 is as follows:

Then, retrieve the fields by serial number and merge the records of each group. The result is as follows:
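The same grouping can be sketched in plain Java with a single pass that remembers the current group name. The class TagGroups is a hypothetical name, the sample rows are made up, and the marker is assumed to be the lower-case prefix "list:" at the start of a line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TagGroups {
    private static final String MARKER = "list:"; // assumed spelling of the group marker

    /** Prefixes every data row with the name of the group it belongs to. */
    static List<String[]> flatten(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        String group = null;
        for (String line : lines) {
            if (line.startsWith(MARKER)) {
                // A marker line opens a new group (what @i does in the script).
                group = line.substring(MARKER.length()).trim();
            } else if (group != null && !line.isEmpty()) {
                String[] fields = line.split("\t", -1);
                String[] row = new String[fields.length + 1];
                row[0] = group;
                System.arraycopy(fields, 0, row, 1, fields.length);
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real source file is not reproduced here.
        List<String> lines = Arrays.asList(
                "list:ARO", "1\t2\t3\t4", "5\t6\t7\t8",
                "list:BDR", "9\t10\t11\t12");
        for (String[] row : flatten(lines)) {
            System.out.println(String.join(",", row));
        }
    }
}
```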
