We often need to read and process data in text files.Streamreader (. NET)/FileSystemObject (VB 6.0) read data row by row.
If we can read files and process data like a database table , We will find that the above method has some disadvantages. .
Some disadvantages are: :
1. Keep the connection environment. Lock the file until processing is complete.
2. each row needs to be separated to obtain data in some columns. it is difficult to process the data of some rows with commas. example: "Yes, comma is here", 2, "hello, another comma", 6
3. There is no data filtering option before data processing starts.
4. to count the number of records of a file or count the number of records containing a special type, you need to read the entire file.
We can list more disadvantages. Compared with row-by-row reading, database-based table queries are more valuable.
Can we read text files like reading a database table?
If the answer is no, it will not be discussed in this article. Article Yes, we can read files like reading database tables and easily overcome the disadvantages mentioned above.
It is relatively simple to connect to a database and query data from the table. In this case, we treat a text file as a table and the directory where the file is located as a database.
To read data, follow these steps:
1. Open the database connection
Note: The connection string here is very important. It varies with the file type to be read. We will discuss it later in this article.
2. Use the basic query language to obtain the result set.
3. traverse the result set to read fields.
C # code:
Dataset mydata = New Dataset ();
String Strfilepath = " C :\\ " ;
String Myselectquery = " Select * From samplefile. CSV " ;
Oledbconnection myconnection = New Oledbconnection ( " Provider = Microsoft. Jet. oledb.4.0; Data Source = D :\\; " +
" Extended properties = \ " Text; HDR = Yes; FMT = Delimited \ "" );
Oledbdataadapter dscmd = New Oledbdataadapter (myselectquery, myconnection );
// Fill the DataSet object
Dscmd. Fill (mydata, " Customerowners " );
// Create a XML document with the table data
Mydata. writexml ( " D: \ testxml. xml " );
Myconnection. Close ();
VB 6.0 code: ' Set the database connection
Objconnection. Open " Provider = Microsoft. Jet. oledb.4.0; " & _
" Data Source = " & Strfilepath & " ; " & _
" Extended properties = "" Text; HDR = yes; FMt = delimited """
'Query from the file assuming it as a database table
Objrecordset. Open"Select * from" &Filename ,_
Objconnection, adopenstatic, adlockoptimistic, ad1_text
'Loop through the records and insert in to the table
Do until objrecordset. EOF
Debug. Print objrecordset. Fields. Item ("Number")
Debug. Print objrecordset. Fields. Item ("Name")
We read text files like reading database tables. Let's take a look at the advantages of this method:
1. fast processing
2. First, describe how it processes and separates data rows. Assume that a column in your text contains a comma (the 2nd disadvantage listed above ),Program Will automatically handle this situation. We don't have to deal with this individual situation.
3. If the filtering criteria are specified, We can query the specified record rows.
4. Now, it seems obvious that the where, having, or group clause can be used in queries.
Ex: "select number, count (name) from samplefile1.csv group by number having count (name)> = 1"
5. we can filter data based on conditions to obtain all the data that meets the conditions in the file at a time. you can use the rowfilter/filter attribute or dataview/recordse. as I have said, once we read data into dataset/recordset, we can use them to complete all operations. including filtering and sorting ..
Other file formats
We have discussed how to read files separated by commas (,). How can we use this method to process tab-Separated Files and fixed-length files? The key is the connection string we provide.
Let's take a look at the specific content of the connection string: Provider = Microsoft. Jet. oledb. 4.0 ; Data Source = C: \ Foldername; extended Properties = " Text; HDR = yes; FMt = delimited"
Provider-indicates the database type. We use the oledb connection type.
Data source-the folder is treated as a database.
Extended properties-these attributes indicate the method in which the file will be read.
The first part of the string defines the file type. although file formats can be classified into comma-separated, tab-separated, or fixed-length data, they are all simple text. if you want to read an Excel file, use Excel X. x, X. X is the Excel version number.
- HDR (header)-used to specify whether the header (File Header) is available. Yes-the first line of the file isCompositionHeader. The other rows are data. No-the data starts from the first row. FMT (Format)-specifies the format type. It can be the following values:
||Files whose data is separated by commas. comma is the default delimiter.
||Files whose data is separated by 'x.
||Files whose data is separated by tabs.
||Read data with a specified length. You can use col1, col2, and so on to specify the column length and type. More at msdn.
If the specified format is 'delimited', the default value is comma (,), which is stored in the registry. you can modify HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ jet \ 4.0 \ engines \ Text \ format to change the default setting.
Do not modify the registry at will. this Ms provides another optional method: provide us a schema. INI file. if we read a fixed-length file, we should use schema. INI file. in schema. the length of the specified field in ini.
Read multiple files
If you want to merge and filter multiple files and data in the same column, you can process them like in SQL. we can connect tables to obtain the merged data. note that the output data is the rows of all files in cross join. make sure that the data you filter is based on the same column.
Samplefile1.csv-(empid, name, address)
Samplefile2.csv-(empid, salary, month) // Where clause is used to get 'natural join'
String Myselectquery = " Select * From samplefile. CSV as sample1, samplefile2.csv as sample2 " +
" Where sample1.number = sample2.number " ;
' Where clause is used to get 'natural join'
Objrecordset. Open " Select * From samplefile. CSV as sample1, samplefile2.csv as sample2 " & _
" Where sample1.number = sample2.number " ,_
Objconnection, adopenstatic, adlockoptimistic, ad1_text
Alias (sample1 is samplefile. CSV alias) is required in join queries, especially when the table has the same column name. in this column, the table name (File Name) contains a bit (.), therefore, the query parser will be misled by the dot in the table name, so that the 'csv. number is treated as the column name.
Original article:Read text file (txt, CSV, log, tab, fixed length)