Questions about loss of data when using Excel files on Aas. Net

Source: Internet
Author: User

Abstract: This paper studies the ISAM driver restrictions, analyzes the causes of data loss in. NET when Microsoft. Jet. OLEDB is used to read Excel files, and provides a solution to this problem.
Key words: Excel; data access; Ole DB; ISAM
Graph classification: TP 311.1 Document Identification Code: article A No.: 1009-3044 (2007) 04-10937-02
  
1 Introduction
In the design of applications, you often need to read Excel data or import and convert Excel data to other data carriers. For example, you can import Excel data into databases such as SQL Sever through applications for use. The author encountered similar requirements in the development of "ASP collaborative commerce platform for automobile industry chain. An automobile production enterprise needs to publish its vehicle departure information to the automobile industry chain platform. The data is an Excel data table generated by the internal ERP system. The user first uploads the data table to the automobile industry chain platform, the Platform imports the Excel data into the SQL Server database of the platform for other applications. The development tools used for the development of the automotive industry chain platform are. NET, the language used is C #. Microsoft. jet. when OLEDB.4.0 reads data, a type of data may be lost when a field contains both text and numbers. This article analyzes the root cause of this problem and provides corresponding solutions.
  
2 Problem Description
Excel is Microsoft's spreadsheet processing software, which is widely used in modern office and enterprise information applications, in programming, we often need to access Excel files to obtain data, but Excel files are not standard databases [1].
ASP. NET is also a Microsoft product. As an important component of the. NET FrameWork, it is mainly used for Web design. Microsoft. Jet. OLEDB.4.0 [2] is generally used for accessing and reading Excel Data in. NET. Take the sheet 1 worksheet in the Excel file auto.xls as an example. The content of the worksheet is shown in table 1.
Table 1 sheet1 table data content
The data in the table is read and displayed in the DataGrid. The simplified code is as follows:
String ConnStr = "Provider = Microsoft. Jet. OLEDB.4.0; DataSource = c:/auto.xls; Extended Properties = 'excel 8.0; HDR = YES ';";
OleDbConnection Conn = new OleDbConnection (ConnStr );
Conn. Open ();
String SQL = "select * from [sheet1 $]";
OleDbDataAdapter da = new OleDbDataAdapter (SQL, ConnStr );
DataSet ds = new DataSet ();
Da. Fill (ds );
DataGrid1.DataSource = ds;
DataGrid1.DataBind ();
Conn. Close ();
However, the result of running the above Code is not expected. It will display the content shown in table 2. It can be found that the two data items in the first field "1042" become null.
Table 2 data content displayed in the DataGrid1 table
Some programmers made the following changes to the Extended Properties in the above Code OleDbConnection connection string, Extended Properties = 'excel 8.0; HDR = NO; IMEX = 1 ', this problem can be solved. Due to a similar problem encountered in the development of the "automobile industry chain collaborative commerce platform", after a lot of tests, it was found that after the IMEX = 1 was added, this problem was not solved in essence. If all the first eight records of a field are pure numbers, the items that contain letters or Chinese characters in the subsequent records of this field will remain blank, however, if one of the first eight records of this field is not a pure number, the expected result is displayed.
  
3. Problem Analysis
The root cause of this problem is the limitation of the Excel ISAM [3] (Indexed Sequential Access Method, I .e. index Sequential Access Method) driver, the Excel ISAM driver checks the actual values in the first few rows to determine the type of an Excel column, and then selects the data type that can represent a large score in the sample [4]. That is, Excel ISAM searches for the first few rows of a column (eight rows by default) and takes the most types as the processing type. For example, if the number is large, other data items containing letters and other texts are left blank. If the number is large, data items with only numbers are left blank.
This section analyzes the meanings of HDR and IMEX in section 1st program code Extended Properties. HDR is used to set whether to use the first row of the Excel table as the field name. "YES" indicates that "NO" indicates that it is not the data content; IMEX is used to tell the driver to use the Excel file mode. Its values include 0, 1, and 2, which respectively represent the export, import, and hybrid modes. When we set IMEX = 1, the forced hybrid data is converted to text, but this setting is not reliable, IMEX = 1 only works when at least one of the first eight rows of data in a column is a text item, it only slightly changes the behavior of selecting the dominant data type in the first eight rows of data. For example, if the first eight rows of data in a column are all pure numbers, the Data Type of the column is still numeric, and the text data in the row is still null.
Another improvement measure is that IMEX = 1 is used with the registry value TypeGuessRows. The value of TypeGuessRows determines the Data Type of the ISAM driver from the previous data sampling. The default value is "8 ". You can change the number of sampled rows by modifying the registry value under "HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ Jet \ 4.0 \ Engines \ Excel. However, this improvement still does not fundamentally solve the problem. Even if we set IMEX to "1" and TypeGuessRows to be larger, for example, 1000, suppose the data table has 1001 rows, the first 1000 rows of a column are all pure numbers, and the first 1,001st rows of the column are a text. The ISAM-driven mechanism still makes the column data empty.
  
4. Solution
From the above analysis, we can know that when a column of data contains a hybrid type. use Microsoft. NET. jet. OLEDB.4.0 reading Excel files may cause data loss. To solve this problem, you can only use other data reading methods.
Another way to read Excel files in. NET is to use the traditional COM component. This method is involved in many technical articles or papers and will not be described in detail in this article. It should be noted that the efficiency of using COM components to read Excel file data is low, and unexpected errors may occur during release. In particular, you should exercise caution when developing Web applications.

This article proposes another method to solve this problem by reading CSV plain text format.
(1) before reading the Excel file, convert it to .csv format and save it in Excel to convert it. A csv file, also known as a comma-separated file, is a plain text file that separates data columns with commas, the data tables in table 1 in this article are stored in CSV format and displayed in a plain text editor, as shown in table 3.
Table 3 Table 1 data saved in CSV format
It should be noted that CSV files can also be read in Ole DB or ODBC mode, but if these methods are used to read their data, it will return to the old road of data loss, the ISAM mechanism also plays a role.
(2) open the file using the common method of reading text files, read the first line, use "," as the separator to obtain the field names, and create corresponding fields in the DataTable, the field types can be uniformly created as "String ".
  
Original article
(3) read data rows row by row and use "," as the separator to obtain the data of each column of a row and fill in the corresponding fields of the DataTable.
The simplified code is as follows:
String line;
String [] split = null;
DataTable table = new DataTable ("auto ");
DataRow row = null;
StreamReader sr = new StreamReader ("c:/auto.csv", System. Text. Encoding. Default );
// Create a data column corresponding to the data source
Line = sr. ReadLine ();
Split = line. Split (',');
Foreach (String colname in split ){
Table. Columns. Add (colname, System. Type. GetType ("System. String "));}
// Enter the data in the data table
Int j = 0;
While (line = sr. ReadLine ())! = Null ){
J = 0;
Row = table. NewRow ();
Split = line. Split (',');
Foreach (String colname in split ){
Row [j] = colname;
J ++ ;}
Table. Rows. Add (row );}
Sr. Close ();
// Display data
DataGrid1.DataSource = table. DefaultView;
DataGrid1.DataBind ();
  
5 conclusion
In the design of applications, it is common to access Excel data. In this paper, we will discuss the methods used to access Excel tables containing mixed data in. NET. Of course, it is better to use Microsoft. Jet. OLEDB if there is no hybrid data. If it is not developed using. NET, the analysis and methods provided in this paper can also be referenced.
References:
[1] Linuxmine. use ASP. NET to access the Excel document [EB/OL]. http://www.linuxmine.com/77726.html,2007-1-22.
[2] Liu Hongcheng. C # advanced programming [M]. Beijing: Tsinghua University Press, 2003. 187-200.
[3] Xiao zhenghong, Cao Yuanda, Han qiufeng. data Access Technology-Comparison of DAO, ADO, and RDO [J]. computer and Information Technology, 2001, (1): 31-32.
[4] PBR. Excel use DAO OpenRecordset NULL as return value [EB/OL]. http://support.microsoft.com/kb/194124,2004-6-24.
For the charts, annotations, and formulas involved in this article, please read the original text in PDF format.

Comment:
#1 floor [main poster] | Weiwei frontier

Ado. Net reading Excel FAQ summary category:. NET technology bit by bit
It is often necessary to export data between the database and Execl. In the net Era, ADO. NET can use Microsoft. Jet. OleDb to access Excel. There are already many similar resources on the Internet. The most typical and simplest possibilities are as follows: (asp.net environment)

// Connection string
String xlsPath = Server. MapPath ("~ /App_data/somefile.xls "); // absolute physical path
String connStr = "Provider = Microsoft. Jet. OLEDB.4.0 ;"
"Extended Properties = Excel 8.0 ;"
"Data source =" xlsPath;
// Query statement
String SQL = "SELECT * FROM [Sheet1 $]";

DataSet ds = new DataSet ();
OleDbDataAdapter da = new OleDbDataAdapter (SQL, connStr );
Da. Fill (ds); // Fill in DataSet

// Operate the data in DataSet

// Output, bind data
GridView1.DataSource = ds. Tables [0];
GridView1.DataBind ();
Is it easy ?! Everything is like operating a database, but note the following:
1. When the data provider uses Jet, you must specify the Extended Properties keyword to set specific Excel Properties. Different Versions of Excel correspond to different attribute values: the valid Excel version for the Extended Properties value.
For Microsoft Excel 8.0 (97), 9.0 (2000), and 10.0 (2002) workbooks, use Excel 8.0.

For Microsoft Excel 5.0 and 7.0 (95) workbooks, use Excel 5.0.

For Microsoft Excel 4.0 workbooks, use Excel 4.0.

For Microsoft Excel 3.0 workbooks, use Excel 3.0.

Ref: http://msdn.microsoft.com/library/chs/default.asp? Url =/library/CHS/dv_vbcode/html/vbtskcodeexamplereadingexceldata1_dataset. asp "target =" _ new "> http://msdn.microsoft.com/library/chs/default.asp? Url =/library/CHS/dv_vbcode/html/vbtskcodeexamplereadingexceldata1_dataset. asp

2. The Data Source Path uses the physical absolute path (same as Access)

3. How do I reference a table name?
A valid reference to a table (or range) in an Excel Workbook.
To reference the fully used worksheet range, specify the worksheet name followed by the dollar sign. For example:

Select * from [Sheet1 $]
To reference a specific address range on a worksheet, specify the worksheet name that follows the dollar sign and the range. For example:

Select * from [Sheet1 $ A1: B10]
To reference a specified range, use the name of the range. For example:

Select * from [MyNamedRange]
Ref: http://msdn.microsoft.com/library/chs/default.asp? Url =/library/CHS/dv_vbcode/html/vbtskcodeexamplereadingexceldata1_dataset. asp "target =" _ new "> http://msdn.microsoft.com/library/chs/default.asp? Url =/library/CHS/dv_vbcode/html/vbtskcodeexamplereadingexceldata1_dataset. asp
Note:
You can reference three objects in an Excel Workbook:
• Full Worksheet: [Sheet1 $], and Sheet1 is the worksheet name
• Named cell area on a worksheet: [MyNamedRange] (you do not need to specify a worksheet because the entire xls naming area can only be unique)
XLS naming method: select the cell range, insert, name, and definition
• Untitled cell area on the worksheet: [Sheet1 $ A1: B10]
(Among various objects provided by relational databases (tables, views, stored procedures, etc.), Excel Data sources only provide objects equivalent to tables, it consists of the worksheet in the specified workbook and the defined naming area. The naming area is regarded as a "table", while the worksheet is regarded as a "system table ")

Note:
• [] (Square brackets) must be used. If not, the following message is displayed:
FROM clause syntax error
• It must be followed by $ (dollar sign); otherwise, it is reported as follows:
The Microsoft Jet Database Engine cannot find the object 'sheet2 '. Check whether the object exists and write out its name and path correctly.
• If the worksheet name is incorrect or does not exist, the following message is displayed:
'Sheet2 $ 'is not a valid name. Make sure it does not contain invalid characters or punctuation and the name is not too long.
• Mentioned in how to use ADO in Visual Basic or VBA to process Excel Data
~ And '(wavy lines and single quotes) instead of [], use ADO. NET test failed, report:
FROM clause syntax error
• When the worksheet name ([Sheet1 $]) is referenced, the data provider determines that the data table starts from a non-empty cell at the top left of the specified worksheet. For example, if the worksheet is empty from row 3rd, column C, row 3rd, column C, and all rows 1st and 2, only data starting from row 3rd and column C is displayed; end with a non-empty unit in the maximum range of the last table;
• To precisely read the range, use the namespace [NamedRange] or specify the address [Sheet1 $ A1: C10].

4. How do I reference a column name?
• Based on the default connection string, the data provider uses the first row in the valid region as the column name. If a cell in this row is empty, it is expressed by F1 and F2, where the ordinal number, consistent with the cell position, starting from 1;
• If you want the first row to be displayed as data rather than column names, you can specify: HDR = NO in the Extended Properties attribute of the connection string.
The default value is HDR = NO. format:

String connStr = "Provider = Microsoft. Jet. OLEDB.4.0 ;"
"Extended Properties =" "Excel 8.0; HDR = NO "";"
"Data source =" xlsPath;
Note: For Excel 8.0; HDR = NO, double quotation marks are required (here, the backslashes are escape characters in C)

Ref: ms-help: // MS. VSCC. v80/MS. MSDN. v80/MS. visual Studio. v80.chs/WD_ADONET/html/745c5f95-2f02-4674-b378-6d51a7ec2490.htm in "connect to Excel" section (Note: In my own MSDN, it uses two double quotation marks as an error. The test failed, the original Article says:

Note: Double quotation marks required for Extended Properties must also be added.
)

In this case, all column names start with F and then follow the index, starting with F1, F2, F3 .......

5. Why is the valid cell data not displayed?
The possible cause of this situation is that, in the default connection, the data provider infers the Data Types of subsequent units based on the previous cells.
You can specify IMEX = 1 in Extended Properties.

"IMEX = 1;" notifies the driver to always read the "hybrid" data column as text
Ref: Same as 4

PS: in the case of baidu, some netizens said that adding quotation marks to each unit is certainly a good solution, but the workload is huge, and it is not too busy, I am glad to find the cure prescription.

More ref:
How to use ADO in Visual Basic or VBA to process Excel Data
Http://dotnet.aspx.cc/ShowDetail.aspx? C673E2CD-3F1E-4919-8CE0-D69B894A0599

Note:
After adding the HDR or IMEX attribute values, many people on the Internet will report the error "unable to find the installable ISAM. The solution is simple:
The connection string is written as follows:

Provider = Microsoft. Jet. Oledb.4.0; Data Source = {0}; Extended Properties = 'excel 8.0; HDR = 1; IMEX = 1; '(note the single quotation marks marked in red)

Instead of writing it like this:

Provider = Microsoft. Jet. Oledb.4.0; Data Source = {0}; Extended Properties = Excel 8.0; HDR = 1; IMEX = 1;

 

 

 

From: http://www.douban.com/note/18510346/

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.