OLE DB Solution for Data Type Mismatch When Reading Excel Data (Reposted)

Source: Internet
Author: User

1 Introduction
When designing applications, you often need to read Excel data or import it into another data store, for example into a SQL Server database for use by an application. The author met exactly this requirement while developing the "Automotive Industry Chain ASP Collaborative Business Platform". A vehicle manufacturer needed to publish its vehicle shipment information on the platform. The data is produced as an Excel worksheet by the manufacturer's internal ERP system. The user first uploads this worksheet to the platform, which then reads the Excel data into its SQL Server database for use by other applications. The platform was developed with VS.NET in C#. During development we found that when a field contains mixed text and numeric data, reading it with Microsoft.Jet.OLEDB.4.0 loses the values of one of the two types. This paper analyzes the root cause of the problem and presents a solution.
2 Problem Description
Excel is Microsoft's spreadsheet software and is used very widely in modern office work and enterprise information systems. As a result, programs frequently need to read data from Excel files, but an Excel file is not a standard database [1].
ASP.NET, also a Microsoft product, is used mainly for web development and is an important part of the .NET Framework. In .NET, Excel data is generally read through the Microsoft.Jet.OLEDB.4.0 provider [2]. As an example, consider reading the Sheet1 worksheet of the Excel file Auto.xls, whose contents are shown in Table 1.
Table 1 Data contents of the Sheet1 table
The following simplified code reads the worksheet and displays it in a DataGrid:
// Requires: using System.Data; using System.Data.OleDb;
string connStr = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\auto.xls;Extended Properties='Excel 8.0;HDR=Yes'";
using (OleDbConnection conn = new OleDbConnection(connStr))
{
    conn.Open();
    string sql = "SELECT * FROM [Sheet1$]";
    // Reuse the open connection instead of passing the connection string again
    OleDbDataAdapter da = new OleDbDataAdapter(sql, conn);
    DataSet ds = new DataSet();
    da.Fill(ds);
    DataGrid1.DataSource = ds;
    DataGrid1.DataBind();
}   // disposing the connection also closes it
However, running this code does not produce the expected result. The output is shown in Table 2: the two entries "1042" in the first field come back empty.
Table 2 Data displayed in DataGrid1
Some developers change the Extended Properties of the OleDbConnection connection string above to Extended Properties='Excel 8.0;HDR=No;IMEX=1' and believe this solves the problem. Because we met the same problem while developing the "Automotive Industry Chain Collaborative Business Platform", we ran many tests and found that adding IMEX=1 does not fundamentally solve it. The observed behavior is as follows. If the first 8 records of a field are all pure numbers, later entries in that field that contain letters or Chinese characters are still empty. Only if at least one of the first 8 records is not a pure number is the expected result obtained.

3 Problem Analysis
The root of the problem is a limitation of the Excel ISAM (Indexed Sequential Access Method) driver [3]. The driver determines the type of an Excel column by examining the actual values in its first few rows and choosing the data type that represents the majority of the values in that sample [4]. In other words, the Excel ISAM driver looks at the first few rows of a column (8 by default) and uses the most common type as the column type. If numbers are in the majority, entries containing text such as letters come back empty. If text is in the majority, the purely numeric entries come back empty.
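To make the majority rule concrete, here is a small C# model of it. This is a sketch under stated assumptions and not the Jet driver's code. The class and method names are hypothetical. Empty cells are skipped when sampling, a tie is resolved in favour of text, and "numeric" simply means that the value parses as a double. The sample values in the usage note below are made up to mimic the Table 2 symptom.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical model of the type-guessing rule described above (NOT the driver's code):
// sample the first typeGuessRows non-empty values, pick the majority type, and
// return null for every value that does not fit the chosen type.
public static class JetTypeGuessModel
{
    static bool IsNumeric(string s) =>
        double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _);

    public static bool GuessIsNumeric(IList<string> column, int typeGuessRows = 8)
    {
        // Assumption: empty cells are ignored when building the sample
        var sample = column.Where(v => !string.IsNullOrEmpty(v))
                           .Take(typeGuessRows)
                           .ToList();
        int numeric = sample.Count(IsNumeric);
        // Majority wins; a tie is treated as text (an assumption of this sketch)
        return numeric > sample.Count - numeric;
    }

    public static List<string> Read(IList<string> column, int typeGuessRows = 8)
    {
        bool numeric = GuessIsNumeric(column, typeGuessRows);
        // Values whose type differs from the guessed column type come back as null.
        // Empty cells also come back as null.
        return column
            .Select(v => !string.IsNullOrEmpty(v) && IsNumeric(v) == numeric ? v : null)
            .ToList();
    }
}
```

For example, JetTypeGuessModel.Read(new List<string> { "1042", "A1", "B2", "1042", "C3", "D4", "E5", "F6" }) returns the two "1042" entries as null, because six of the eight sampled values are text.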
Next, consider what HDR and IMEX mean in the Extended Properties of the first code listing. HDR specifies whether the first row of the Excel sheet holds field names. "Yes" means it does, and "No" means it holds data. IMEX tells the driver which mode to use for the Excel file: the values 0, 1 and 2 denote export, import and mixed mode respectively. Setting IMEX=1 is meant to force mixed data to be converted to text, but it is not reliable. IMEX=1 only takes effect when at least one of the first 8 rows of a column is text, so it merely adjusts how the driver picks the best data type from those 8 rows. For example, if the first 8 rows of a column are all pure numbers, the driver still uses a numeric type for the column, and later rows that contain text remain empty.
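The HDR and IMEX settings can be made explicit with a small helper. ExcelConnectionStrings.Jet4 is a hypothetical name and not an ADO.NET API. It only assembles the same Jet 4.0 connection string format used in the first code listing.

```csharp
using System;

// Hypothetical helper (not part of ADO.NET): builds the Jet 4.0 Excel connection
// string used in this paper, with HDR and an optional IMEX setting.
public static class ExcelConnectionStrings
{
    public static string Jet4(string path, bool hasHeaderRow, int? imex = null)
    {
        if (imex.HasValue && (imex.Value < 0 || imex.Value > 2))
            throw new ArgumentOutOfRangeException(nameof(imex), "IMEX must be 0, 1 or 2.");

        // HDR=Yes: the first row holds field names; HDR=No: the first row is data
        string ext = "Excel 8.0;HDR=" + (hasHeaderRow ? "Yes" : "No");

        // IMEX is optional; passing null leaves it out of the string entirely
        if (imex.HasValue)
            ext += ";IMEX=" + imex.Value;

        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
               ";Extended Properties='" + ext + "'";
    }
}
```

Jet4(@"C:\auto.xls", false, 1) yields the HDR=No;IMEX=1 variant discussed above, and calling it without an IMEX value leaves the setting out.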
A further refinement is to combine IMEX=1 with the TypeGuessRows registry value. TypeGuessRows sets how many leading rows the ISAM driver samples to determine the data type, and its default is 8. You can change the number of sampled rows by editing this value under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel. This still does not solve the problem fundamentally. Suppose IMEX is set to 1 and TypeGuessRows is raised to, say, 1000. If a sheet has 1001 rows, the first 1000 rows of a column are all pure numbers, and row 1001 is text, the ISAM driver still returns that text value as empty.
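The 1001-row scenario can be checked with another small model. This is again a hypothetical sketch and not the driver's implementation. It encodes the IMEX=1 behavior as this paper describes it: a column is read as text only when the sampled rows contain at least one text value. It also assumes that the column has no empty cells.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical model of IMEX=1 combined with TypeGuessRows, as described in this
// paper (not the driver's code). Assumes the column contains no empty cells.
public static class ImexModel
{
    static bool IsNumeric(string s) =>
        double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _);

    public static List<string> ReadWithImex1(IList<string> column, int typeGuessRows)
    {
        // Only the first typeGuessRows values are inspected
        bool sampleHasText = column.Take(typeGuessRows).Any(v => !IsNumeric(v));

        if (sampleHasText)
            return column.ToList();   // mixed sample: every value is returned as text

        // All-numeric sample: the column becomes numeric, so later text values are lost
        return column.Select(v => IsNumeric(v) ? v : null).ToList();
    }
}
```

When typeGuessRows is 1000, the text value in row 1001 comes back as null. Raising the sample to 1001 rows only helps because the sample now happens to include that row. This is why enlarging TypeGuessRows moves the problem rather than removing it.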

4 Workaround
The analysis above shows that when a column contains mixed types, data loss cannot be avoided if Excel files are read in .NET through Microsoft.Jet.OLEDB.4.0. To solve the problem, we have to consider other ways of reading the data.
Another way to read Excel files in .NET is to fall back to the traditional COM components. Many technical articles and papers already cover this approach, so it is not repeated here. Note, however, that reading Excel data through COM components is inefficient and may cause unpredictable errors after deployment. It should be used with particular caution in web applications.

This paper presents another solution: reading the data as plain CSV text.
(1) Before reading the data, convert the .xls Excel file to .csv format. You can do this by saving the workbook in that format directly from Excel. A CSV (comma-separated values) file is a plain text file whose columns are separated by ",". Table 3 shows the data of Table 1 saved as CSV and opened in a plain text editor.
Table 3 Data of Table 1 saved in CSV format
Note that a CSV file can also be read through OLE DB or ODBC. However, if you read it that way, the ISAM type-guessing mechanism applies again and you are back to losing data.
(2) Open the file as an ordinary text file, read the first line, and split it on "," to obtain the field names. Then create the corresponding columns in a DataTable. All columns can simply be created with type String.
(3) Read the remaining data line by line. Split each line on "," to obtain the value of each column, and fill the values into the corresponding fields of the DataTable.
The simplified implementation is as follows:
// Requires: using System.Data; using System.IO; using System.Text;
string line;
string[] split;
DataTable table = new DataTable("Auto");
DataRow row;

// Encoding.Default (the system ANSI code page on .NET Framework) matches how Excel saves CSV
using (StreamReader sr = new StreamReader(@"C:\auto.csv", Encoding.Default))
{
    // Create one String column for each field name in the header line
    line = sr.ReadLine();
    split = line.Split(',');
    foreach (string colName in split)
    {
        table.Columns.Add(colName, typeof(string));
    }

    // Fill the data table line by line
    while ((line = sr.ReadLine()) != null)
    {
        row = table.NewRow();
        split = line.Split(',');
        // Guard against lines that have more fields than the header
        for (int j = 0; j < split.Length && j < table.Columns.Count; j++)
        {
            row[j] = split[j];
        }
        table.Rows.Add(row);
    }
}

// Show the data
DataGrid1.DataSource = table.DefaultView;
DataGrid1.DataBind();
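The line.Split(',') calls above assume that no field contains a comma. When Excel saves a cell that contains a comma or a double quote, it wraps the field in double quotes and doubles any embedded quotes. A plain split would then break such a row into too many columns. A minimal quote-aware splitter could look like the following. CsvLine.SplitFields is a hypothetical helper, and it assumes that no cell contains a line break.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits one CSV line, honouring double-quoted fields
// (e.g. "Smith, Jr.") and doubled quotes inside them ("" means a literal ").
// Assumption: no field contains a line break, so one text line is one record.
public static class CsvLine
{
    public static string[] SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');   // doubled quote inside quotes is a literal quote
                    i++;                   // skip the second quote
                }
                else if (c == '"')
                {
                    inQuotes = false;      // closing quote ends the quoted section
                }
                else
                {
                    current.Append(c);     // commas inside quotes are kept as data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;           // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());   // field separator
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());    // last field (may be empty)
        return fields.ToArray();
    }
}
```

To use it, replace both line.Split(',') calls in the code above with CsvLine.SplitFields(line).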

5 Conclusion
Accessing Excel data is very common in application design. This paper has explored how to access Excel worksheets that contain mixed-type data in .NET. When the data contains no mixed types, Microsoft.Jet.OLEDB remains the better choice. The analysis and method presented here can also serve as a reference for development on platforms other than .NET.


OLE DB connection string for Excel: the IMEX problem

Today I needed to write data into an Excel sheet and spent a long time before discovering that IMEX was the cause. I am recording it here as a reminder to myself, and I hope it keeps others from making the same mistake.

Problem: the statement "INSERT INTO [Sheet1$] ([large Class]) VALUES ('Test')" fails to insert the row.

Cause: the connection string was Provider=Microsoft.Jet.OLEDB.4.0;Data Source='2008-08.xls';Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'. With IMEX=1 the driver opens the workbook in import mode, which is read-only.

Workaround: remove IMEX=1 from the connection string.

Supplement:
When you insert data into Excel, the driver also chooses the data type from the first 8 rows of the column. For example, if the first 8 rows of a score column are empty, an inserted 5 is stored as text. If those rows are in number format, the inserted 5 is stored as a number. For what IMEX and the TypeGuessRows registry value do, see Section 3 above.
