Three ways to read Excel in C # and compare analytics

Source: Internet
Author: User
Tags rowcount
(1) OLE DB mode

Advantage: Excel directly as a data source processing, directly read the content through SQL, read faster.

Disadvantage: Reading data is not flexible enough to read a cell directly, only the entire sheet page is read (the result is a DataTable) and then in the DataTable based on the number of rows to get the specified value.

When the amount of Excel data is large. is very memory intensive and throws a memory overflow exception when there is not enough memory.

Read the code as follows:

 Public DataTable getexceltablebyoledb (string strexcelpath, String tableName) {try {datatable dtexcel = new Datatabl    E ();    Data table DataSet ds = new DataSet ();    Gets the file name extension string strextension = System.IO.Path.GetExtension (Strexcelpath);    String strFileName = System.IO.Path.GetFileName (Strexcelpath);    Excel's connection OleDbConnection objconn = null; Switch (strextension) {case ". xls": objconn = new OleDbConnection ("Provider=Microsoft.Jet.OLEDB.4.0;Data S Ource= "+ Strexcelpath +"; "+" Extended properties=\ "Excel 8.0;        Hdr=no;imex=1;\ "");      Break  Case ". xlsx": objconn = new OleDbConnection ("Provider=microsoft.ace.oledb.12.0;data source=" + Strexcelpath + ";" + "Extended properties=\" Excel 12.0;        Hdr=no;imex=1;\ "");      Break        Default:objconn = null;    Break    } if (objconn = = null) {return null;    } objconn.open (); Get information for all sheet tables in Excel//system.data.datatable schematable = Objconn.getoledbschematabLe (System.Data.OleDb.OleDbSchemaGuid.Tables, NULL); Gets the first sheet table name for Excel//string tableName = schematable.rows[0][2]. ToString ().    Trim ();    String strSQL = "SELECT * FROM [" + TableName + "]";    Gets the information in Excel specified sheet table OleDbCommand objcmd = new OleDbCommand (strSQL, objconn);    OleDbDataAdapter myData = new OleDbDataAdapter (strSQL, objconn);    Mydata.fill (ds, TableName);//Fill Data objconn.close (); Dtexcel is the information stored in the specified table in the Excel file Dtexcel = ds.    Tables[tablename];  return dtexcel;  } catch {return null; }}

The following is a description of the connection string

Hdr=yes, this means that the first row is the title, not for the data use (but in my actual use, if the first row has a complex number, then the Read DataTable column header is automatically set to F1, F2, and so on the naming, and the actual application does not match, so it was through the hdr= No method reads all content into the DataTable and then manually sets the first line to the header); IMEX (IMport EXport mode) settings
There are three modes of IMEX:
0 is Export mode
1 is Import mode
2 is Linked mode (full update capabilities)
What I want to highlight here is the IMEX parameter, because different patterns represent different reading and writing behaviors:
When imex=0 is "Export mode", the Excel file opened by this mode can only be used for "write" purposes.
When Imex=1 is "Import Mode", the Excel file opened in this mode can only be used for "read" purposes.
When imex=2 is "link mode", the Excel file opened in this mode can support both "read" and "write" purposes.

---------------------------------

Also, when reading a Excel2007 version of a file, the version should be changed from 8.0 to 12.0, while the driver can no longer use jet, but should use aces. Responsible for errors that could cause "installable ISAM not Found".

---------------------------------

On the net also found that in this way there is more than the number of sheet tables in the actual Excel table sheet The number of cases, for two reasons:

1. The name removed includes the name in the XL naming manager (see XL2007 's formula-naming manager, shortcut key crtl+f3);

2. The name removed includes the filterdatabase suffix, which is the XL used to record the filter range.

For the 1th, it is easy to delete the contents of the existing naming manager, the 2nd is cumbersome to deal with, the filter after the deletion of these names remain, it is simple to add sheet and then the original sheet copy in. However, the actual situation does not make the above check for each Excel. The filter scheme is given below. (We have verified this problem, let us verify it ourselves)

Objconn is a link to read Excel, the following filter to get a valid sheet page name collection System.Data.DataTable schematable = objconn.getoledbschematable ( System.Data.OleDb.OleDbSchemaGuid.Tables, NULL); list<string> lstsheetnames = new list<string> (); for (int i = 0; i < SchemaTable.Rows.Count; i++) {   String strsheetname = (string) dtsheetname.rows[i]["table_name"];   if (Strsheetname.contains ("$") &&!strsheetname.replace ("'", ""). EndsWith ("$"))   {     //filter Invalid SheetName complete ....     Continue;   }   if (lstsheetnames! = null &&!lstsheetnames.contains (strsheetname))     Lstsheetnames.add (strsheetname);}

Because read out invalid SheetName general case the last character will not be $. If SheetName has some special symbols, the sheetname that are read will automatically be quoted as single quotes. For example, in Excel to edit sheetname into MySheet (1), at this time read out the SheetName is: ' MySheet (1) $ ', so the last character is not the best to filter the single quotation mark.

---------------------------------

(2) How COM components are implemented (by adding Microsoft.Office.Interop.Excel references)

Advantage: The ability to read data in Excel is very flexible, and the user can flexibly call various functions for processing.

Disadvantage: Cell-based processing, read slower, and for large data volumes is best not to use this way to read.

You need to add the appropriate DLL reference, which must exist before you can use it, and if the Web site is deployed on IIS, you also need to have Excel installed on the server, and sometimes you need to configure IIS permissions.

Read the code as follows:

Private Stopwatch Wath = new Stopwatch ();///<summary>///use COM to read excel///</summary>///<param name= " Excelfilepath "> Path </param>///<returns>datatabel</returns>public System.Data.DataTable  Getexceldata (String excelfilepath) {Excel.Application app = new Excel.Application ();  Excel.Sheets Sheets;  Excel.Workbook Workbook = null;  Object omissiong = System.Reflection.Missing.Value;  System.Data.DataTable dt = new System.Data.DataTable (); Wath.  Start ();    try {if (app = = null) {return null; } workbook = App.  Workbooks.Open (Excelfilepath, Omissiong, Omissiong, Omissiong, Omissiong, Omissiong, Omissiong, OMissiong, OMissiong,    Omissiong, Omissiong, Omissiong, Omissiong, Omissiong, Omissiong); Read the data into the datatable--start sheets = workbook.    worksheets;    Excel.Worksheet Worksheet = (excel.worksheet) sheets.get_item (1);//Read the first table if (Worksheet = = null) return null;    String cellcontent; int irowcount = worksheet. usedrange.roWs.    Count; int icolcount = worksheet.    UsedRange.Columns.Count;    Excel.Range Range;    Responsible for column Head Start DataColumn DC;    int ColumnID = 1; Range = (excel.range) worksheet.    Cells[1, 1]; while (range. Text.tostring ().      Trim ()! = "") {dc = new DataColumn (); dc.      DataType = System.Type.GetType ("System.String"); dc. ColumnName = range. Text.tostring ().      Trim (); Dt.       Columns.Add (DC); Range = (excel.range) worksheet.    Cells[1, ++columnid]; }//end for (int iRow = 2; iRow <= Irowcount; irow++) {DataRow dr = dt.      NewRow (); for (int icol = 1; icol <= icolcount; icol++) {range = (excel.range) worksheet.        Cells[irow, Icol]; Cellcontent = (range. Value2 = = null)? "": Range.          Text.tostring ();      DR[ICOL-1] = cellcontent; } dt.    Rows.Add (DR); } wath.    Stop (); TimeSpan ts = Wath.    Elapsed;  Read the data into the DataTable--end return DT;  } catch {return null; } finally {workbook. Close (False, Omissiong, Omissiong);    System.Runtime.InteropServices.Marshal.ReleaseComObject (workbook);    workbook = null; App.    Workbooks.close (); App.    Quit ();    System.Runtime.InteropServices.Marshal.ReleaseComObject (APP);    App = null; Gc.    Collect (); Gc.  WaitForPendingFinalizers (); }}///<summary>///using COM, multithreading to read Excel (1 main thread, 4 sub-threads)//</summary>///<param name= "Excelfilepath" > Path </param>///<returns>datatabel</returns>public System.Data.DataTable Threadreadexcel (string  Excelfilepath) {Excel.Application app = new Excel.Application ();  Excel.Sheets Sheets = null;  Excel.Workbook Workbook = null;  Object omissiong = System.Reflection.Missing.Value;  System.Data.DataTable dt = new System.Data.DataTable (); Wath.  Start ();    try {if (app = = null) {return null; } workbook = App.  Workbooks.Open (Excelfilepath, Omissiong, Omissiong, Omissiong, Omissiong, Omissiong, Omissiong, OMissiong, OMissiong, Omissiong, Omissiong, Omissiong, Omissiong, Omissiong, OmissIong); Read the data into the datatable--start sheets = workbook.    worksheets;    Excel.Worksheet Worksheet = (excel.worksheet) sheets.get_item (1);//Read the first table if (Worksheet = = null) return null;    String cellcontent; int irowcount = worksheet.    UsedRange.Rows.Count; int icolcount = worksheet.    UsedRange.Columns.Count;    Excel.Range Range;    Responsible for column Head Start DataColumn DC;    int ColumnID = 1; Range = (excel.range) worksheet.    Cells[1, 1];      while (Icolcount >= ColumnID) {dc = new DataColumn (); dc.      DataType = System.Type.GetType ("System.String"); String strnewcolumnname = range. Text.tostring ().      Trim ();      if (Strnewcolumnname.length = = 0) Strnewcolumnname = "_1"; Determines whether the column name repeats for (int i = 1; i < ColumnID; i++) {if (dt). COLUMNS[I-1].      ColumnName = = strnewcolumnname) Strnewcolumnname = strnewcolumnname + "_1"; } DC.      ColumnName = Strnewcolumnname; Dt.      Columns.Add (DC); Range = (excel.range) worksheet. Cells[1, ++columnid];    }//end//Data greater than 500, using multi-process to read data if (IRowCount-1 > 500) {//Start multithreading read data//new thread int b2 = (i      ROWCOUNT-1)/10;      DataTable dt1 = new DataTable ("Dt1"); DT1 = dt.      Clone ();      Sheetoptions sheet1thread = new sheetoptions (worksheet, Icolcount, 2, B2 + 1, DT1); Thread othread1 = new Thread (new ThreadStart (Sheet1thread.      sheettodatatable)); Othread1.      Start ();      Blocks 1 milliseconds, guaranteeing the first read of DT1 thread.sleep (1);      DataTable DT2 = new DataTable ("DT2"); DT2 = dt.      Clone ();      Sheetoptions sheet2thread = new sheetoptions (worksheet, Icolcount, b2 + 2, B2 * 2 + 1, DT2); Thread othread2 = new Thread (new ThreadStart (Sheet2thread.      sheettodatatable)); Othread2.      Start ();      DataTable DT3 = new DataTable ("DT3"); DT3 = dt.      Clone ();      Sheetoptions sheet3thread = new sheetoptions (worksheet, Icolcount, B2 * 2 + 2, B2 * 3 + 1, DT3); Thread othread3 = new Thread (new ThreadStart (Sheet3thread.      sheettodatatable)); Othread3.     Start (); DataTable dt4 = new DataTable ("Dt4"); DT4 = dt.      Clone ();      Sheetoptions sheet4thread = new sheetoptions (worksheet, Icolcount, B2 * 3 + 2, B2 * 4 + 1, DT4); Thread othread4 = new Thread (new ThreadStart (Sheet4thread.      sheettodatatable)); Othread4.      Start (); The main thread reads the remaining data for (int iRow = B2 * 4 + 2; iRow <= Irowcount; irow++) {DataRow dr = dt.        NewRow (); for (int icol = 1; icol <= icolcount; icol++) {range = (excel.range) worksheet.          Cells[irow, Icol]; Cellcontent = (range. Value2 = = null)? "": Range.          Text.tostring ();        DR[ICOL-1] = cellcontent; } dt.      Rows.Add (DR); } othread1.      Join (); Othread2.      Join (); Othread3.      Join (); Othread4.      Join (); Append data read from multiple threads to Dt1 after foreach (DataRow dr in Dt. Rows) dt1. Rows.Add (Dr.      ItemArray); Dt.      Clear (); Dt.      Dispose (); foreach (DataRow Dr in DT2. Rows) dt1. Rows.Add (Dr.      ItemArray); DT2.      Clear (); DT2. Dispose(); foreach (DataRow Dr in Dt3. Rows) dt1. Rows.Add (Dr.      ItemArray); DT3.      Clear (); DT3.      Dispose (); foreach (DataRow Dr in Dt4. Rows) dt1. Rows.Add (Dr.      ItemArray); Btr.      Clear (); Btr.      Dispose ();    return DT1; } else {for (int iRow = 2; iRow <= Irowcount; irow++) {DataRow dr = dt.        NewRow (); for (int icol = 1; icol <= icolcount; icol++) {range = (excel.range) worksheet.          Cells[irow, Icol]; Cellcontent = (range. Value2 = = null)? "": Range.          Text.tostring ();        DR[ICOL-1] = cellcontent; } dt.      Rows.Add (DR); }} wath.    Stop (); TimeSpan ts = Wath.    Elapsed;  Read the data into the DataTable--end return DT;  } catch {return null; } finally {workbook.    Close (False, Omissiong, Omissiong);    System.Runtime.InteropServices.Marshal.ReleaseComObject (workbook);    System.Runtime.InteropServices.Marshal.ReleaseComObject (sheets);    workbook = null; App. Workbooks.close ();    App.    Quit ();    System.Runtime.InteropServices.Marshal.ReleaseComObject (APP);    App = null; Gc.    Collect (); Gc.  WaitForPendingFinalizers (); }}

(3) Npoi way to read Excel (this method has not been tested)

Npoi is the. NET version of the POI project. Poi is an open source Java program that reads and writes Microsoft OLE2 component documents such as Excel, Word, and so on. With Npoi, you can read and write Word/excel documents on a machine that does not have Office installed or the appropriate environment.

Advantages: Read Excel faster, read mode operation flexibility

Cons: You need to download the appropriate plugin and add it to the system reference.

<summary>///importing data from Excel into a DataTable///</summary>///<param name= "SheetName" > The name of the Excel workbook sheet </param>///<param name= "Isfirstrowcolumn" > whether the first row is a DataTable column name </param>///<  Returns> returns the Datatable</returns>public DataTable exceltodatatable (string sheetname, bool Isfirstrowcolumn) {  Isheet sheet = null;  DataTable data = new DataTable ();  int startrow = 0;    try {fs = new FileStream (FileName, FileMode.Open, FileAccess.Read);    if (Filename.indexof (". xlsx") > 0)//2007 Version workbook = new Xssfworkbook (FS);    else if (Filename.indexof (". xls") > 0)//2003 Version workbook = new Hssfworkbook (FS); if (sheetname! = null) {sheet = workbook.    Getsheet (SheetName); } else {sheet = workbook.    Getsheetat (0); if (sheet! = null) {IRow firstrow = sheet.      GetRow (0); int cellcount = Firstrow.lastcellnum; The number of the last cell in a row is the total number of columns if (Isfirstrowcolumn) {for (int i = firstrow.firstcellnum; I < Cellcount; ++i) {DataColumn column = new DataColumn (Firstrow.getcell (i).          Stringcellvalue); Data.        Columns.Add (column); } StartRow = Sheet.      Firstrownum + 1; } else {StartRow = sheet.      Firstrownum; }//The label of the last column int rowCount = sheet.      Lastrownum; for (int i = startrow; I <= rowCount; ++i) {IRow row = sheet.        GetRow (i); if (row = = null) continue; Rows with no data default is a null DataRow datarow = data.        NewRow (); for (int j = row. Firstcellnum; J < Cellcount; ++J) {if (row. Getcell (j)! = NULL)///Similarly, cells with no data default to null Datarow[j] = row. Getcell (j).        ToString (); } data.      Rows.Add (DataRow);  }} return data; } catch (Exception ex) {Console.WriteLine ("Exception:" + ex.    Message);  return null; }}
Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.