Use Aspose.Words to extract table data from Word documents

Source: Internet
Author: User

Some projects require developers to extract data from Word documents and export it to a database. The biggest challenge is supporting existing Word documents.

There are thousands of Word documents, each containing multiple data blocks in the same format. The document format was never designed to be read by another system: there are no bookmarks, no merge fields, and no way to distinguish the actual data from the standard instructions. Fortunately, all input fields sit inside tables, but the tables themselves come in different formats. Some consist of a single row or cell, and others vary in layout.

We can use Aspose.Words to create and manipulate Word documents.

First, create a simple table model in C# so that we can use it later when reading the document.

As shown below, the WordDocumentTable class has three properties: TableID, RowID, and ColumnID. As mentioned earlier, the documents contain no table or row identifiers, so these properties simply denote positions within the Word document. Indexes are assumed to start at 0.

public class WordDocumentTable
{
    public WordDocumentTable(int PiTableID)
    {
        MiTableID = PiTableID;
    }

    public WordDocumentTable(int PiTableID, int PiColumnID)
    {
        MiTableID = PiTableID;
        MiColumnID = PiColumnID;
    }

    public WordDocumentTable(int PiTableID, int PiColumnID, int PiRowID)
    {
        MiTableID = PiTableID;
        MiColumnID = PiColumnID;
        MiRowID = PiRowID;
    }

    private int MiTableID = 0;
    public int TableID
    {
        get { return MiTableID; }
        set { MiTableID = value; }
    }

    private int MiRowID = 0;
    public int RowID
    {
        get { return MiRowID; }
        set { MiRowID = value; }
    }

    private int MiColumnID = 0;
    public int ColumnID
    {
        get { return MiColumnID; }
        set { MiColumnID = value; }
    }
}
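For readers on a newer C# compiler, the same model can be written more compactly. The following is only a sketch, assuming auto-implemented properties and optional parameters are available; the class name WordDocumentTableCompact and the ModelDemo wrapper are hypothetical and not part of the original article. The constructor keeps the original parameter order (table, column, row).

```csharp
using System;

// Hypothetical compact variant of the WordDocumentTable model.
// All three values are zero-based positions, not identifiers stored in the document.
public class WordDocumentTableCompact
{
    // Same parameter order as the original constructors: table, column, row.
    public WordDocumentTableCompact(int tableId, int columnId = 0, int rowId = 0)
    {
        TableID = tableId;
        ColumnID = columnId;
        RowID = rowId;
    }

    public int TableID { get; set; }
    public int RowID { get; set; }
    public int ColumnID { get; set; }
}

public static class ModelDemo
{
    public static void Main()
    {
        // Third table, second column, second row.
        var cell = new WordDocumentTableCompact(2, 1, 1);
        Console.WriteLine("{0}, {1}, {2}", cell.TableID, cell.RowID, cell.ColumnID);
    }
}
```

Omitted arguments default to 0, which matches the defaults of the private fields in the longer version.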

Next comes the extraction stage. The property below lists the table cells that I want to read from the document.

private List<WordDocumentTable> WordDocumentTables
{
    get
    {
        List<WordDocumentTable> wordDocTable = new List<WordDocumentTable>();

        // Reads the data from the first table of the document.
        wordDocTable.Add(new WordDocumentTable(0));

        // Reads the data from the second table and its second column.
        // This table has only one row.
        wordDocTable.Add(new WordDocumentTable(1, 1));

        // Reads the data from the third table, second row and second cell.
        wordDocTable.Add(new WordDocumentTable(2, 1, 1));

        return wordDocTable;
    }
}

The following method uses Aspose.Words to extract data from the document based on the table, row, and cell positions.

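// Requires: using System; using System.Collections.Generic; using System.IO;
//           using Aspose.Words; using Aspose.Words.Tables;
public void ExtractTableData(byte[] PobjData)
{
    using (MemoryStream LobjStream = new MemoryStream(PobjData))
    {
        Document LobjAsposeDocument = new Document(LobjStream);

        foreach (WordDocumentTable wordDocTable in WordDocumentTables)
        {
            // Locate the table by its zero-based position in the document.
            Aspose.Words.Tables.Table table = (Aspose.Words.Tables.Table)
                LobjAsposeDocument.GetChild(NodeType.Table, wordDocTable.TableID, true);

            // GetChild returns null when the index is out of range; skip such entries.
            if (table == null)
                continue;

            // Default: the text of the whole table (used when ColumnID is 0).
            string cellData = table.Range.Text;

            if (wordDocTable.ColumnID > 0)
            {
                if (wordDocTable.RowID == 0)
                {
                    // No row given: index into all cells of the table in document order.
                    NodeCollection LobjCells = table.GetChildNodes(NodeType.Cell, true);
                    // On newer Aspose.Words releases ToTxt() may be unavailable;
                    // ToString(SaveFormat.Text) is the usual replacement.
                    cellData = LobjCells[wordDocTable.ColumnID].ToTxt();
                }
                else
                {
                    // Row given: pick the row first, then the cell within it.
                    NodeCollection LobjRows = table.GetChildNodes(NodeType.Row, true);
                    cellData = ((Row)(LobjRows[wordDocTable.RowID]))
                        .Cells[wordDocTable.ColumnID].ToTxt();
                }
            }

            Console.WriteLine(String.Format(
                "Data in Table {0}, Row {1}, Column {2} : {3}",
                wordDocTable.TableID,
                wordDocTable.RowID,
                wordDocTable.ColumnID,
                cellData));
        }
    }
}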
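The way the three positions resolve to a piece of text can be easy to misread, so here is a library-independent sketch of the same branching. It is an illustration only: it uses plain arrays in place of the Aspose.Words node tree, and the names CellSelector, Select, and SelectorDemo are hypothetical. Unlike the method above, it returns null for out-of-range positions instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CellSelector
{
    // tables[t][r][c] holds the text of one cell; a stand-in for the document's tables.
    public static string Select(IList<string[][]> tables, int tableId, int columnId, int rowId)
    {
        if (tableId < 0 || tableId >= tables.Count) return null;
        string[][] rows = tables[tableId];

        // A column position of 0 (or less) means "take the whole table".
        if (columnId <= 0)
            return string.Join(" ", rows.SelectMany(r => r));

        if (rowId == 0)
        {
            // No row given: count cells in document order across all rows.
            string[] flat = rows.SelectMany(r => r).ToArray();
            return columnId < flat.Length ? flat[columnId] : null;
        }

        // Row given: pick the row first, then the cell within it.
        if (rowId < 0 || rowId >= rows.Length) return null;
        string[] row = rows[rowId];
        return columnId < row.Length ? row[columnId] : null;
    }
}

public static class SelectorDemo
{
    public static void Main()
    {
        var tables = new List<string[][]>
        {
            new[] { new[] { "Name", "Ann" } },
            new[] { new[] { "Id", "42" } },
            new[] { new[] { "A", "B" }, new[] { "C", "D" } }
        };

        Console.WriteLine(CellSelector.Select(tables, 0, 0, 0)); // Name Ann
        Console.WriteLine(CellSelector.Select(tables, 1, 1, 0)); // 42
        Console.WriteLine(CellSelector.Select(tables, 2, 1, 1)); // D
    }
}
```

One consequence of this scheme is that the first column (position 0) and the first row (position 0) cannot be addressed on their own, because 0 doubles as the "not specified" value. A model that needs those cells would have to use a different marker, such as a nullable integer.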

