Generate a WYSIWYG Excel report (2) -- from HTML table to excel Cell

Source: Internet
Author: User

In the previous articleArticleTo obtain the HTML table structure. In this article, we focus on the analysis of the table structure.

 

We are all familiar with the structure of HTML tables. How can we construct a structure on the other end so that excel can be well accepted and processed?

Intuitively, the content of a complete Excel file is composed of the content in each cell. Each cell has X and Y coordinates to indicate its position. That is to say, an Excel file is essentially a collection composed of many cells. Each cell uses the coordinate attribute to determine the location and the content attribute to store the content.

Based on this, I designed the most basic cell structure:
◆ X coordinate
◆ Y coordinate
◆ Merge Columns
◆ Merge rows
◆ Content

 

The most basic structure of Excel has been determined. The next step is to convert HTML table into an Excel Cell set.

Each TD node in the HTML table corresponds to an Excel cell, and its content does not need to be said. The combination of rows and columns can also be obtained from the rowspan and colspan attributes of TD, the key point of conversion is to locate the Excel cell location by the tr TD structure of the table, that is, the X and Y coordinates.

Y coordinate is easy to determine, that is, the number of lines in the tr where the TD is located. As for the X coordinate of a td, it is affected by two factors: a TR located at the same place as the TD, but located before it (reflected in the table vision, that is, its left TD) the placeholder condition of TD and the Cross-row condition of some TD in the tr where the TD is located.

Based on this consideration, the X coordinate of the positioning TD must go through two procedures: horizontal derivation (horizontal deduction) for processing the influence of the Left TD placeholder) and vertical deduction ).

 

The following table is used as an example to describe the two derivation processes.

 

Horizontal deduction)
A horizontal deduction (horizontal deduction) is limited to a TR range. The entire process is based on the principle of recursion. The recursive model is as follows:

 

CoreCodeIs:

1 Private   Int Horizontaldeduction (htmlnode phntd)
2 {
3 Htmlnode hnpreviussibling = Phntd. previussibling;
4 While (Hnpreviussibling ! =   Null   && Hnpreviussibling. Name ! = Phntd. Name)
5 {
6 Hnpreviussibling = Hnpreviussibling. previussibling;
7 }
8
9 If (Hnpreviussibling ! =   Null )
10 {
11 Int Ncolspan = Hnpreviussibling. getattributevalue ( " Colspan " , 1 );
12 Return Horizontaldeduction (hnpreviussibling) + Ncolspan;
13 }
14
15 Return   0 ;
16 }

 

After horizontal derivation, the X and Y coordinates of each TD are shown in:

 

 

Vertical deduction)
The process of a vertical derivation can be described as (currently, TD is represented by ):

 

Locate the TD Node B with the same X coordinate as a in row tr before.

If (B. rowspan> (A.Y-B.Y ))
{
X + = B. colspan, that is, the X coordinate of A is pushed back to the position of B. colspan.

At the same time, a TR is located at the same location as A, but the TD node behind it should push back the B. colspan displacement.
}

 

Execute such a process repeatedly on TD node A until it is determined that a does not need to move again.

 

The core code for vertical derivation is:

1 Bool Bactedpush =   False ;
2
3 Do
4 {
5 Int Ncompareditemindex =   - 1 ;
6 For ( Int J = I -   1 ; J > =   0 ; J -- )
7 {
8 If (Plstcells [J]. _ nstartx = Ocurrentcell. _ nstartx)
9 {
10 Ncompareditemindex = J;
11 Break ;
12 }
13 }
14
15 If (Ncompareditemindex > =   0 )
16 {
17 If (Plstcells [ncompareditemindex]. _ nrowspan > (Ocurrentcell. _ nstarty - Plstcells [ncompareditemindex]. _ nstarty ))
18 {
19 Ocurrentcell. _ nstartx + = Plstcells [ncompareditemindex]. _ ncolspan;
20
21 Bactedpush =   True ;
22
23 For ( Int K = I +   1 ; K < Plstcells. Count; k ++ )
24 {
25 If (Plstcells [K]. _ nstarty = Ocurrentcell. _ nstarty)
26 {
27 Plstcells [K]. _ nstartx + = Plstcells [ncompareditemindex]. _ ncolspan;
28 }
29 }
30 }
31 Else
32 {
33 Bactedpush =   False ;
34 }
35 }
36 Else
37 {
38 Bactedpush =   False ;
39 }
40 }
41   While (Bactedpush );

 

Taking four TD in the example table as an example, the coordinates of the four TD after the vertical derivation are as follows:

 

 

Some notes about the sample code:
1. In the sample code, I used a config file to control the file name of the generated Excel file, the content and position of the Report in it. The basic content is as follows:

< Exceldocument >
< Baseinfo >
< Filename > Sample Excel File </ Filename > <! -- Generate the Excel file name -->
< Sheetcount > 1 </ Sheetcount > <! -- Number of sheet in Excel -->
</ Baseinfo >
< Tables >
< Exceltable >
< Tablename > Example Table 1 </ Tablename > <! -- Report Name -->
< Whichsheet > 1 </ Whichsheet > <! -- Sheet Sn -->
< Startx > 2 </ Startx > <! -- X coordinate in the upper left corner -->
< Starty > 2 </ Starty > <! -- Y coordinate in the upper left corner -->
< Source > Sample_page_1.aspx </ Source > <! -- Table Page -->
</ Exceltable >
</ Tables >
</ Exceldocument >

2. During HTML parsing, I used the htmlagilitypack parsing tool. If you are interested, you can study it. The address is here: http://htmlagilitypack.codeplex.com /.
3. In the HTML parsing process of htmlagilitypack, the gap between HTML tags is also considered as a node, which is a text node whose content is a Null String. This should be noted.
4. The sample code is basically a complete function and has little coupling with other modules in the system. If you have similar requirements, you can use them directly.

 

Example code: excelgenerator.rar

 

Related Articles: WYSIWYG Excel report generation (1) -- Obtain the HTML table structure

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.