Use HtmlParser to parse HTML (C#)

Source: Internet
Author: User
Tags: lexer

HtmlParser is a good choice for parsing HTML, extracting data from it, or modifying it.

HtmlParser can parse HTML from a network address, from a string, or from a local file:

Parser parser = new Parser(new Winista.Text.HtmlParser.Http.HttpProtocol(new Uri("uristring")));

Parser parser = new Parser(new Winista.Text.HtmlParser.Lex.Lexer("htmlstring"));

System.IO.Stream stream = new System.IO.FileStream("filepath", System.IO.FileMode.Open);
Parser parser = new Parser(new Winista.Text.HtmlParser.Lex.Lexer(new Winista.Text.HtmlParser.Lex.Page(stream, "charset")));

You can also restrict the analysis to certain nodes by using NodeClassFilter to specify the node type:

NodeFilter filter = new NodeClassFilter(typeof(Winista.Text.HtmlParser.Tags.Div));

Use the Parse method of the Parser instance to obtain the node list. Passing null applies no filter; passing a filter keeps only the matching nodes:

NodeList nodeList = parser.Parse(null);

NodeList nodeList = parser.Parse(filter);

The following HTML is analyzed:

<div class="divcss" id="div_1">
 <div name="div" class="divcss" id="div_2">div_2</div>
 <table name="table" id="table_1">
  <tr>
   <td>htmlparser</td>
   <td><div id="div_3"><font color="red">htmlparser</font></div></td>
  </tr>
 </table>
</div>
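To make the positions in this sample concrete, here is a minimal sketch that scans the markup for the character offset of every opening tag. It does not use HtmlParser at all. The class name TagOffsets, the method FindOpenTagOffsets, and the exact whitespace (CRLF line endings, one tab per nesting level) are assumptions made for illustration. Under those assumptions the offsets agree with the startposition values reported in the output later in this article.

```csharp
using System;
using System.Collections.Generic;

public static class TagOffsets
{
    // Returns the index of every '<' that begins an opening tag (skips "</...").
    public static List<int> FindOpenTagOffsets(string html)
    {
        var offsets = new List<int>();
        for (int i = 0; i + 1 < html.Length; i++)
        {
            if (html[i] == '<' && html[i + 1] != '/')
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }

    // Assumed layout: CRLF line endings and one tab per nesting level.
    public const string Sample =
        "<div class=\"divcss\" id=\"div_1\">\r\n" +
        "\t<div name=\"div\" class=\"divcss\" id=\"div_2\">div_2</div>\r\n" +
        "\t<table name=\"table\" id=\"table_1\">\r\n" +
        "\t\t<tr>\r\n" +
        "\t\t\t<td>htmlparser</td>\r\n" +
        "\t\t\t<td><div id=\"div_3\"><font color=\"red\">htmlparser</font></div></td>\r\n" +
        "\t\t</tr>\r\n" +
        "\t</table>\r\n" +
        "</div>";

    public static void Main()
    {
        // Prints: 0, 34, 90, 127, 136, 160, 164, 180
        Console.WriteLine(string.Join(", ", FindOpenTagOffsets(Sample)));
    }
}
```

Because the offsets count every character, including line breaks and indentation, reformatting the input changes the reported positions.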

txtResult displays the results of the analysis, and txtSource is the text box that holds the HTML to read.

// Record the start position of each node to avoid repeated processing
IList<int> start = new List<int>();

protected void Button_Click(object sender, EventArgs e)
{
    this.txtResult.Text = string.Empty;
    Lexer lexer = new Lexer(this.txtSource.Text);
    Parser parser = new Parser(lexer);
    // Not used in this run; pass it to Parse instead of null to keep only Div tags
    NodeFilter filter = new NodeClassFilter(typeof(Winista.Text.HtmlParser.Tags.Div));
    NodeList nodeList = parser.Parse(null);
    if (nodeList.Count == 0)
        txtResult.Text = "No matching nodes";
    else
    {
        for (int i = 0; i < nodeList.Count; i++)
        {
            ParseData(nodeList[i]);
        }
    }
}

private ITag GetTag(INode node)
{
    // "as" yields null when node is null or is not a tag
    return node as ITag;
}

private void ParseData(INode node)
{
    ITag tag = GetTag(node);
    // Report each start tag once; end tags and already-seen positions are skipped
    if (tag != null && !tag.IsEndTag() && !start.Contains(tag.StartPosition))
    {
        object oId = tag.GetAttribute("ID");
        object oName = tag.GetAttribute("name");
        object oClass = tag.GetAttribute("class");
        this.txtResult.Text += tag.TagName + ":\r\nID:" + oId + " name:" + oName + " class:" + oClass + " startposition:" + tag.StartPosition.ToString() + "\r\n";
        start.Add(tag.StartPosition);
    }
    // Child nodes
    if (node.Children != null && node.Children.Count > 0)
    {
        ParseData(node.FirstChild);
    }
    // Sibling nodes
    INode siblingNode = node.NextSibling;
    while (siblingNode != null)
    {
        ParseData(siblingNode);
        siblingNode = siblingNode.NextSibling;
    }
}

The data displayed by txtResult is:

DIV:
ID:div_1 name: class:divcss startposition:0
DIV:
ID:div_2 name:div class:divcss startposition:34
TABLE:
ID:table_1 name:table class: startposition:90
TR:
ID: name: class: startposition:127
TD:
ID: name: class: startposition:136
TD:
ID: name: class: startposition:160
DIV:
ID:div_3 name: class: startposition:164
FONT:
ID: name: class: startposition:180
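The listing above reaches the same node by more than one path: a node is visited through the recursion into its parent's first child and again from the sibling loop of every earlier sibling, which is why it has to remember start positions. As a hedged alternative, the sketch below recurses into children only, so each node is visited exactly once and no bookkeeping list is needed. SimpleNode and TreeWalk are hypothetical stand-ins written for this example, not Winista types, and the tree is built by hand to mirror the tags in the sample HTML.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed tag node; not a Winista type.
public class SimpleNode
{
    public string Name;
    public List<SimpleNode> Children = new List<SimpleNode>();

    public SimpleNode(string name, params SimpleNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

public static class TreeWalk
{
    // Records the node, then recurses into each child in document order.
    // Siblings are handled by the parent's loop, so no node is seen twice.
    public static void Visit(SimpleNode node, List<string> visited)
    {
        if (node == null) return;
        visited.Add(node.Name);
        foreach (SimpleNode child in node.Children)
        {
            Visit(child, visited);
        }
    }

    // Hand-built tree with the same tag nesting as the sample HTML.
    public static SimpleNode BuildSample()
    {
        return new SimpleNode("DIV#div_1",
            new SimpleNode("DIV#div_2"),
            new SimpleNode("TABLE",
                new SimpleNode("TR",
                    new SimpleNode("TD"),
                    new SimpleNode("TD",
                        new SimpleNode("DIV#div_3",
                            new SimpleNode("FONT"))))));
    }

    public static void Main()
    {
        var visited = new List<string>();
        Visit(BuildSample(), visited);
        // Prints: DIV#div_1 > DIV#div_2 > TABLE > TR > TD > TD > DIV#div_3 > FONT
        Console.WriteLine(string.Join(" > ", visited));
    }
}
```

The visiting order matches the order of the entries in the output above.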

HtmlParser can also modify the data it analyzes. For example, to give default values to tags that lack the name or class attribute:

object oId = tag.GetAttribute("ID");
object oName = tag.GetAttribute("name");
object oClass = tag.GetAttribute("class");
if (oName == null)
{
    // Fill in the missing name attribute
    oName = "name";
    tag.SetAttribute("name", oName.ToString());
}
if (oClass == null)
{
    // Fill in the missing class attribute
    oClass = "class";
    tag.SetAttribute("class", oClass.ToString());
}
this.txtResult.Text += tag.TagName + ":\r\nID:" + oId + " name:" + oName
    + " class:" + oClass + " startposition:" + tag.StartPosition.ToString() + "\r\n";
start.Add(tag.StartPosition);

The data displayed by txtResult is:

DIV:
ID:div_1 name:name class:divcss startposition:0
DIV:
ID:div_2 name:div class:divcss startposition:34
TABLE:
ID:table_1 name:table class:class startposition:90
TR:
ID: name:name class:class startposition:127
TD:
ID: name:name class:class startposition:136
TD:
ID: name:name class:class startposition:160
DIV:
ID:div_3 name:name class:class startposition:164
FONT:
ID: name:name class:class startposition:180
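The fill-in-if-missing rule used above can be sketched without the parser. The example below is an illustration only: it models a tag's attributes as a plain dictionary, and AttributeDefaults and Apply are hypothetical names, not part of HtmlParser. It shows that a default is written under its own key and that existing values are left alone.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeDefaults
{
    // Adds each default only when the attribute is absent; existing values are kept.
    public static void Apply(IDictionary<string, string> attributes,
                             IDictionary<string, string> defaults)
    {
        foreach (KeyValuePair<string, string> entry in defaults)
        {
            if (!attributes.ContainsKey(entry.Key))
            {
                attributes[entry.Key] = entry.Value;
            }
        }
    }

    public static void Main()
    {
        // Attributes of the table tag in the sample: name and id, but no class.
        var table = new Dictionary<string, string>
        {
            { "name", "table" },
            { "id", "table_1" }
        };
        var defaults = new Dictionary<string, string>
        {
            { "name", "name" },
            { "class", "class" }
        };

        Apply(table, defaults);
        // Prints: name=table class=class
        Console.WriteLine("name=" + table["name"] + " class=" + table["class"]);
    }
}
```

This corresponds to the TABLE entry in the output above, where name keeps its original value and class receives the default.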

HtmlParser does what we need. Now add a child node to the Div node whose ID is div_3:

object oId = tag.GetAttribute("ID");
object oName = tag.GetAttribute("name");
object oClass = tag.GetAttribute("class");
if (tag.TagName == "DIV" && tag.GetAttribute("ID") == "div_3")
{
    // Append a text node after the tag's existing children
    INode newNode = new TextNode("Add a new node");
    tag.Children.Add(newNode);
}
this.txtResult.Text += tag.TagName + ":\r\nID:" + oId + " name:" + oName
    + " class:" + oClass + " startposition:" + tag.StartPosition.ToString() + "\r\n";

The output of nodeList[0].ToHtml() is:

<div class="divcss" id="div_1">
 <div name="div" class="divcss" id="div_2">div_2</div>
 <table name="table" id="table_1">
  <tr>
   <td>htmlparser</td>
   <td><div id="div_3"><font color="red">htmlparser</font>Add a new node</div></td>
  </tr>
 </table>
</div>

The added text now follows the existing content of the Div node whose ID is div_3.
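Why the new text lands after the font element can be seen in a small model of a node tree. The sketch below is an assumption-laden illustration: MiniNode, MiniText, MiniElement, and AppendDemo are hypothetical classes written for this example (not the Winista types), and attributes are left out to keep it short. Serialization writes the children in list order, so a node appended to the end of the child list is emitted last, just before the closing tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical mini node model; not the Winista classes.
public abstract class MiniNode
{
    public abstract void WriteHtml(StringBuilder sb);
}

public class MiniText : MiniNode
{
    private readonly string text;
    public MiniText(string text) { this.text = text; }
    public override void WriteHtml(StringBuilder sb) { sb.Append(text); }
}

public class MiniElement : MiniNode
{
    public string Name;
    public List<MiniNode> Children = new List<MiniNode>();
    public MiniElement(string name) { Name = name; }

    // Writes the start tag, then every child in list order, then the end tag.
    public override void WriteHtml(StringBuilder sb)
    {
        sb.Append('<').Append(Name).Append('>');
        foreach (MiniNode child in Children) child.WriteHtml(sb);
        sb.Append("</").Append(Name).Append('>');
    }

    public string ToHtml()
    {
        var sb = new StringBuilder();
        WriteHtml(sb);
        return sb.ToString();
    }
}

public static class AppendDemo
{
    public static string Run()
    {
        var div = new MiniElement("div");
        var font = new MiniElement("font");
        font.Children.Add(new MiniText("htmlparser"));
        div.Children.Add(font);
        div.Children.Add(new MiniText("Add a new node")); // appended last
        return div.ToHtml();
    }

    public static void Main()
    {
        // Prints: <div><font>htmlparser</font>Add a new node</div>
        Console.WriteLine(Run());
    }
}
```

To place the new text before the font element in this model, insert it at index 0 of the child list instead of appending it.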
