Use HtmlParser to parse HTML (C#)

Last Update:2018-12-07 Source: Internet

Author: User

Tags lexer


HtmlParser is a good choice for parsing HTML and for extracting or modifying HTML data.

HtmlParser can parse HTML from the network, from a string, or from a local file:

// From a URL
Parser parser = new Parser(new Winista.Text.HtmlParser.Http.HttpProtocol(new Uri("uristring")));

// From an HTML string
Parser parser = new Parser(new Winista.Text.HtmlParser.Lex.Lexer("htmlstring"));

// From a local file
System.IO.Stream stream = new System.IO.FileStream("filepath", System.IO.FileMode.Open);
Parser parser = new Parser(new Winista.Text.HtmlParser.Lex.Lexer(new Winista.Text.HtmlParser.Lex.Page(stream, "charset")));

You can also restrict the analysis to certain nodes by using NodeClassFilter to specify the node type:

NodeFilter filter = new NodeClassFilter(typeof(Winista.Text.HtmlParser.Tags.Div));

Use the Parse method of the Parser instance to obtain the node list. Pass null to get every node, or pass the filter to get only the matching nodes:

NodeList nodeList = parser.Parse(null);
NodeList nodeList = parser.Parse(filter);

The following section of HTML is analyzed:

<tr>
<td>HtmlParser</td>
<td><div id="div_3"><font color="red">HtmlParser</font></div></td>
</tr>
</table>
</div>

txtResult is the text box that displays the result of the analysis, and txtSource is the text box from which the HTML data is read.

// Record the start position of each tag already processed, to avoid handling it twice
IList<int> start = new List<int>();

protected void Button1_Click(object sender, EventArgs e)
{
    this.txtResult.Text = string.Empty;
    Lexer lexer = new Lexer(this.txtSource.Text);
    Parser parser = new Parser(lexer);
    // Not used below: pass filter to Parse instead of null to get only Div nodes
    NodeFilter filter = new NodeClassFilter(typeof(Winista.Text.HtmlParser.Tags.Div));
    NodeList nodeList = parser.Parse(null);
    if (nodeList.Count == 0)
        txtResult.Text = "No matching nodes";
    else
    {
        for (int i = 0; i < nodeList.Count; i++)
        {
            paserData(nodeList[i]);
        }
    }
}

private ITag getTag(INode node)
{
    // "as" yields null both for a null node and for a node that is not a tag
    return node as ITag;
}

private void paserData(INode node)
{
    ITag tag = getTag(node);
    if (tag != null && !tag.IsEndTag() && !start.Contains(tag.StartPosition))
    {
        object oId = tag.GetAttribute("ID");
        object oName = tag.GetAttribute("name");
        object oClass = tag.GetAttribute("class");
        this.txtResult.Text += tag.TagName + ":\r\nID:" + oId + " Name:" + oName + " Class:" + oClass + " StartPosition:" + tag.StartPosition.ToString() + "\r\n";
        start.Add(tag.StartPosition);
    }
    // Child nodes
    if (node.Children != null && node.Children.Count > 0)
    {
        paserData(node.FirstChild);
    }
    // Sibling nodes
    INode siblingNode = node.NextSibling;
    while (siblingNode != null)
    {
        paserData(siblingNode);
        siblingNode = siblingNode.NextSibling;
    }
}

The data displayed by txtResult is:

DIV:
ID:div_1 Name: Class:divcss StartPosition:0
DIV:
ID:div_2 Name:div Class:divcss StartPosition:34
TABLE:
ID:table_1 Name:table Class: StartPosition:90
TR:
ID: Name: Class: StartPosition:127
TD:
ID: Name: Class: StartPosition:136
TD:
ID: Name: Class: StartPosition:160
DIV:
ID:div_3 Name: Class: StartPosition:164
FONT:
ID: Name: Class: StartPosition:180
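The traversal above needs the start list because every call walks all following siblings, and each of those siblings walks its own siblings again, so the same tag is reached more than once. A minimal sketch of a traversal that reaches each node exactly once, and so needs no bookkeeping, might look like the following. It uses a hypothetical stand-in Node class and a hypothetical TraversalSketch helper, not the Winista types, so it shows only the shape of the recursion.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed node; this is not a Winista type.
public sealed class Node
{
    public string Name { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

public static class TraversalSketch
{
    // Depth-first, document order. The node itself is handled once, and the
    // loop over Children is the only place where siblings are enumerated.
    public static void Visit(Node node, List<string> visited)
    {
        visited.Add(node.Name);
        foreach (Node child in node.Children)
        {
            Visit(child, visited);
        }
    }

    // Convenience wrapper that returns the names in visiting order.
    public static List<string> Collect(Node root)
    {
        var visited = new List<string>();
        Visit(root, visited);
        return visited;
    }
}
```

Called on a tree shaped like the sample HTML, Collect yields the tags in the same order as the listing above. The design choice is that siblings are enumerated only by their parent, never by each other.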

HtmlParser can also modify the data it analyzes. For example, to give a default value to tags that lack the name or class attribute:

object oId = tag.GetAttribute("ID");
object oName = tag.GetAttribute("name");
object oClass = tag.GetAttribute("class");
if (oName == null)
{
    oName = "name";
    tag.SetAttribute("name", oName.ToString());
}
if (oClass == null)
{
    oClass = "class";
    // Set the class attribute here, not name
    tag.SetAttribute("class", oClass.ToString());
}
this.txtResult.Text += tag.TagName + ":\r\nID:" + oId + " Name:" + oName
    + " Class:" + oClass + " StartPosition:" + tag.StartPosition.ToString() + "\r\n";
start.Add(tag.StartPosition);

The data displayed by txtResult is:

DIV:
ID:div_1 Name:name Class:divcss StartPosition:0
DIV:
ID:div_2 Name:div Class:divcss StartPosition:34
TABLE:
ID:table_1 Name:table Class:class StartPosition:90
TR:
ID: Name:name Class:class StartPosition:127
TD:
ID: Name:name Class:class StartPosition:136
TD:
ID: Name:name Class:class StartPosition:160
DIV:
ID:div_3 Name:name Class:class StartPosition:164
FONT:
ID: Name:name Class:class StartPosition:180
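The two if blocks above repeat one pattern (read the attribute, fall back to a default, write it back), and repeating it by hand makes it easy to pass the wrong attribute name in one of them. A minimal sketch of that pattern as a single helper follows. It works on a plain dictionary rather than on a Winista tag, and the names AttributeDefaults and Ensure are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeDefaults
{
    // Returns the current value of attrs[key]. If the key is absent, stores
    // fallback under that same key first, so read and write cannot disagree.
    public static string Ensure(IDictionary<string, string> attrs, string key, string fallback)
    {
        string value;
        if (!attrs.TryGetValue(key, out value))
        {
            value = fallback;
            attrs[key] = value;
        }
        return value;
    }
}
```

For a dictionary that holds only id and class, Ensure(attrs, "name", "name") adds the name entry, while Ensure(attrs, "class", "class") leaves the existing class value untouched. With a tag object, the same shape would wrap the GetAttribute and SetAttribute calls used in the article.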

HtmlParser achieves our goal. Next, we add a child node to the Div node whose ID is div_3:

object oId = tag.GetAttribute("ID");
object oName = tag.GetAttribute("name");
object oClass = tag.GetAttribute("class");
// Match the tag name case-insensitively and compare the ID by value
if (string.Equals(tag.TagName, "DIV", StringComparison.OrdinalIgnoreCase) && "div_3".Equals(oId))
{
    INode newNode = new TextNode("Add a new node");
    tag.Children.Add(newNode);
}
this.txtResult.Text += tag.TagName + ":\r\nID:" + oId + " Name:" + oName
    + " Class:" + oClass + " StartPosition:" + tag.StartPosition.ToString() + "\r\n";

The output of nodeList[0].ToHtml() is:

<tr>
<td>HtmlParser</td>
<td><div id="div_3"><font color="red">HtmlParser</font>Add a new node</div></td>
</tr>
</table>
</div>

The added text now appears inside the Div node whose ID is div_3, after its existing content.
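The ID check in this last step compares an attribute value with a string literal, while the article keeps attribute values in variables of type object. For two object operands, the C# == operator tests whether they are the same instance, not whether the text matches. Whether that affects the check depends on the declared return type of GetAttribute, which this sketch does not assume. The EqualityDemo class and its method names below are hypothetical and use no Winista types.

```csharp
using System;

public static class EqualityDemo
{
    // Reference comparison: true only if both operands are the same object.
    public static bool SameReference(object left, object right)
    {
        return left == right;
    }

    // Value comparison: true if left is a string with the same characters.
    // Calling Equals on the literal also makes a null left operand safe.
    public static bool SameText(object left, string expected)
    {
        return expected.Equals(left);
    }
}
```

For a string built at run time, such as new string(new[] { 'd', 'i', 'v', '_', '3' }), SameReference against the literal "div_3" returns false, while SameText returns true. Comparing by value is the safer choice when an attribute is held as object.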

This article is an English version of an article originally published in Chinese on aliyun.com.
