HtmlParser is a good choice when you need to parse HTML, extract data from it, or modify it.
HtmlParser can parse HTML from a URL, from a string, or from a local file:
// From a URL
Parser parser = new Parser(new Winista.Text.HtmlParser.Http.HttpProtocol(new Uri("uristring")));
// From a string
Parser parser = new Parser(new Winista.Text.HtmlParser.Lex.Lexer("htmlstring"));
// From a local file
System.IO.Stream stream = new System.IO.FileStream("filepath", System.IO.FileMode.Open);
Parser parser = new Parser(new Winista.Text.HtmlParser.Lex.Lexer(new Winista.Text.HtmlParser.Lex.Page(stream, "charset")));
You can also restrict the analysis to certain nodes by using NodeClassFilter to specify the node type:
NodeFilter filter = new NodeClassFilter(typeof(Winista.Text.HtmlParser.Tags.Div));
Use the Parse method of the Parser instance to obtain the node list, passing null to get every node or a filter to get only the matching ones:
NodeList nodeList = parser.Parse(null);
NodeList nodeList = parser.Parse(filter);
The following section of HTML is analyzed:
<div class="divcss" id="div_1">
<div name="div" class="divcss" id="div_2">div_2</div>
<table name="table" id="table_1">
<tr>
<td>htmlparser</td>
<td><div id="div_3"><font color="red">htmlparser</font></div></td>
</tr>
</table>
</div>
txtResult displays the results of the analysis, and txtSource is the text box from which the HTML is read.
// Record the start position of each node to avoid processing it twice
IList<int> start = new List<int>();
protected void Button1_Click(object sender, EventArgs e)
{
    this.txtResult.Text = string.Empty;
    // Forget positions recorded by a previous click
    start.Clear();
    Lexer lexer = new Lexer(this.txtSource.Text);
    Parser parser = new Parser(lexer);
    // The filter is created but not applied here; passing null returns every node
    NodeFilter filter = new NodeClassFilter(typeof(Winista.Text.HtmlParser.Tags.Div));
    NodeList nodeList = parser.Parse(null);
    if (nodeList.Count == 0)
        txtResult.Text = "No matching nodes";
    else
    {
        for (int i = 0; i < nodeList.Count; i++)
        {
            paserData(nodeList[i]);
        }
    }
}
// Return the node as a tag, or null if it is not a tag (for example, a text node)
private ITag getTag(INode node)
{
    if (node == null)
        return null;
    return node is ITag ? node as ITag : null;
}
private void paserData(INode node)
{
    ITag tag = getTag(node);
    // Skip non-tag nodes, end tags, and tags that were already reported
    if (tag != null && !tag.IsEndTag() && !start.Contains(tag.StartPosition))
    {
        object oId = tag.GetAttribute("ID");
        object oName = tag.GetAttribute("name");
        object oClass = tag.GetAttribute("class");
        this.txtResult.Text += tag.TagName + ":\r\nID: " + oId + " name: " + oName + " class: " + oClass + " startPosition: " + tag.StartPosition.ToString() + "\r\n";
        start.Add(tag.StartPosition);
    }
    // Child nodes
    if (node.Children != null && node.Children.Count > 0)
    {
        paserData(node.FirstChild);
    }
    // Sibling nodes
    INode siblingNode = node.NextSibling;
    while (siblingNode != null)
    {
        paserData(siblingNode);
        siblingNode = siblingNode.NextSibling;
    }
}
The data displayed by txtResult is:
DIV:
ID: div_1 name: class: divcss startPosition: 0
DIV:
ID: div_2 name: div class: divcss startPosition: 34
TABLE:
ID: table_1 name: table class: startPosition: 90
TR:
ID: name: class: startPosition: 127
TD:
ID: name: class: startPosition: 136
TD:
ID: name: class: startPosition: 160
DIV:
ID: div_3 name: class: startPosition: 164
FONT:
ID: name: class: startPosition: 180
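The traversal above recurses into the first child and then loops over the following siblings, and each sibling call loops over its own following siblings again, so later siblings are reached more than once. That is why the start list is needed. As a minimal sketch of an alternative, the walk below visits every node exactly once by iterating over each node's children directly, so no list of seen positions is required. SimpleNode and Walker are hypothetical stand-ins written for this illustration; they are not HtmlParser types, and the tree only mirrors the shape of the sample HTML.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed node; not part of HtmlParser.
class SimpleNode
{
    public string Name;
    public List<SimpleNode> Children = new List<SimpleNode>();

    public SimpleNode(string name, params SimpleNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

static class Walker
{
    // Depth-first, pre-order walk: record the node, then each of its children in order.
    public static void Visit(SimpleNode node, List<string> visited)
    {
        if (node == null) return;
        visited.Add(node.Name);
        foreach (SimpleNode child in node.Children)
        {
            Visit(child, visited);
        }
    }

    static void Main()
    {
        // Same nesting as the sample HTML: div_1 > (div_2, table_1 > tr > (td, td > div_3 > font))
        var tree = new SimpleNode("DIV#div_1",
            new SimpleNode("DIV#div_2"),
            new SimpleNode("TABLE#table_1",
                new SimpleNode("TR",
                    new SimpleNode("TD"),
                    new SimpleNode("TD",
                        new SimpleNode("DIV#div_3",
                            new SimpleNode("FONT"))))));

        var visited = new List<string>();
        Visit(tree, visited);
        // Prints: DIV#div_1, DIV#div_2, TABLE#table_1, TR, TD, TD, DIV#div_3, FONT
        Console.WriteLine(string.Join(", ", visited));
    }
}
```

The order matches the order of the tags in the output listed above. Applying the same idea to real HtmlParser nodes would mean iterating over node.Children instead of following NextSibling from every node; whether that fits depends on how the parser exposes its child list.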
Besides analyzing the data, HtmlParser can also modify it. For example, to give default values to tags that have no name or class attribute:
object oId = tag.GetAttribute("ID");
object oName = tag.GetAttribute("name");
object oClass = tag.GetAttribute("class");
// Add a default name attribute when the tag has none
if (oName == null)
{
    oName = "name";
    tag.SetAttribute("name", oName.ToString());
}
// Add a default class attribute when the tag has none
if (oClass == null)
{
    oClass = "class";
    tag.SetAttribute("class", oClass.ToString());
}
this.txtResult.Text += tag.TagName + ":\r\nID: " + oId + " name: " + oName
    + " class: " + oClass + " startPosition: " + tag.StartPosition.ToString() + "\r\n";
start.Add(tag.StartPosition);
The data displayed by txtResult is:
DIV:
ID: div_1 name: name class: divcss startPosition: 0
DIV:
ID: div_2 name: div class: divcss startPosition: 34
TABLE:
ID: table_1 name: table class: class startPosition: 90
TR:
ID: name: name class: class startPosition: 127
TD:
ID: name: name class: class startPosition: 136
TD:
ID: name: name class: class startPosition: 160
DIV:
ID: div_3 name: name class: class startPosition: 164
FONT:
ID: name: name class: class startPosition: 180
HtmlParser did what we wanted. Next, we add a child node to the DIV tag whose ID is div_3:
object oId = tag.GetAttribute("ID");
object oName = tag.GetAttribute("name");
object oClass = tag.GetAttribute("class");
// Append a text node to the DIV tag whose ID is div_3
if (tag.TagName == "DIV" && tag.GetAttribute("ID") == "div_3")
{
    INode newNode = new TextNode("Add a new node");
    tag.Children.Add(newNode);
}
this.txtResult.Text += tag.TagName + ":\r\nID: " + oId + " name: " + oName
    + " class: " + oClass + " startPosition: " + tag.StartPosition.ToString() + "\r\n";
The output of nodeList[0].ToHtml() is:
<div class="divcss" id="div_1">
<div name="div" class="divcss" id="div_2">div_2</div>
<table name="table" id="table_1">
<tr>
<td>htmlparser</td>
<td><div id="div_3"><font color="red">htmlparser</font>Add a new node</div></td>
</tr>
</table>
</div>
The new text now appears at the end of the DIV node whose ID is div_3, after its existing content.