.NET Parsing HTML Code: NSoup
NSoup is an open-source library: the .NET port of the Java library Jsoup.
1. Basic usage
NSoup.Nodes.Document htmlDoc = NSoup.NSoupClient.Parse(htmlString); // Parse is static, so no instantiation is required
The great thing about NSoup is that you can get elements with methods similar to the JavaScript DOM API (compare getElementsByTagName).
Get elements by tag name with GetElementsByTag, for example GetElementsByTag("p"):
var sb = new System.Text.StringBuilder();
NSoup.Select.Elements ele = htmlDoc.GetElementsByTag(tbTag.Text);
foreach (var item in ele)
{
    // Attr returns an attribute's value, e.g. Attr("href") for a link target
    if (item.Attr("class") == "col-sm-4 col-xs-6 listtit1")
    {
        sb.AppendLine(item.Text()); // Text() returns the element's text content
    }
}
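To try Parse, GetElementsByTag, Attr and Text outside a form, here is a minimal sketch that collects the href attribute and the text of every a element in an HTML string. It assumes the NSoup package is referenced and that Element lives in NSoup.Nodes alongside Document (mirroring Jsoup's package layout); the class name LinkDemo and the sample HTML are made up for illustration.

```csharp
using System.Collections.Generic;
using NSoup.Nodes;   // Document, and (assumed) Element
using NSoup.Select;  // Elements

public static class LinkDemo
{
    // Returns an (href, text) pair for every <a> element in the HTML string.
    public static List<KeyValuePair<string, string>> Links(string html)
    {
        // Parse is static: no NSoupClient instance is needed
        Document doc = NSoup.NSoupClient.Parse(html);
        Elements anchors = doc.GetElementsByTag("a");

        var links = new List<KeyValuePair<string, string>>();
        foreach (Element a in anchors)
        {
            // Attr reads an attribute value; Text returns the element's text content
            links.Add(new KeyValuePair<string, string>(a.Attr("href"), a.Text()));
        }
        return links;
    }
}
```

For the input <p><a href="/first">First</a></p> it should return a single pair with the key /first and the value First.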
2. A small WinForms demo
Key code:
The following example extracts the names and locations of Project 211 universities from a page's HTML:
private void GetHtml_Click(object sender, EventArgs e)
{
    // Download the HTML of the address in tbUrl (requires using System.Net;)
    tbCode.Text = "";
    string url = tbUrl.Text.Trim();
    using (WebClient client = new WebClient())
    {
        client.Encoding = System.Text.Encoding.UTF8;
        tbCode.Text = client.DownloadString(url);
    }
}

private void Analysis_Click(object sender, EventArgs e)
{
    NSoup.Nodes.Document htmlDoc = NSoup.NSoupClient.Parse(tbCode.Text);
    // tbTag holds the tag name of the headings, e.g. h4
    NSoup.Select.Elements ele = htmlDoc.GetElementsByTag(tbTag.Text);
    System.Text.StringBuilder sb = new System.Text.StringBuilder();
    foreach (var item in ele)
    {
        sb.AppendLine(item.Text()); // heading text (the location)
        // The names are children of the element after the heading; skip headings
        // without a following element to avoid a NullReferenceException
        if (item.NextElementSibling == null)
            continue;
        foreach (var item1 in item.NextElementSibling.Children)
        {
            if (item1.Attr("class") == "col-sm-4 col-xs-6 listtit1")
            {
                sb.AppendLine(item1.Text());
            }
        }
    }
    tbElement.Text = sb.ToString();
}
The page's HTML is not posted here for space reasons.
[Screenshot: result with the tag name set to h4]
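Because the page's HTML is not posted, the following sketch runs the same heading-plus-next-sibling logic against a small hypothetical layout: each h4 heading names a location and is followed by a container whose children with the class col-sm-4 col-xs-6 listtit1 are university names. The markup, the class UniversityListDemo and the grouping into a dictionary are assumptions for illustration, not the original page or the author's code.

```csharp
using System.Collections.Generic;
using NSoup.Nodes;

public static class UniversityListDemo
{
    // Hypothetical layout: each heading (e.g. <h4>) names a location and is followed by a
    // container element whose children carrying itemClass are university names.
    public static Dictionary<string, List<string>> Group(string html, string headingTag, string itemClass)
    {
        Document doc = NSoup.NSoupClient.Parse(html);
        var groups = new Dictionary<string, List<string>>();
        foreach (Element heading in doc.GetElementsByTag(headingTag))
        {
            var names = new List<string>();
            Element container = heading.NextElementSibling;
            // A heading with no following element would otherwise cause a NullReferenceException
            if (container != null)
            {
                foreach (Element child in container.Children)
                {
                    // Exact string comparison, as in Analysis_Click
                    if (child.Attr("class") == itemClass)
                    {
                        names.Add(child.Text());
                    }
                }
            }
            // A repeated heading text overwrites the earlier entry
            groups[heading.Text()] = names;
        }
        return groups;
    }
}
```

Note that comparing Attr("class") with a whole string only matches elements that have exactly those classes in exactly that order; an element with an extra class or a different order is skipped.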
3. Inserting into a database (extracting major codes and names):
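private void btnInsert_Click(object sender, EventArgs e)
{
    NSoup.Nodes.Document htmlDoc = NSoup.NSoupClient.Parse(tbCode.Text);
    // tbTag holds "td": each major's code and name sit in adjacent cells
    NSoup.Select.Elements eles = htmlDoc.GetElementsByTag(tbTag.Text);
    string connectionString = @"Here is the connection string";
    // Requires using System.Data.SqlClient;
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        foreach (var item in eles)
        {
            // Code cells have width="20%"; 4-character codes (the 0101 heading row) are skipped
            if (item.Attr("width") == "20%" && item.NextElementSibling != null && item.Text().Length > 4)
            {
                // Parameters instead of string concatenation, so quotes in a name cannot break the SQL
                string sql = "INSERT INTO Major (mid, mname) VALUES (@mid, @mname)";
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@mid", item.Text());
                    cmd.Parameters.AddWithValue("@mname", item.NextElementSibling.Text());
                    cmd.ExecuteNonQuery();
                }
            }
        }
    } // disposing the connection also closes it
}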
HTML code snippet:
<table class="cla" border="0" cellspacing="1" cellpadding="0" width="100%" bgcolor="#dedede">
  <tbody>
    <tr class="scon">
      <td bgcolor="#ffffff" width="20%"><strong>0101</strong></td>
      <td bgcolor="#ffffff" height=""><strong>Philosophy</strong></td>
    </tr>
    <tr>
      <td bgcolor="#ffffff" width="20%">010101</td>
      <td bgcolor="#ffffff" height=""><span class="pl20">Philosophy</span></td>
    </tr>
    <tr>
      <td bgcolor="#ffffff" width="20%">010102</td>
      <td bgcolor="#ffffff" height=""><span class="pl20">Logic</span></td>
    </tr>
    <tr>
      <td bgcolor="#ffffff" width="20%">010103K</td>
      <td bgcolor="#ffffff" height=""><span class="pl20">Religion</span></td>
    </tr>
  </tbody>
</table>
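To check the filter in btnInsert_Click without a database, the sketch below applies essentially the same conditions to the table snippet and returns the (code, name) pairs instead of inserting them. MajorTableDemo and ExtractMajors are hypothetical names, and the added Trim calls are a small defensive change; the NSoup calls are the ones the article already uses.

```csharp
using System.Collections.Generic;
using NSoup.Nodes;

public static class MajorTableDemo
{
    // Same filter as btnInsert_Click, but the (code, name) pairs are returned
    // instead of being inserted, so the logic can be checked without SQL Server.
    public static List<KeyValuePair<string, string>> ExtractMajors(string html)
    {
        Document doc = NSoup.NSoupClient.Parse(html);
        var majors = new List<KeyValuePair<string, string>>();
        foreach (Element td in doc.GetElementsByTag("td"))
        {
            Element nameCell = td.NextElementSibling;
            string code = td.Text().Trim();
            // Code cells have width="20%"; 4-character codes such as 0101 (the heading row) are skipped
            if (td.Attr("width") == "20%" && nameCell != null && code.Length > 4)
            {
                majors.Add(new KeyValuePair<string, string>(code, nameCell.Text().Trim()));
            }
        }
        return majors;
    }
}
```

For the snippet above it should yield 010101/Philosophy, 010102/Logic and 010103K/Religion; the name cells fail the width test, and the 0101 row is dropped because its code has only four characters.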
NSoup offers many more methods than the ones shown here; try a few of them yourself. Of course, if you are familiar with Python you can skip all this: Python has more mature HTML parsers, which are worth a look when you have time.