Parsing HTML in .NET with NSoup
NSoup is an open-source library, a .NET port of jsoup (Java).
1. Use it directly
NSoup.Nodes.Document htmlDoc = NSoup.NSoupClient.Parse(HTMLString); // Parse is static, so no instance needs to be created
NSoup's strength is that it offers JavaScript DOM-style methods for locating elements, for example:
GetElementsByTag("p")
// sb is a StringBuilder declared earlier
NSoup.Select.Elements ele = htmlDoc.GetElementsByTag(TbTag.Text);
foreach (var item in ele)
{
    // Attr("class"), Attr("href"), etc. return the value of an element's attribute
    if (item.Attr("class") == "col-sm-4 col-xs-6 listtit1")
    {
        // Text() returns the text content of the element
        sb.AppendLine(item.Text());
    }
}
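The class check above compares the whole attribute string, so it stops matching if the page reorders its classes. Since NSoup is a port of jsoup, a CSS selector is one possible alternative. The following is a minimal sketch, assuming the NSoup version in use exposes `Select` the way jsoup does; the markup is made up for illustration.

```csharp
using System;
using NSoup;
using NSoup.Nodes;

class SelectorSketch
{
    static void Main()
    {
        // Hypothetical markup, invented for this sketch (not from the real page).
        string html =
            "<div>" +
            "<p class=\"col-sm-4 col-xs-6 listtit1\">first</p>" +
            "<p class=\"other\">second</p>" +
            "</div>";

        Document doc = NSoupClient.Parse(html);

        // "p.listtit1" matches <p> elements whose class list contains listtit1,
        // regardless of what other classes are present or in which order.
        // Assumption: Select behaves as in jsoup.
        foreach (Element p in doc.Select("p.listtit1"))
        {
            Console.WriteLine(p.Text());
        }
    }
}
```

Matching on a single class is looser than the exact string comparison, so which one fits depends on how stable the page's markup is.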
2. A small WinForms demo
Key code:
The following example extracts the names and locations of Project 211 universities from an HTML page:
// Requires: using System.Net; using System.Text;
private void GetHtml_Click(object sender, EventArgs e)
{
    // Download the HTML of the specified address
    TbCode.Text = "";
    string url = TbUrl.Text.Trim();
    using (WebClient client = new WebClient())
    {
        client.Encoding = System.Text.Encoding.UTF8;
        string html = client.DownloadString(url);
        TbCode.Text = html;
    }
}

private void Analysis_Click(object sender, EventArgs e)
{
    NSoup.Nodes.Document htmlDoc = NSoup.NSoupClient.Parse(TbCode.Text);
    NSoup.Select.Elements ele = htmlDoc.GetElementsByTag(TbTag.Text);
    StringBuilder sb = new StringBuilder();
    foreach (var item in ele)
    {
        // The matched element itself (for h4, the location)
        sb.AppendLine(item.Text());
        // Skip elements that have no following sibling
        if (item.NextElementSibling == null) continue;
        // The university names are children of the next sibling element
        foreach (var item1 in item.NextElementSibling.Children)
        {
            if (item1.Attr("class") == "col-sm-4 col-xs-6 listtit1")
            {
                sb.AppendLine(item1.Text());
            }
        }
    }
    TbElement.Text = sb.ToString();
}
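Analysis_Click relies on a particular page layout: each matched heading is followed by a sibling element whose children hold the names. The console sketch below illustrates that traversal on a tiny document. The markup and the placeholder names are hypothetical, since the real page is not shown in full, and `HasClass` is assumed to be available in NSoup as it is in jsoup.

```csharp
using System;
using NSoup;
using NSoup.Nodes;

class SiblingSketch
{
    static void Main()
    {
        // Hypothetical markup: structure and names are placeholders,
        // not taken from the real page.
        string html =
            "<h4>Province A</h4>" +
            "<div>" +
            "<div class=\"col-sm-4 col-xs-6 listtit1\">University X</div>" +
            "<div class=\"col-sm-4 col-xs-6\">not a name cell</div>" +
            "</div>";

        Document doc = NSoupClient.Parse(html);
        foreach (Element heading in doc.GetElementsByTag("h4"))
        {
            Console.WriteLine(heading.Text());

            // The element that immediately follows the heading, if any
            Element container = heading.NextElementSibling;
            if (container == null) continue;

            foreach (Element child in container.Children)
            {
                // Assumption: HasClass tests a single class name, as in jsoup
                if (child.HasClass("listtit1"))
                {
                    Console.WriteLine("  " + child.Text());
                }
            }
        }
    }
}
```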
HTML snippet (truncated for space):
<h4>Anhui
Result (tag type h4):
3. Sample code for inserting into a database (extracting the major names and codes):
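// Requires: using System.Data.SqlClient;
private void BtnInsert_Click(object sender, EventArgs e)
{
    NSoup.Nodes.Document htmlDoc = NSoup.NSoupClient.Parse(TbCode.Text);
    NSoup.Select.Elements eles = htmlDoc.GetElementsByTag(TbTag.Text);
    string connectionString = @"here is the connection string";
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        foreach (var item in eles)
        {
            // Code cells have width="20%" and the name is in the next cell.
            // Length > 4 skips the four-digit category rows such as 0101.
            if (item.Attr("width") == "20%" && item.NextElementSibling != null && item.Text().Length > 4)
            {
                // Parameters instead of string concatenation: avoids SQL injection
                // and breakage when the scraped text contains a quote
                string sql = "insert into Major (mId, mName) values (@mId, @mName)";
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@mId", item.Text());
                    cmd.Parameters.AddWithValue("@mName", item.NextElementSibling.Text());
                    cmd.ExecuteNonQuery();
                }
            }
        }
    } // the using block closes the connection
}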
HTML snippet:
<table class="linoleic" border="0" cellspacing="1" cellpadding="0" width="100%" bgcolor="#dedede">
<tbody>
<tr class="scon">
<td bgcolor="#ffffff" width="20%"><strong>0101</strong></td>
<td bgcolor="#ffffff" height="30"><strong>philosophical class</strong></td>
</tr>
<tr>
<td bgcolor="#ffffff" width="20%">010101</td>
<td bgcolor="#ffffff" height="30"><span class="pl20">philosophy</span></td>
</tr>
<tr>
<td bgcolor="#ffffff" width="20%">010102</td>
<td bgcolor="#ffffff" height="30"><span class="pl20">logic</span></td>
</tr>
<tr>
<td bgcolor="#ffffff" width="20%">010103K</td>
<td bgcolor="#ffffff" height="30"><span class="pl20">tutorial</span></td>
</tr>
</tbody>
</table>
NSoup has many more methods than I can list here, so try a few of them yourself. Of course, if you are familiar with Python, you can skip all this: Python has more mature HTML parsers, which are worth a look when you have time.