HtmlAgilityPack: an HTML parsing and manipulation class library
1. Read an HTML page from the network, take the HTML inside its body element, rewrite the src attribute of every img element into a full path, and return the result as a string.
using HtmlAgilityPack; // at the top of the file

if (l_sWenBenHtmlFtpPath.Substring(l_sWenBenHtmlFtpPath.LastIndexOf(".") + 1) == "html")
{
    HtmlWeb htmlWeb = new HtmlWeb();
    HtmlDocument htmlDoc = htmlWeb.Load(l_sWenBenHtmlFtpPath);
    HtmlNode htmlNode = htmlDoc.DocumentNode;
    HtmlNodeCollection nodes = htmlNode.SelectNodes("//body"); // query using XPath syntax; null if nothing matches
    if (nodes != null)
    {
        foreach (HtmlNode bodyTag in nodes)
        {
            HtmlNodeCollection nodes2 = bodyTag.SelectNodes(".//img"); // img elements inside this body
            if (nodes2 != null)
            {
                foreach (HtmlNode imgTag in nodes2)
                {
                    HtmlAttribute srcAttr = imgTag.Attributes["src"];
                    if (srcAttr == null) continue; // skip img elements that have no src
                    string imgHttpPath = srcAttr.Value;
                    // prepend the page's directory (everything up to the last "/") to the relative src
                    srcAttr.Value = l_sWenBenHtmlFtpPath.Substring(0, l_sWenBenHtmlFtpPath.LastIndexOf("/") + 1) + imgHttpPath;
                }
            }
            // take the body HTML even when the page contains no images
            l_sWenBenHtml = bodyTag.InnerHtml;
        }
    }
}
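Prefixing the page's directory only works when every src is a plain relative path such as images/a.png; an absolute URL (http://...) or a root-relative path (/img/a.png) would be mangled. A minimal sketch of an alternative, assuming the page path is a well-formed absolute URL, is to let System.Uri resolve each src against the page address. The class and method names (ImgSrcResolver, Resolve) are hypothetical.

```csharp
using System;

// Hypothetical helper: resolves an img src value against the URL of the page it came from.
public static class ImgSrcResolver
{
    public static string Resolve(string pageUrl, string src)
    {
        Uri baseUri = new Uri(pageUrl, UriKind.Absolute);
        // The Uri(Uri, string) constructor applies standard reference resolution:
        // "a.png" -> same directory as the page, "/a.png" -> site root,
        // "../a.png" -> parent directory, an absolute URL -> returned unchanged.
        return new Uri(baseUri, src).AbsoluteUri;
    }
}
```

In the img loop above, the assignment could then become imgTag.Attributes["src"].Value = ImgSrcResolver.Resolve(l_sWenBenHtmlFtpPath, imgHttpPath);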
2. Use the HtmlAgilityPack library to load an HTML string into an HtmlDocument object, then operate on its DOM.
using System.Web; // HttpUtility

// 1. Decode the HTML string
sDecodeString = HttpUtility.HtmlDecode(HttpUtility.UrlDecode(sEncodeString));
// 2. Concatenate the complete HTML string
sDecodeString = @"<!DOCTYPE html>" + sDecodeString + @"</div></body>";
// 5. Save the string to an html file
// Do something
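The loading and DOM steps are not spelled out above. A minimal sketch of what they could look like, assuming HtmlAgilityPack is referenced from NuGet: the string is parsed with HtmlDocument.LoadHtml, edited through XPath, then written to a file. Removing script elements is only a hypothetical stand-in for "operate on the DOM", and the class, method and path names are illustrative.

```csharp
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper: parse an HTML string, edit the DOM, save the result to a file.
public static class HtmlStringEditor
{
    public static string LoadEditAndSave(string html, string outputPath)
    {
        // Load the string into an HtmlDocument object
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Example DOM operation: remove every <script> element.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts != null)
        {
            foreach (HtmlNode script in scripts)
            {
                script.Remove(); // detach the node from its parent
            }
        }

        // Serialize the edited document and save it as an html file
        string result = doc.DocumentNode.OuterHtml;
        File.WriteAllText(outputPath, result);
        return result;
    }
}
```

SelectNodes returns a snapshot collection, so removing nodes from the document while looping over it is safe.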
This post is still being improved and updated; more to come.