Capturing pages with HtmlAgilityPack is very convenient, but garbled characters appear when the page is encoded in gb2312. From what I found online, the default way of loading a page does not handle this encoding well. I'm not sure of the exact cause, but the default approach is simply:
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument htmlDocument = htmlWeb.Load(url);
The solution is as follows:
Create a new method that returns an HtmlDocument. The url parameter is the address of the page to capture.
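// Requires: using System; using System.IO; using System.Net; using System.Text; using HtmlAgilityPack;
private static HtmlDocument GetHtmlDocument(string url)
{
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(new Uri(url));
    httpWebRequest.Method = "GET";

    // Dispose the response and its stream once the document has been parsed.
    using (WebResponse webResponse = httpWebRequest.GetResponse())
    using (Stream stream = webResponse.GetResponseStream())
    {
        HtmlDocument htmlDocument = new HtmlDocument();
        // Decode the bytes explicitly as gb2312 instead of relying on a default encoding.
        htmlDocument.Load(stream, Encoding.GetEncoding("gb2312"));
        return htmlDocument;
    }
}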
Thanks to a comment from @, there is also a property on HtmlWeb that solves this directly O(∩_∩)O~:
HtmlWeb htmlWeb = new HtmlWeb();
htmlWeb.OverrideEncoding = Encoding.GetEncoding("gb2312"); // Encoding is in System.Text
HtmlDocument htmlDocument = htmlWeb.Load(url);
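One caveat if you are on .NET Core or .NET 5+: Encoding.GetEncoding("gb2312") throws an ArgumentException there unless the code-pages encoding provider is registered first. The small sketch below uses only System.Text (no HtmlAgilityPack) to show both the registration and why decoding gb2312 bytes with the wrong charset garbles the text. The class name and the sample string "中文" are just illustrations, and it assumes .NET 5 or later.

```csharp
using System;
using System.Text;

class Gb2312DecodingDemo
{
    static void Main()
    {
        // On .NET Core / .NET 5+, legacy code pages such as gb2312 can only be
        // resolved after registering this provider; without it GetEncoding throws.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("gb2312");

        // Simulate the raw bytes a gb2312-encoded page would send.
        string original = "中文";
        byte[] pageBytes = gb2312.GetBytes(original);

        // Decoding with the wrong charset (UTF-8 here) produces garbled text.
        string garbled = Encoding.UTF8.GetString(pageBytes);
        // Decoding with the page's actual charset restores the original text.
        string decoded = gb2312.GetString(pageBytes);

        Console.WriteLine(garbled == original); // False
        Console.WriteLine(decoded == original); // True
    }
}
```

On .NET Framework the gb2312 encoding is available without the RegisterProvider call, so that line can be dropped there (the CodePagesEncodingProvider type may need the System.Text.Encoding.CodePages package on that platform).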
That's all! The rest of the usage is the same as before. For details, see this blog post, which explains it very thoroughly: http://www.cnblogs.com/linfei721/archive/2013/05/08/3066697.html