This article shows how to fetch (crawl) the content of a web page in ASP.NET. It is shared here for your reference. Two implementations follow:
First, fetching web content in ASP.NET with HttpWebRequest
// Requires: using System.IO; using System.Net; using System.Text;

/// <summary>
/// Method one (recommended).
/// Uses HttpWebRequest to get the page source.
/// Very effective for pages that carry a BOM: StreamReader detects the
/// encoding from the BOM, so it is identified correctly whatever it is.
/// </summary>
/// <param name="url">Web address</param>
/// <returns>The page source</returns>
public static string GetHtmlSource2(string url)
{
    // Holds the downloaded content
    string html = "";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Accept = "*/*"; // accept any content type
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)";
    request.AllowAutoRedirect = true; // follow 302 redirects
    request.CookieContainer = new CookieContainer(); // cookie container
    request.Referer = url; // use the requested page as the referrer
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    // Encoding.Default is only used when the page has no BOM
    using (StreamReader reader = new StreamReader(stream, Encoding.Default))
    {
        html = reader.ReadToEnd();
    }
    return html;
}
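The method above identifies the encoding reliably only when the page starts with a BOM; otherwise it falls back to Encoding.Default. A minimal sketch of one possible refinement is shown below: take the charset the server declares (HttpWebResponse.CharacterSet exposes it) and use it as the fallback instead. The class CharsetSketch and the helpers ResolveEncoding and ReadAll are hypothetical names introduced for illustration, not part of the original article, and the choice of UTF-8 as the last resort is an assumption.

```csharp
using System;
using System.IO;
using System.Text;

public static class CharsetSketch
{
    // Hypothetical helper: map a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding. Returns the fallback
    // when the name is missing or not recognised.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for unknown names
            return fallback;
        }
    }

    // Hypothetical helper: decode a response body. A BOM, if present, takes
    // precedence; otherwise the declared charset is used (assumed UTF-8 if
    // the charset is missing or unknown).
    public static string ReadAll(Stream body, string charset)
    {
        Encoding encoding = ResolveEncoding(charset, Encoding.UTF8);
        using (var reader = new StreamReader(body, encoding, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Under these assumptions, GetHtmlSource2 could call CharsetSketch.ReadAll(response.GetResponseStream(), response.CharacterSet) in place of constructing the StreamReader with Encoding.Default. On .NET Core and later, legacy code pages such as gb2312 are only available after registering a code page provider, so unknown names fall through to the fallback there.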
Second, fetching web content in ASP.NET with WebResponse
// Requires: using System; using System.IO; using System.Net; using System.Text;

public static string GetHttpData2(string url)
{
    string sException = null; // records the error message, if any
    string sRslt = null;      // stays null when the request fails
    WebResponse oWebRps = null;
    WebRequest oWebRqst = WebRequest.Create(url);
    oWebRqst.Timeout = 50000; // timeout in milliseconds
    try
    {
        oWebRps = oWebRqst.GetResponse();
    }
    catch (WebException e)
    {
        sException = e.Message;
    }
    catch (Exception e)
    {
        sException = e.ToString();
    }
    finally
    {
        if (oWebRps != null)
        {
            // Decode the response body as UTF-8
            StreamReader oStreamRd = new StreamReader(oWebRps.GetResponseStream(), Encoding.GetEncoding("utf-8"));
            sRslt = oStreamRd.ReadToEnd();
            oStreamRd.Close();
            oWebRps.Close();
        }
    }
    return sRslt;
}
I hope this article helps you with your C# programming.