Three ways to capture a web page's source code in ASP.NET
// Namespaces needed by the three methods below (place the methods inside a class).
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// Method 1 (recommended): use HttpWebRequest to obtain the web page source code.
/// Very effective for pages with a BOM: StreamReader detects the encoding from the BOM,
/// whatever it is. Pages without a BOM are decoded with Encoding.Default.
/// </summary>
/// <param name="url">Web page address</param>
/// <returns>The web page source</returns>
public static string GetHtmlSource2(string url)
{
    string html = "";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Accept = "*/*"; // accept any content type
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)"; // simulate browsing http://www.52mvc.com with IE
    request.AllowAutoRedirect = true; // whether to follow 302 redirects
    // request.CookieContainer = new CookieContainer(); // cookie container
    request.Referer = url; // referrer of the current page
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream, Encoding.Default);
    html = reader.ReadToEnd();
    reader.Close();   // also closes the underlying stream
    response.Close();
    return html;
}

// Method 2:
public static string GetHttpData2(string Url)
{
    string sException = null;
    string sRslt = null;
    WebResponse oWebRps = null;
    WebRequest oWebRqst = WebRequest.Create(Url);
    oWebRqst.Timeout = 50000; // milliseconds
    try
    {
        oWebRps = oWebRqst.GetResponse();
    }
    catch (WebException e)
    {
        sException = e.Message.ToString();
    }
    catch (Exception e)
    {
        sException = e.ToString();
    }
    finally
    {
        if (oWebRps != null)
        {
            StreamReader oStreamRd = new StreamReader(oWebRps.GetResponseStream(), Encoding.GetEncoding("UTF-8"));
            sRslt = oStreamRd.ReadToEnd();
            oStreamRd.Close();
            oWebRps.Close();
        }
    }
    return sRslt; // null if the request failed
}

/// <summary>
/// Method 3:
/// </summary>
/// <param name="url">Address of the website to access</param>
/// <param name="charSets">Encoding of the target page; if null or "" is passed, the page's encoding is detected automatically</param>
/// <returns>The page source, or "" on any error</returns>
public static string getHtml(string url, params string[] charSets)
{
    try
    {
        string charSet = null;
        if (charSets.Length == 1)
        {
            charSet = charSets[0];
        }
        WebClient myWebClient = new WebClient();
        // Note when creating a WebClient instance:
        // some pages may fail to download for reasons such as cookies or encoding problems.
        // Handle these case by case, for example by adding a cookie to the headers:
        // myWebClient.Headers.Add("Cookie", cookie);
        // This may call for additional overloads; write them as needed.

        // Network credentials used to authenticate requests to Internet resources.
        myWebClient.Credentials = CredentialCache.DefaultCredentials;
        // If the server requires a user name and password:
        // NetworkCredential mycred = new NetworkCredential(struser, strpassword);
        // myWebClient.Credentials = mycred;

        // Download the resource as a byte array.
        byte[] myDataBuffer = myWebClient.DownloadData(url);
        string strWebData = Encoding.Default.GetString(myDataBuffer);

        // Find the character encoding declared in the page's <meta> tag.
        // Group 2 stops at the first character that cannot belong to an encoding name,
        // so both charset=utf-8" and charset="utf-8" are handled.
        Match charSetMatch = Regex.Match(strWebData, "<meta([^<]*)charset\\s*=\\s*[\"']?([\\w-]+)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
        string webCharSet = charSetMatch.Groups[2].Value;
        if (charSet == null || charSet == "")
            charSet = webCharSet;
        if (charSet != null && charSet != "" && Encoding.GetEncoding(charSet) != Encoding.Default)
        {
            strWebData = Encoding.GetEncoding(charSet).GetString(myDataBuffer);
        }
        else
        {
            strWebData = Encoding.GetEncoding("UTF-8").GetString(myDataBuffer);
        }
        return strWebData;
    }
    catch (Exception)
    {
        return "";
    }
}
Getting a website's HTML source code in ASP.NET
In the Click handler of button1: label1.Text = getPage(textbox1.Text);
static public string getPage(string url)
{
    System.Net.HttpWebRequest req;
    System.Net.HttpWebResponse res;
    req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);
    res = (System.Net.HttpWebResponse)req.GetResponse();
    // Decode the response as GB2312; change this if the target page uses another encoding.
    System.IO.StreamReader strm = new System.IO.StreamReader(res.GetResponseStream(), System.Text.Encoding.GetEncoding("GB2312"));
    return strm.ReadToEnd();
}
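The hard-coded GB2312 in getPage only suits pages that really use that encoding. As a minimal sketch, assuming the server reports its charset in the Content-Type header (exposed through HttpWebResponse.CharacterSet), the encoding could instead be resolved at run time. PageFetcher, ResolveEncoding, and GetPageAuto are hypothetical names introduced for illustration; they are not part of the original code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: map a charset name to an Encoding,
    // falling back when the name is empty or not recognized.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws for unknown or unsupported names.
            return fallback;
        }
    }

    // Hypothetical variant of getPage that uses the charset reported by the server.
    public static string GetPageAuto(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
        {
            Encoding enc = ResolveEncoding(res.CharacterSet, Encoding.UTF8);
            using (StreamReader reader = new StreamReader(res.GetResponseStream(), enc))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

The using blocks also release the response and reader even if reading fails. The UTF-8 fallback is an assumption; a site known to serve GB2312 could pass that encoding instead.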
How can I use ASP.NET to obtain the source code of an automatically submitted web page?
To extract a web page's source code in C#, take a look at hi.baidu.com/...d.html. It contains a method, string GetSource(string webAddress); pass in the URL and it returns the source code directly.
For the content inside the body, read this article: hi.baidu.com/...8.html.
It contains a method, string[] getStrBetween(string src, string start, string end). The src parameter is the source code string obtained with the first method. Pass <body> as start and </body> as end, and the method returns an array of strings. Since the source of a web page contains only one <body></body> pair, the array has only one element.
You can use these two methods directly.
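The linked article's code is not reproduced here, so the following is only a hypothetical sketch of what a method with the getStrBetween signature might do, assuming it returns every substring found between a start marker and the next end marker. The class name SourceText and the case-insensitive matching are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class SourceText
{
    // Hypothetical reimplementation: collect each substring that lies
    // between an occurrence of start and the following occurrence of end.
    public static string[] GetStrBetween(string src, string start, string end)
    {
        List<string> found = new List<string>();
        if (string.IsNullOrEmpty(src) || string.IsNullOrEmpty(start) || string.IsNullOrEmpty(end))
            return found.ToArray();

        int pos = 0;
        while (true)
        {
            int s = src.IndexOf(start, pos, StringComparison.OrdinalIgnoreCase);
            if (s < 0) break;
            s += start.Length; // skip past the start marker
            int e = src.IndexOf(end, s, StringComparison.OrdinalIgnoreCase);
            if (e < 0) break;
            found.Add(src.Substring(s, e - s));
            pos = e + end.Length; // continue searching after the end marker
        }
        return found.ToArray();
    }
}
```

Called as SourceText.GetStrBetween(html, "<body>", "</body>"), this yields a one-element array for a page with a plain <body> tag. Many real pages write the tag with attributes, such as <body class="...">, which a literal "<body>" marker will not match.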