C#.NET: Capturing Webpage Content

Source: Internet
Author: User

Capturing webpage content in ASP.NET is very convenient, and it also solves the encoding problems that troubled us in classic ASP.

1. Capturing ordinary (text) content

Three classes are required: WebRequest, WebResponse, and StreamReader.

Required namespaces: System.Net and System.IO (plus System.Text for the Encoding class)

Core code:

The Create method of the WebRequest class is static; its parameter is the URL of the webpage to be fetched.

Encoding specifies the character encoding. The Encoding class exposes common encodings such as ASCII, UTF32, and UTF8 as properties, but it has no GB2312 property, so we use GetEncoding to obtain the GB2312 encoding.

    using System.IO;
    using System.Net;
    using System.Text;

    private string GetGeneralContent(string strUrl)
    {
        string strMsg = string.Empty;
        try
        {
            WebRequest request = WebRequest.Create(strUrl);
            // "using" closes the response and reader even if reading fails
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
            {
                // Decode the whole page as GB2312 text
                strMsg = reader.ReadToEnd();
            }
        }
        catch
        {
            // Any failure (bad URL, network or HTTP error) yields an empty string
        }
        return strMsg;
    }
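The GetEncoding("gb2312") call above works as-is on the .NET Framework. On modern .NET (an assumption here: .NET Core 3.0 or later, where CodePagesEncodingProvider ships with the runtime), code-page encodings such as GB2312 are only available after registering that provider. The sketch below does this and, going beyond the original, shows one way to resolve a charset name (for example one taken from the response's Content-Type header) with a GB2312 fallback. The class PageEncodings and its members are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original article.
public static class PageEncodings
{
    static PageEncodings()
    {
        // On .NET Core / .NET 5+, GB2312 (code page 936) is only available
        // after the code-pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // The encoding used throughout the article's examples.
    public static Encoding Gb2312
    {
        get { return Encoding.GetEncoding("gb2312"); }
    }

    // Resolve a charset name; fall back when it is empty or not recognized.
    public static Encoding Resolve(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            // Charset values are sometimes quoted, e.g. charset="gb2312"
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown encoding name
            return fallback;
        }
    }
}
```

With this in place, the reader could be created as `new StreamReader(stream, PageEncodings.Resolve(charset, PageEncodings.Gb2312))`, so a UTF-8 page is not mis-decoded as GB2312.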

2. Capturing images or other binary files

Four classes are required: WebRequest, WebResponse, Stream, and FileStream.

Required namespaces: System.Net and System.IO

Core code: read with a Stream

    using System.IO;
    using System.Net;

    private string GetFileContent(string strUrl)
    {
        string strMsg = string.Empty;
        try
        {
            WebRequest request = WebRequest.Create(strUrl);
            using (WebResponse response = request.GetResponse())
            using (Stream reader = response.GetResponseStream())
            // Save it as a specific file; FileMode.Create truncates any existing file
            using (FileStream writer = new FileStream(@"D:\logo.gif", FileMode.Create, FileAccess.Write))
            {
                byte[] buff = new byte[512];
                int c = 0; // actual number of bytes read
                // Assign first, then compare: (c = Read(...)) > 0
                while ((c = reader.Read(buff, 0, buff.Length)) > 0)
                {
                    writer.Write(buff, 0, c);
                }
            }
            strMsg = "saved successfully";
        }
        catch
        {
            // Errors are swallowed; strMsg stays empty
        }
        return strMsg;
    }
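The read/write loop above can be pulled out into a reusable helper that works with any pair of streams, which also makes it testable without a network. This is a sketch; the names StreamCopy and Copy are hypothetical. On .NET Framework 4.0 and later, the built-in Stream.CopyTo does the same job.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original article.
public static class StreamCopy
{
    // Copy everything from input to output in fixed-size chunks.
    // Returns the total number of bytes copied.
    public static long Copy(Stream input, Stream output, int bufferSize = 512)
    {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException("bufferSize");
        byte[] buff = new byte[bufferSize];
        long total = 0;
        int c; // actual number of bytes read in this pass
        while ((c = input.Read(buff, 0, buff.Length)) > 0)
        {
            output.Write(buff, 0, c);
            total += c;
        }
        return total;
    }
}
```

In GetFileContent, the loop body would become `StreamCopy.Copy(reader, writer);`.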

3. Capturing webpage content with POST

When capturing a webpage, you sometimes need to send data to the server with POST. The following code posts a user name and password to the server:

    using System.IO;
    using System.Net;
    using System.Text;

    private string GetPostContent(string strUrl)
    {
        string strMsg = string.Empty;
        try
        {
            // Form fields: name=value pairs joined by '&', with no spaces
            string data = "username=admin&passwd=admin888";
            byte[] requestBuffer = Encoding.GetEncoding("gb2312").GetBytes(data);

            WebRequest request = WebRequest.Create(strUrl);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = requestBuffer.Length;
            // Write the body before asking for the response
            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(requestBuffer, 0, requestBuffer.Length);
            }

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
            {
                strMsg = reader.ReadToEnd();
            }
        }
        catch
        {
            // Errors are swallowed; strMsg stays empty
        }
        return strMsg;
    }
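The hard-coded body above only works because "admin" and "admin888" contain no special characters. If a value contains spaces, '&', '=', or Chinese text, it has to be URL-encoded in the same encoding the server expects. A minimal sketch, assuming System.Web.HttpUtility is available (it is inbox on .NET Core; on the .NET Framework it needs a reference to System.Web.dll); the names FormBody and Build are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Web; // HttpUtility

// Hypothetical helper, not part of the original article.
public static class FormBody
{
    // Build an application/x-www-form-urlencoded body, escaping each name
    // and value in the given encoding (e.g. GB2312 for a GB2312 site).
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0)
                sb.Append('&');
            sb.Append(HttpUtility.UrlEncode(field.Key, encoding));
            sb.Append('=');
            sb.Append(HttpUtility.UrlEncode(field.Value, encoding));
        }
        return sb.ToString();
    }
}
```

The resulting string is plain ASCII, so it can then be turned into requestBuffer exactly as in GetPostContent.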

4. ASP.NET webpage capture: preventing redirection

After you successfully log on to the server's application system, that system may redirect the page with Response.Redirect. If you do not want to act on this redirect, do not output the result of reader.ReadToEnd() with Response.Write.
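A related technique, not described in the original, is to stop HttpWebRequest from following the server's redirect at all. By default it follows 3xx responses automatically; with AllowAutoRedirect set to false, responses with status codes 300 to 399 are returned to the caller instead. A sketch; the names NoRedirect, Create, and IsRedirect are hypothetical:

```csharp
using System;
using System.Net;

// Hypothetical helpers, not part of the original article.
public static class NoRedirect
{
    // Create a request that hands 3xx responses back to the caller
    // instead of following them automatically.
    public static HttpWebRequest Create(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;
        return request;
    }

    // True for the usual redirect status codes (301, 302, 303, 307, 308).
    public static bool IsRedirect(HttpStatusCode status)
    {
        int code = (int)status;
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }
}
```

After GetResponse(), cast the result to HttpWebResponse; if IsRedirect(response.StatusCode) is true, response.Headers["Location"] holds the target the server wanted to send you to.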

5. ASP.NET webpage capture: maintaining the login status

After logging on to the server's application system by posting data, we can fetch pages that require a login. To do so, we need to maintain the login status across multiple requests.

First, use HttpWebRequest instead of WebRequest.

Compared with the WebRequest version, the changed line is:

    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(strUrl);

Note: HttpWebRequest.Create is the static Create method inherited from WebRequest, so it still returns a WebRequest and you need to cast the result.

Next, use a CookieContainer:

    System.Net.CookieContainer cc = new System.Net.CookieContainer();
    request.CookieContainer = cc;
    request2.CookieContainer = cc;

This way, request and request2 share the same session: if request has logged on, request2 is logged on as well.

Finally, how can the same CookieContainer be used across different pages?

To share a CookieContainer across pages, simply store it in the Session.
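The sharing described above can be wrapped in a small factory, shown here as a sketch with the hypothetical names SharedCookies and Create. In real use, the container is filled by the Set-Cookie headers of the login response; the test below simulates that with CookieContainer.SetCookies and checks that a second request to the same site would send the cookie.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original article.
public static class SharedCookies
{
    // Create a request that reads and writes cookies through the given container.
    // Requests created with the same container share cookies, and therefore the session.
    public static HttpWebRequest Create(string url, CookieContainer cc)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cc;
        return request;
    }
}
```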

    // Save (on the page that logs in)
    Session.Add("CCC", cc);

    // Obtain (on a later page)
    CookieContainer cc = (CookieContainer)Session["CCC"];

For example:

    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(strUrl);

    // Same page:
    // HttpWebRequest request2 = (HttpWebRequest)HttpWebRequest.Create(strUrl);
    // System.Net.CookieContainer cc = new CookieContainer();
    // request.CookieContainer = cc;
    // request2.CookieContainer = cc;

    // Different pages (request and request2 live on separate pages):
    object obj = Session["CCC"];
    if (obj == null)
    {
        obj = new CookieContainer();
        Session.Add("CCC", obj);   // save
    }
    request.CookieContainer = (CookieContainer)obj;   // the login request collects the cookies

    // On the later page:
    string strUrl2 = "";   // fill in the URL of the page to fetch next
    HttpWebRequest request2 = (HttpWebRequest)HttpWebRequest.Create(strUrl2);
    CookieContainer cc2 = (CookieContainer)Session["CCC"];   // obtain
    request2.CookieContainer = cc2;
    // proceed to the next step

6. ASP.NET webpage capture: passing the current session to WebRequest

For example, when browser B1 accesses server S1, a session is created; when server S2 accesses S1 through WebRequest, another session is created. The requirement here is for the WebRequest to use the session between B1 and S1, that is, to make S1 believe that B1, not S2, is accessing it.

This is done with cookies: S1 reads the cookie that carries B1's session ID and passes it to S2, and S2 then writes that cookie into its WebRequest.

    WebRequest request = WebRequest.Create("url");
    // Send B1's classic ASP session cookie to S1
    request.Headers.Add(HttpRequestHeader.Cookie, "ASPSESSIONIDSCATBTAD=KNNDKCNBONBOOBIHHHHAOKDM;");
    using (WebResponse response = request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.GetEncoding("gb2312")))
    {
        // Response (capital R) is the current ASP.NET page's output, not the WebResponse
        Response.Write(reader.ReadToEnd());
    }
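Instead of writing the raw Cookie header, the session cookie that S1 passes to S2 could also be placed in a CookieContainer, which then keeps any further cookies S1 sets in reply. This is a sketch of that alternative, not the article's method; the names ForwardedSession and Create are hypothetical, and the cookie is scoped to the request's host with path "/".

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original article.
public static class ForwardedSession
{
    // Create a request to S1 that carries the session cookie obtained from S1.
    public static HttpWebRequest Create(string url, string cookieName, string cookieValue)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        var cc = new CookieContainer();
        // Domain defaults to the request's host; path "/" covers the whole site.
        cc.Add(request.RequestUri, new Cookie(cookieName, cookieValue, "/"));
        request.CookieContainer = cc;
        return request;
    }
}
```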

Note:

  • This is not cookie spoofing: the session ID is handed to S2 by S1, not stolen by S2. Although unusual, this can be useful in some application systems.
  • S1 must actually write something to B1's session, so that the session ID is saved in the cookie and stays unchanged.
  • In ASP.NET, cookies are read with Request.Cookies; this article assumes the cookie has already been obtained.
  • The name of the session-ID cookie differs between server technologies; this article uses the classic ASP session ID.
  • S1 may not rely on the session ID alone to recognize the current login; depending on how S1 is designed, it may also check the Referer and User-Agent.
  • In effect, this is another way to "maintain the login status", alongside the previous section of this series.
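Because the session cookie's name varies by server technology, S1 may need to pick it out of the raw Cookie header (in ASP.NET, for example, Request.Headers["Cookie"]) before passing it on. A sketch under that assumption; the names SessionCookieFinder and Find are hypothetical. The comment lists well-known default names: classic ASP uses "ASPSESSIONID" plus eight letters, ASP.NET uses "ASP.NET_SessionId", Java servlet containers use "JSESSIONID", and PHP uses "PHPSESSID".

```csharp
using System;

// Hypothetical helper, not part of the original article.
public static class SessionCookieFinder
{
    // Default session-cookie names/prefixes (examples, not exhaustive):
    //   classic ASP: "ASPSESSIONID" + 8 letters   ASP.NET: "ASP.NET_SessionId"
    //   Java:        "JSESSIONID"                 PHP:     "PHPSESSID"
    // Returns the first "name=value" pair whose name starts with namePrefix, or null.
    public static string Find(string cookieHeader, string namePrefix)
    {
        if (string.IsNullOrEmpty(cookieHeader))
            return null;
        foreach (string part in cookieHeader.Split(';'))
        {
            string pair = part.Trim();
            int eq = pair.IndexOf('=');
            if (eq <= 0)
                continue; // skip empty or malformed pieces
            string name = pair.Substring(0, eq);
            if (name.StartsWith(namePrefix, StringComparison.Ordinal))
                return pair;
        }
        return null;
    }
}
```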

7. ASP.NET webpage capture: changing the Referer and User-Agent

    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://127.0.0.1/index.htm");
    request.Referer = "http://www.csdn.net/";
    request.UserAgent = "user agent to be set";
    // the next step
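Referer and User-Agent are "restricted" headers for HttpWebRequest: they must be set through the dedicated properties shown above, and on the .NET Framework, Headers.Add throws an ArgumentException for them. WebHeaderCollection.IsRestricted reports which names fall into this group. A sketch; HeaderSetup and Create are hypothetical names, and the User-Agent string in the test is only an example value.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original article.
public static class HeaderSetup
{
    // Set the restricted Referer and User-Agent headers via their properties.
    public static HttpWebRequest Create(string url, string referer, string userAgent)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Referer = referer;
        request.UserAgent = userAgent;
        return request;
    }
}
```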

Source: http://www.cftea.com/c/2008/07/B0AUNSSUA1NQCVIF.asp
