This article uses the classes provided by C# and .NET to build a simple program that captures the source code of a web page. HTTP is one of the most basic protocols for accessing data on the WWW. .NET provides two classes, HttpWebRequest and HttpWebResponse, which are used to send a request to a resource and to obtain its response, respectively. To get the content of a resource, we first specify the URL to be fetched, send the request with an HttpWebRequest object, receive the result with an HttpWebResponse object, and finally use a Stream object to extract the content we want and display it in a text box.
The following describes how to implement this function:
Step 1: Open VS.NET, click "File" - "New" - "Project", select "Visual C# Projects" as the project type, and select "Windows Application" as the template.
Step 2: Add a Label, a Button, and two TextBox controls (label1, button1, textBox1, and textBox2) to Form1, and set the Multiline property of textBox2 to True.
Step 3: Right-click the Form1 form, select "View Code", and enter:
using System.IO;
using System.Net;
using System.Text;
private void button1_Click(object sender, System.EventArgs e)
{
}
Enter the following code between the braces:
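// Buffer that receives the raw bytes of the response.
byte[] buf = new byte[38192];
// Create a request for the URL typed into textBox1 and send it.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(textBox1.Text);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Read the response body; a single Read() returns at most buf.Length bytes.
Stream resStream = response.GetResponseStream();
int count = resStream.Read(buf, 0, buf.Length);
// Decode the bytes with the system default encoding and show the text.
textBox2.Text = Encoding.Default.GetString(buf, 0, count);
resStream.Close();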
Step 4: Click "Save All" and press F5 to run the application. In the single-line text box after "Enter URL address:", enter http://lucky.myrice.com/down.htm and click "HTML code"; the source code of that page is displayed.
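Note that the handler in Step 3 calls Read() only once, so it shows at most the first buf.Length bytes of a page, and possibly fewer, because Read() may return before the buffer is full. A minimal sketch of one way to collect the whole response is to call Read() in a loop until it returns 0. The StreamHelper class and its ReadAll method are hypothetical names introduced here for illustration; they are not part of the original program.

```csharp
using System.IO;

static class StreamHelper
{
    // Hypothetical helper (not part of the original program): reads a stream
    // to its end by calling Read() repeatedly, because one call may return
    // fewer bytes than the page contains.
    public static byte[] ReadAll(Stream stream)
    {
        byte[] buf = new byte[8192];
        using (MemoryStream collected = new MemoryStream())
        {
            int count;
            // Read() returns 0 once the end of the stream is reached.
            while ((count = stream.Read(buf, 0, buf.Length)) > 0)
            {
                // Keep only the bytes that were actually read this time.
                collected.Write(buf, 0, count);
            }
            return collected.ToArray();
        }
    }
}
```

In the click handler, the single Read() call could then be replaced by something like byte[] all = StreamHelper.ReadAll(resStream); followed by textBox2.Text = Encoding.Default.GetString(all);.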
Next, we analyze the program from Step 3:
The program captures the source of a web page. First, we create an HttpWebRequest object by calling Create(), a static method of the WebRequest class. Its string parameter is the URL of the page to be requested. Because Create() returns the WebRequest type, we must cast it (that is, perform a type conversion) to HttpWebRequest before assigning it to the request variable. Once the HttpWebRequest object exists, its GetResponse() method returns a WebResponse object, which we cast to HttpWebResponse and assign to the response variable. The GetResponseStream() method of the response object then returns the response body as a Stream, and the Read() method of that Stream copies the returned data into buf, the byte array we created at the start. Read() takes three parameters: the byte array to fill, the position in the array at which to start writing, and the maximum number of bytes to read. It returns the number of bytes actually read, which is stored in count.
Finally, we convert the bytes into a string. Note that Encoding.Default is used here, that is, the system's default encoding, so the program performs no explicit character-set conversion; this gives correct text only when the page is encoded the same way. You can also use the WebRequest and WebResponse base classes to do the same thing. The code is as follows:
WebRequest request = WebRequest.Create(textBox1.Text);
WebResponse response = request.GetResponse();
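For completeness, the two lines above can be extended into a full fetch routine. The sketch below is one possible arrangement, not the article's original code: PageSource, Fetch, and ReadText are hypothetical names, the using statements are added so that the response and stream are closed even if an exception occurs, and a StreamReader replaces the manual byte buffer. It still assumes the page is encoded in the system default encoding.

```csharp
using System.IO;
using System.Net;
using System.Text;

static class PageSource
{
    // Hypothetical helper: decodes an entire stream with the given encoding.
    public static string ReadText(Stream stream, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            // ReadToEnd() keeps reading until the stream is exhausted.
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: fetches a URL using only the WebRequest and
    // WebResponse base classes, with no cast to the HTTP-specific types.
    public static string Fetch(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        {
            // Assumption: the page uses the system default encoding.
            return ReadText(resStream, Encoding.Default);
        }
    }
}
```

With this in place, the body of the click handler could be reduced to textBox2.Text = PageSource.Fetch(textBox1.Text);.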
Try entering other URLs to see how convenient it is!