Introduction
Anyone who has done some web development can say roughly what HTTP is. But what if I ask which HTTP request methods exist? What is the difference between POST and GET? Is there a limit on the size of the data sent by GET or POST? What are the HTTP response status codes? And how do you use all of this in C#? If you cannot answer most of these questions clearly, this chapter is for you! The outline is as follows:
· 1. HTTP Overview
○ 1.1. Interaction between HTTP client and server
○ 1.2. HTTP message
○ 1.3. HTTP Request Method
○ 1.4. HTTP response code
· 2. Packet capture analysis
· 3. Differences between POST and GET
· 4. Use an example to describe how to use POST, GET, and other operations in C#
○ 4.1. HttpWebRequest
○ 4.2. HttpWebResponse
○ 4.3. Write a WinForms program that opens the blog homepage (with source code)
1. HTTP Overview
To help you recall or understand the protocol, let's first look at what HTTP is. The Hypertext Transfer Protocol (HTTP) is the most widely used network protocol on the Internet. All WWW documents must comply with this standard. HTTP was originally designed to provide a way to publish and receive HTML pages.
The development of HTTP was a joint effort of the World Wide Web Consortium and the Internet Engineering Task Force (IETF). They published a series of RFCs, the best known of which is RFC 2616. RFC 2616 defines HTTP 1.1, a version of the protocol that is in wide use.
1.1. Interaction between HTTP client and server
HTTP is a request/response standard between a client and a server, normally carried over TCP. The client is the end user and the server is the website. Using a web browser, web crawler, or other tool, the client initiates an HTTP request to a specified port on the server (port 80 by default). We call this client the user agent. The responding server stores resources such as HTML files and images; we call it the origin server. There may be several intermediaries between the user agent and the origin server, such as proxies, gateways, or tunnels. Although TCP/IP is the most popular protocol suite on the Internet, HTTP does not require it or the layers it is built on. In fact, HTTP can be implemented on top of any other Internet protocol or on other networks. HTTP only assumes that the underlying protocol provides reliable transport; any protocol that offers that guarantee can be used.
Typically, the HTTP client initiates a request by establishing a TCP connection to a specified port on the server (port 80 by default). The HTTP server listens on that port for requests from clients. Once it receives a request, the server sends back a status line, such as "HTTP/1.1 200 OK", and a response message whose body may be the requested file, an error message, or other information.
HTTP uses TCP rather than UDP because opening a web page means transferring a lot of data, and TCP provides transmission control, delivers data in order, and corrects errors. Resources requested through HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs), or more specifically by URLs.
The structure and interaction process between the client and the server can be shown in the following two figures:
Figure 1. Web client-server structure (the hypertext link of the Web server jumps to another server through a link on the website)
Figure 2. Interaction between the Web client and the server
1.2. HTTP message
The interaction between the client and the server uses two types of messages: requests and responses.
The HTTP request format is:
Figure 3. HTTP request format
The HTTP response format is:
Figure 4. HTTP response format
From the above, we can see that both HTTP request and response messages start with a header section containing a variable number of fields. A blank line separates the header fields (headers) from the message body (body). A header field consists of a field name, a colon, a space, and a field value. Field names are case-insensitive.
Headers can be divided into three types: those that apply to requests, those that apply to responses, and those that describe the body. Some headers (such as Date) can be used in both requests and responses. Headers that describe the body can appear in POST requests and in all response messages. Figure 5 shows the HTTP header fields:
Figure 5. HTTP header field
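The message layout described above (a start line, header fields of the form "name: value", a blank line, then the body) can be illustrated with a small parser. This is only a minimal sketch, not a complete HTTP parser: the class name HttpMessageSketch and its Parse method are invented for this example, and it assumes lines end with CRLF and that no header is repeated or folded across lines.

```csharp
using System;
using System.Collections.Generic;

public static class HttpMessageSketch
{
    // Splits a raw HTTP message into its start line, header fields, and body.
    // Assumption of this sketch: CRLF line endings and one line per header.
    public static void Parse(string raw, out string startLine,
        out Dictionary<string, string> headers, out string body)
    {
        // The first blank line (CRLF CRLF) separates the headers from the body.
        int split = raw.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string head = split >= 0 ? raw.Substring(0, split) : raw;
        body = split >= 0 ? raw.Substring(split + 4) : string.Empty;

        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.None);
        startLine = lines[0];

        // Field names are case-insensitive, so use a case-insensitive comparer.
        headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue; // skip malformed lines in this sketch
            string name = lines[i].Substring(0, colon).Trim();
            string value = lines[i].Substring(colon + 1).Trim();
            headers[name] = value;
        }
    }
}
```

Because the dictionary ignores case, looking up "host", "Host", or "HOST" returns the same value, which mirrors the rule that field names are case-insensitive.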
1.3. HTTP Request Method
HTTP/1.1 defines eight methods (sometimes called "verbs") that indicate the operation to perform on the resource identified by the Request-URI:
· OPTIONS
Returns the HTTP request methods that the server supports for a specific resource. A request for '*' can also be sent to the web server to test the server's overall capabilities.
· HEAD
Asks the server for the same response as a GET request, but without the response body. This makes it possible to obtain the metadata in the response headers without transferring the entire content.
· GET
Requests the specified resource. Note: GET should not be used for operations that produce side effects, for example in web applications. One reason is that GET requests may be issued arbitrarily by web spiders and crawlers.
· POST
Submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is contained in the request body. POST requests may result in the creation of new resources, the modification of existing resources, or both.
· PUT
Uploads the latest content to the specified resource location.
· DELETE
Asks the server to delete the resource identified by the Request-URI.
· TRACE
Echoes back the request as received by the server; mainly used for testing or diagnosis.
· CONNECT
Reserved in HTTP/1.1 for proxy servers that can switch the connection into a tunnel.
Method names are case sensitive. When the requested resource does not support the request method, the server should return status code 405 (Method Not Allowed); when the server does not recognize or does not implement the request method, it should return status code 501 (Not Implemented).
An HTTP server should implement at least the GET and HEAD methods. The other methods are optional. Besides the methods above, a particular HTTP server can also add custom extension methods.
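The 405 versus 501 rule above can be sketched as a small check that a server might run before handling a request. The class MethodCheckSketch and its Check method are hypothetical names for this illustration, and a real server would of course do much more than return a number.

```csharp
using System;
using System.Collections.Generic;

public static class MethodCheckSketch
{
    // The eight HTTP/1.1 methods. Method names are case sensitive,
    // so an ordinal (case-sensitive) comparer is used.
    private static readonly HashSet<string> KnownMethods =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT"
        };

    // Returns 501 if the server does not know the method at all,
    // 405 if the method is known but not allowed for this resource,
    // and 200 otherwise.
    public static int Check(string method, ICollection<string> allowedForResource)
    {
        if (!KnownMethods.Contains(method)) return 501; // Not Implemented
        if (!allowedForResource.Contains(method)) return 405; // Method Not Allowed
        return 200;
    }
}
```

For a resource that allows only GET and HEAD, Check("DELETE", ...) yields 405, while Check("get", ...) yields 501 because the lowercase name is not a recognized method.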
Safe methods
Developers should be aware that their software acts on behalf of the user on the Internet, and they should tell users when an operation in progress may have unexpected and significant consequences for themselves or for others.
In particular, GET and HEAD requests should have no meaning other than retrieving information about a resource. In other words, these methods should be considered "safe": the operation retrieves information rather than modifying it. The client should present the other, "unsafe" methods, such as POST, PUT, and DELETE, in a special way (usually as buttons rather than hyperlinks) so that the user is aware of the possible responsibility (for example, a financial transaction triggered by a button) or that the requested operation may be unsafe (for example, a file will be uploaded or deleted).
However, it cannot be assumed that the server produces no side effects when it processes a GET request. In fact, many dynamic resources treat this as a feature. The important distinction is that the user did not request the side effect, so the user should not be held responsible for it.
Idempotent methods
Leaving aside errors and expiration, if the side effects of several identical requests are the same as those of a single request, or if there are no side effects at all, the request method can be considered "idempotent". The GET, HEAD, PUT, and DELETE methods all have this property. Likewise, because the protocol defines OPTIONS and TRACE to have no side effects, they are naturally idempotent as well.
A sequence of several requests is "idempotent" if its result is unchanged when the whole sequence, or any one or more requests in it, is executed again. However, a sequence of requests may be "non-idempotent" even if every method executed in it is idempotent. For example, the result of the sequence may depend on a value that will be modified the next time the sequence is executed.
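The difference between an idempotent method and a non-idempotent one is easy to see with an in-memory stand-in for a server. The sketch below is purely illustrative: InMemoryResourceStore is an invented class, not part of any HTTP library, and it models only the effect of PUT, POST, and DELETE on server state.

```csharp
using System.Collections.Generic;

public class InMemoryResourceStore
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
    private int _nextId = 1;

    public int Count { get { return _items.Count; } }

    // PUT: the client names the URI. Repeating the same call leaves the
    // store in the same state, so the operation is idempotent.
    public void Put(string uri, string content)
    {
        _items[uri] = content;
    }

    // POST: the server picks a new URI each time. Repeating the call
    // creates another resource, so the operation is not idempotent.
    public string Post(string collectionUri, string content)
    {
        string uri = collectionUri + "/" + _nextId++;
        _items[uri] = content;
        return uri;
    }

    // DELETE: removing an already removed resource changes nothing further.
    public void Delete(string uri)
    {
        _items.Remove(uri);
    }
}
```

Calling Put twice with the same URI leaves one resource, and calling Delete twice leaves none, whereas calling Post twice leaves two resources. This is why a client can safely retry PUT or DELETE after a timeout but should be careful about retrying POST.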
1.4. HTTP response code
The first line of the server's response is the status line. The status line starts with the HTTP version number, followed by a three-digit response code, and ends with a human-readable response phrase. According to the first digit of the code, responses fall into five categories:
Figure 6. HTTP response code
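As a rough illustration of the status line layout and the five categories, the sketch below splits a status line into its three parts and maps the first digit of the code to a category name. StatusLineSketch is a hypothetical helper written for this article; the category names follow the usual 1xx to 5xx grouping.

```csharp
using System;

public static class StatusLineSketch
{
    // Parses a status line such as "HTTP/1.1 200 OK" into version, code, and phrase.
    public static void Parse(string statusLine, out string version, out int code, out string phrase)
    {
        // Split into at most three parts so that a phrase with spaces stays intact.
        string[] parts = statusLine.Split(new[] { ' ' }, 3);
        if (parts.Length < 2)
            throw new FormatException("Not an HTTP status line: " + statusLine);
        version = parts[0];
        code = int.Parse(parts[1]);
        phrase = parts.Length == 3 ? parts[2] : string.Empty;
    }

    // Maps the first digit of the response code to its category.
    public static string Category(int code)
    {
        switch (code / 100)
        {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }
}
```

For example, parsing "HTTP/1.1 404 Not Found" gives the version HTTP/1.1, the code 404, and the phrase "Not Found", and Category(404) returns "Client Error".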
2. Packet capture analysis
Now that we know the basics of HTTP, I will use Wireshark to capture the HTTP packets exchanged between my computer and the Blog Garden (cnblogs) server when I open the Blog Garden homepage. To prepare, close any programs that might interfere with the capture, then open the site. When we enter www.cnblogs.com in the browser and confirm, the first packet we capture is the following:
Figure 7. Packet captured when opening Blog Garden
We can see that after we entered www.cnblogs.com in the browser and confirmed, an HTTP request message was sent to the server: GET / HTTP/1.1. According to the HTTP message format described in section 1.2, GET corresponds to the request method, / corresponds to the request-URI, and HTTP/1.1 corresponds to the version. Besides the request line, several header fields are sent, such as Accept, Accept-Language, User-Agent, Accept-Encoding, Host, and Connection. We can also see that their format is "header field name: field value". Note that there is a space after the colon.
Next, let's look at the response message for the GET / HTTP/1.1 request:
Figure 8. Response message for the GET / HTTP/1.1 request
The status line of the response message is HTTP/1.1 200 OK. HTTP/1.1 corresponds to the version number, 200 to the response code, and OK to the response phrase. Besides the status line, several header fields are returned, such as Cache-Control, Content-Type, Content-Encoding, Expires, Last-Modified, Vary, and Server. (From the Server field we can see that Blog Garden uses IIS 7.0.)
The above is a GET packet. Now let's look at a POST packet. The category information on the left side of the homepage is returned by a POST request that is sent when the homepage is opened.
Figure 9. POST data packet
We can see the request line POST /WS/publicuserservice.asmx/getlogininfo HTTP/1.1. Apart from GET being replaced by POST, the rest of the information is similar. Below we enlarge the header fields that were sent:
Figure 10. Header fields of POST /WS/publicuserservice.asmx/getlogininfo HTTP/1.1
Note: I will not explain each of the header fields involved here. By this point we should have a deeper understanding of HTTP.
3. Differences between POST and GET
Section 1.3 introduced eight methods; GET and POST are the most basic and the most commonly used. The differences between GET and POST when submitting a form can be summarized as follows:
· GET is meant for retrieving data from the server, and POST is meant for sending data to the server.
· GET appends the form data as a query string to the URL named by the form's action attribute; each value corresponds to a form field and is visible in the URL. POST uses the HTTP POST mechanism to place the form fields and their contents in the body of the request and sends them to the URL named by the action attribute. The user does not see this in the address bar.
· For GET, the server reads the values with Request.QueryString. For POST, the server reads the submitted data with Request.Form.
· The amount of data sent by GET is small, commonly no more than about 2 KB (mainly because browsers and servers limit the URL length). POST can send much more data and is generally unrestricted by default; in practice the limit depends on the server's configuration and processing capability.
· GET is less secure and POST is somewhat more secure. Because GET data travels in the request URL, and many servers, proxy servers, and user agents record the request URL in log files, private information may be seen by a third party. The submitted data is also directly visible in the browser, so internal details may be shown to the user. POST data does not appear in the URL, although it is still sent in plain text unless HTTPS is used.
If no method is specified when a form is submitted, the default is GET (in .NET the default is POST). The form data is appended to the URL, separated from it by a "?". Letters and digits are sent as they are, spaces are converted to "+", and other symbols are converted to %XX, where XX is the character's ASCII (or ISO Latin-1) value in hexadecimal. The data of a GET request is carried in the request line as part of the URL, while the data of a POST request is carried in the entity body. Data submitted by GET is commonly limited to about 2048 bytes; POST has no such restriction. The parameters of a POST are in the body of the HTTP message, and the receiver parses the parameters from that body. In general POST is the better choice for submitting data: the data submitted by POST is not shown in the URL, whereas GET passes its parameters in the URL and is suited to data that does not need to be kept confidential.
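The encoding rules just described (letters and digits unchanged, spaces as "+", other symbols as %XX) are what System.Net.WebUtility.UrlEncode produces, so a GET query string can be sketched as below. The class FormEncodingSketch, its method names, and the sample URL in the note that follows are invented for this example.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class FormEncodingSketch
{
    // Encodes form fields as name=value pairs joined by '&'.
    // WebUtility.UrlEncode turns spaces into '+' and other symbols into %XX.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(WebUtility.UrlEncode(field.Key));
            sb.Append('=');
            sb.Append(WebUtility.UrlEncode(field.Value));
        }
        return sb.ToString();
    }

    // For GET, the encoded data is appended to the action URL after a '?'.
    public static string AppendToUrl(string actionUrl, string encoded)
    {
        return actionUrl + "?" + encoded;
    }
}
```

With the fields name = "Wu Qin" and q = "a&b", Encode returns "name=Wu+Qin&q=a%26b", and AppendToUrl places that string after the "?" of a hypothetical action URL such as http://example.com/search. For a POST, the same encoded string would go in the request body instead of the URL.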
Note: For the differences between POST and GET I consulted earlier write-ups found on the Internet. Since I cannot find the original source and copies are posted everywhere, I will not list the addresses here; you can search Baidu or Google.
4. Use an example to describe how to use POST, GET, and other operations in C#
Before introducing the example, we should first introduce HttpWebRequest and HttpWebResponse. In C#, these two classes are used to send HTTP messages to the server and to receive HTTP responses from the server.
4.1. HttpWebRequest
HttpWebRequest provides an HTTP-specific implementation of the WebRequest class. It supports the properties and methods defined in WebRequest, and it also supports additional properties and methods that let users interact directly with servers over HTTP.
Do not use the HttpWebRequest constructor. Use the System.Net.WebRequest.Create method to initialize a new HttpWebRequest object. If the scheme of the Uniform Resource Identifier (URI) is http:// or https://, Create returns an HttpWebRequest object.
The header fields (headers) of the HTTP message are exposed as public properties of HttpWebRequest. The following table lists the HTTP headers that are set by properties, by methods, or by the system.
If the local computer configuration specifies a proxy, or if the request specifies a proxy, the request is sent through the proxy. If no proxy is specified, the request is sent directly to the server.
The HttpWebRequest class mainly includes the following methods for interacting with HTTP servers:
· Abort: cancels a request to an Internet resource.
· AddRange: adds a Range header to the request.
· BeginGetRequestStream: begins an asynchronous request for the Stream object used to write data.
· BeginGetResponse: begins an asynchronous request to an Internet resource.
· Create: initializes a new WebRequest. (Inherited from WebRequest.)
· CreateDefault: initializes a new WebRequest instance for the specified URI scheme. (Inherited from WebRequest.)
· CreateObjRef: creates an object that contains all the relevant information required to generate a proxy used to communicate with a remote object. (Inherited from MarshalByRefObject.)
· EndGetRequestStream: ends an asynchronous request for the Stream object used to write data.
· EndGetResponse: ends an asynchronous request to an Internet resource.
· GetRequestStream: gets the Stream object used to write request data.
· GetResponse: returns the response from an Internet resource.
· GetSystemWebProxy: returns the proxy configured in the Internet Explorer settings of the currently impersonated user. (Inherited from WebRequest.)
· InitializeLifetimeService: gets the lifetime service object that controls the lifetime policy of this instance. (Inherited from MarshalByRefObject.)
· RegisterPrefix: registers a WebRequest descendant for the specified URI. (Inherited from WebRequest.)
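As a hedged sketch of how Create, GetRequestStream, and GetResponse fit together for a POST, consider the helper below. PostSketch and its two methods are hypothetical names, not part of the .NET API, and the example assumes a form-encoded body and a UTF-8 response. On newer .NET versions WebRequest.Create is marked obsolete in favor of HttpClient, which is why the warning is suppressed at the top.

```csharp
#pragma warning disable SYSLIB0014 // WebRequest is marked obsolete on newer .NET versions
using System.IO;
using System.Net;
using System.Text;

public static class PostSketch
{
    // Creates and configures a POST request without sending it.
    public static HttpWebRequest CreateFormPost(string url, byte[] body)
    {
        // WebRequest.Create returns an HttpWebRequest for http:// and https:// URIs.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        return request;
    }

    // Writes the body, sends the request, and returns the response text.
    // Assumes the server answers in UTF-8; this call goes over the network.
    public static string Send(HttpWebRequest request, byte[] body)
    {
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A caller would build the body with Encoding.UTF8.GetBytes("name=Wu+Qin"), pass it to CreateFormPost together with the target URL, and then call Send. Splitting configuration from sending keeps the first step free of network access, which makes it easy to inspect the Method, ContentType, and ContentLength properties before anything is transmitted.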
4.2. HttpWebResponse
We also need to introduce the HttpWebResponse class, which provides an HTTP-specific implementation of the WebResponse class. It includes support for the HTTP-specific uses of the properties and methods of WebResponse. The HttpWebResponse class is used to build standalone HTTP client applications that send HTTP requests and receive HTTP responses.
Note:
Do not confuse the HttpWebResponse class with the HttpResponse class. The latter is used in ASP.NET applications, and its methods and properties are exposed through ASP.NET's intrinsic Response object.
You should never create an instance of the HttpWebResponse class directly. Instead, use the instance returned by a call to HttpWebRequest.GetResponse. You must call either Stream.Close or HttpWebResponse.Close to close the response and release the connection for reuse. You do not need to call both Stream.Close and HttpWebResponse.Close, although doing so causes no error.
Common header information returned from the Internet resource is exposed as properties of this class. For a complete list, see the following table. Other headers can be read as name/value pairs from the Headers property. The following table shows the common HTTP headers that are available through properties of the HttpWebResponse class.
Call the GetResponseStream method to obtain the response content from the Internet resource as a Stream.
The HttpWebResponse class mainly includes the following methods for interacting with the HTTP server (it has fewer methods than HttpWebRequest):
· CreateObjRef: creates an object that contains all the relevant information required to generate a proxy used to communicate with a remote object. (Inherited from MarshalByRefObject.)
· GetLifetimeService: retrieves the current lifetime service object that controls the lifetime policy of this instance. (Inherited from MarshalByRefObject.)
· GetResponseHeader: gets the content of a header that was returned with the response.
· GetResponseStream: gets the stream used to read the body of the response from the server.
· InitializeLifetimeService: gets the lifetime service object that controls the lifetime policy of this instance. (Inherited from MarshalByRefObject.)
4.3. Write a WinForms program that opens the blog homepage (with source code)
After the previous two sections, we know something about the HttpWebRequest and HttpWebResponse classes. Now we will use them to write a small program for practice. The program interface is roughly as follows:
The functionality is simple: click the "display in webbrowser" button to show the Blog Garden homepage in the WebBrowser control below, or click the "HTML source code" button to show the HTML source code of the blog homepage.
First, let's see how the "HTML source code" button is implemented. Clicking it pops up a dialog box that displays the HTML source code of the blog homepage. The core code is as follows:
// Requires: using System; using System.IO; using System.Net; using System.Windows.Forms;
private string GetCnblogs()
{
    string html = string.Empty;
    // WebRequest.Create returns an HttpWebRequest for http:// and https:// URLs.
    HttpWebRequest cnblogs = (HttpWebRequest)WebRequest.Create(txtUrl.Text);
    cnblogs.Accept = "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/qvod, application/qvod, */*";
    cnblogs.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; MALN; CBA; InfoPath.2; .NET4.0C; .NET4.0E; Media Center PC 6.0; Tablet PC 2.0; AskTB5.6)";
    cnblogs.Method = "GET";
    // Dispose the response so that the connection is released for reuse.
    using (HttpWebResponse cnblogsResponse = (HttpWebResponse)cnblogs.GetResponse())
    {
        if (cnblogsResponse.StatusCode == HttpStatusCode.OK)
        {
            // Read the whole response body as text.
            using (StreamReader sr = new StreamReader(cnblogsResponse.GetResponseStream()))
            {
                html = sr.ReadToEnd();
            }
        }
    }
    return html;
}

private void btnGetHtml_Click(object sender, EventArgs e)
{
    MessageBox.Show(GetCnblogs());
}
In effect, this does the same thing as entering the Blog Garden address in a browser to open the site, except that here we do it through objects of the HttpWebRequest and HttpWebResponse classes.
Clicking the "display in webbrowser" button works in a similar way; the only difference is that the Blog Garden homepage is displayed in the WebBrowser control below. Here I have wrapped some common HTTP-related operations in a namespace named helper for future use; it is essentially the same as the code above. Click to download the source code of the entire project.
My source code is still fairly simple; it only demonstrates the interaction between the HttpWebRequest class and an HTTP server. More complete functionality will follow.
Note: Regarding URL length limits, a URL in IE can contain at most 2083 characters (half width), while a GET can contain at most 2048 characters. However, RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", places no limit on the maximum URL length.
References: I read many articles while writing this one; the following left the deepest impression.
· Wikipedia (HTTP), http://zh.wikipedia.org/zh-cn/HTTP
· MSDN (HttpWebRequest), http://msdn.microsoft.com/zh-cn/library/8y7x3zz2%28v=VS.80%29.aspx
· MSDN (HttpWebResponse), http://msdn.microsoft.com/zh-cn/library/system.net.httpwebresponse%28VS.80%29.aspx
· TCP/IP Illustrated, Volume 3
Author: Wu Qin
Source: http://www.cnblogs.com/skynet/
This article is published under the Creative Commons Attribution 2.5 China Mainland license. You are welcome to reprint it, adapt it, or use it for commercial purposes, provided that you keep the attribution to Wu Qin (including the link).