An In-Depth Understanding of the HTTP Protocol

Source: Internet
Author: User

HTTP is the abbreviation of HyperText Transfer Protocol. It was developed through a collaboration between the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). RFC 1945 defines HTTP/1.0, and RFC 2616 defines HTTP/1.1, the version in common use today (RFC 2616 has since been superseded by RFCs 7230 to 7235 and, more recently, RFCs 9110 to 9112). HTTP is a transfer protocol that carries hypertext from a WWW server to a local browser; it can make browsers more efficient and reduce network traffic. It not only ensures that the computer transmits hypertext documents correctly and quickly, but also determines which part of a document is transmitted and which content is displayed first (for example, text before graphics).

HTTP features:

1. Client/server model: an HTTP exchange is always started by a client request, to which the server sends back a response. This limits HTTP: the server cannot push a message to a client that has not sent a request.
2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. The commonly used request methods are GET, HEAD, and POST, and each method specifies a different kind of interaction between client and server. Because the protocol is simple, HTTP server programs are small, so communication is fast.
3. Flexible: HTTP allows any type of data object to be transferred. The type being transferred is indicated by the Content-Type header.
4. Connectionless: each connection is limited to processing a single request. Once the server has processed the client's request and received the client's acknowledgment, it closes the connection, which saves transmission time. (This describes HTTP/1.0; HTTP/1.1 made persistent connections the default.)
5. Stateless: HTTP is a stateless protocol, meaning it has no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need earlier information.

HTTP is usually carried over TCP, and sometimes over a TLS or SSL layer, which is what is commonly called HTTPS, as shown in the following figure:


The default port for HTTP is 80, and the default port for HTTPS is 443.


1. Workflow (an HTTP operation is called a transaction; it can be divided into four steps)

1) First, the client and the server establish a connection (the TCP three-way handshake).

2) After the connection is established, the client sends a request to the server. The request consists of the method, a Uniform Resource Identifier (URI), and the protocol version, followed by MIME-style information including request modifiers, client information, and possibly a body.

3) When the server receives the request, it responds with a status line containing the protocol version and a success or error code, followed by MIME-style information including server information, entity metadata, and possibly a body.

4) The client receives the information returned by the server, the browser displays it on the user's screen, and then the client disconnects from the server. If an error occurs in any of these steps, information about the error is returned to the client and displayed. For the user, all of this is handled by HTTP: the user just clicks and waits for the information to appear.
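The four steps can be sketched with a raw socket. This is a minimal illustration, not how a browser is built: `build_request` and `http_transaction` are hypothetical helper names, the request is cut down to the request line, a `Host` header, and `Connection: close` (so the server ends the transaction by closing the connection), and the default ports 80 and 443 are hard-coded.

```python
import socket
import ssl

# Default ports: HTTP 80, HTTPS 443.
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_request(method: str, host: str, path: str) -> bytes:
    """Request line, headers, and the blank line that ends the headers."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close after this response
        "",  # these two empty items produce the final CRLF CRLF
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def http_transaction(host: str, path: str = "/", scheme: str = "http") -> bytes:
    """Run one transaction and return the raw response bytes."""
    port = DEFAULT_PORTS[scheme]
    # Step 1: connect; the operating system performs the TCP three-way handshake.
    sock = socket.create_connection((host, port), timeout=10)
    if scheme == "https":
        # HTTPS: the same HTTP bytes, carried inside a TLS session.
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    with sock:
        # Step 2: send the request.
        sock.sendall(build_request("GET", host, path))
        # Step 3: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 4: leaving the "with" block has closed our side of the connection.
    return b"".join(chunks)
```

Calling `http_transaction("example.com")` (a placeholder host; network access required) would return the status line, headers, and body as one byte string.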

Capturing the packets with a packet-capture tool clearly shows the interaction between the client browser (IP 192.168.2.33) and the server:

1) No. 1: The browser (192.168.2.33) sends a connection request to the server (220.181.50.118). This is the first step of the TCP three-way handshake; the segment carries the SYN flag with Seq = x (here x = 0, because capture tools usually display relative sequence numbers).

2) No. 2: The server (220.181.50.118) answers the request of the browser (192.168.2.33) and asks for confirmation: the segment carries SYN and ACK, with Seq = y (here y = 0) and Ack = x + 1 (= 1). This is the second step of the three-way handshake.

3) No. 3: The browser (192.168.2.33) acknowledges the server (220.181.50.118), and the connection is established: the segment carries ACK, with Seq = x + 1 (= 1) and Ack = y + 1 (= 1). This is the third step of the three-way handshake.

4) No. 4: The browser (192.168.2.33) sends the HTTP request for the page.

5) No. 5: The server (220.181.50.118) acknowledges it.

6) No. 6: The server (220.181.50.118) sends the data.

7) No. 7: The client browser (192.168.2.33) acknowledges the data.

8) No. 14: The client (192.168.2.33) sends an HTTP request for an image.

9) No. 15: The server (220.181.50.118) sends the status response code ...
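The sequence and acknowledgment numbers in packets No. 1 to No. 3 follow a fixed rule, which this small hypothetical helper reproduces. It assumes the relative numbering a capture tool displays (x = y = 0); on the wire the initial sequence numbers are random 32-bit values, hence the modulo.

```python
def three_way_handshake(x: int, y: int):
    """Return (sender, flags, seq, ack) for the three handshake segments.

    x is the client's initial sequence number, y the server's.
    """
    mod = 2 ** 32  # TCP sequence numbers wrap around at 2^32
    return [
        ("client", "SYN", x, None),                            # No. 1
        ("server", "SYN, ACK", y, (x + 1) % mod),              # No. 2
        ("client", "ACK", (x + 1) % mod, (y + 1) % mod),       # No. 3
    ]


for segment in three_way_handshake(0, 0):
    print(segment)
```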


2. HTTP request (a request consists of four parts: the request line, the request headers, a blank line, and the request data)


1. Request line: the request line begins with the method, followed by a space, the requested URI, another space, and the protocol version, and ends with a carriage return and line feed, in the following format: GET /index.html HTTP/1.1
HTTP request methods include GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT.
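A request line can be split mechanically into its three fields. The sketch below (hypothetical `parse_request_line`) assumes single spaces between the fields and accepts only the eight methods listed above; method names are case-sensitive, which is why a lowercase `get` is rejected.

```python
# The eight methods listed above; method names are case-sensitive.
METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE", "CONNECT"}


def parse_request_line(line: str):
    """Split 'METHOD SP Request-URI SP HTTP-Version CRLF' into three fields."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, version = parts
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad protocol version: {version!r}")
    return method, uri, version


print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
# ('GET', '/index.html', 'HTTP/1.1')
```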

1) GET
GET is the most common method: it is used whenever the client wants to read a document from the server. GET asks the server to put the resource identified by the URL into the data portion of the response message and send it back to the client. With GET, the request parameters and their values are appended to the URL, with a question mark ("?") marking the end of the URL and the start of the parameters, for example /index.jsp?id=100&op=bind. The length of such parameters is limited (HTTP itself sets no limit, but browsers and servers restrict URL length). Because data passed by GET appears directly in the address, the result of the request can be sent to a friend as a link. Taking a Google search for "domety" as an example, the request looks like this:

GET /search?hl=zh-CN&source=hp&q=domety&aq=f&oq= HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-silverlight, application/x-shockwave-flash, */*
Referer: http://www.google.cn/
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; TheWorld)
Host: www.google.cn
Connection: Keep-Alive
Cookie: PREF=ID=80a06da87be9ae3c:U=f7167333e2c3b714:NW=1:TM=1261551909:LM=1261551917:S=ybYcq2wpfefs4V9g; NID=31=ojj8d-iygaetsxlgajmqsjvhcspkvijrb6omjamnrsm8lzhky_Ymfo2m4qmrkch1g0iqv9u-2hfbw7bufwvh7pgarub0rnhcju37y-fxlrugatx63jlv7cwmd6ub_o_r
As you can see, a GET request generally has no request-data section; the request data is carried as part of the address in the request line.
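The query string can be built and read with Python's standard `urllib.parse` module. `build_get_target` and `read_query` are illustrative names; `urlencode` percent-encodes each name and value and joins the pairs with `&`, and `parse_qs` reverses this on the server side (returning a list per name, because a name may repeat).

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_target(path: str, params: dict) -> str:
    """Client side: append the parameters to the path after a '?'."""
    return path + "?" + urlencode(params)


def read_query(target: str) -> dict:
    """Server side: decode the part of the request target after the '?'."""
    return parse_qs(urlsplit(target).query)


print(build_get_target("/search", {"hl": "zh-CN", "source": "hp", "q": "domety"}))
# /search?hl=zh-CN&source=hp&q=domety
print(read_query("/index.jsp?id=100&op=bind"))
# {'id': ['100'], 'op': ['bind']}
```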

2) POST
When GET is not suitable, consider POST, which lets the client send more information to the server. POST puts the request parameters in the request data of the HTTP message as name/value pairs and can carry large amounts of data: HTTP itself places no limit on the size of the body (servers may impose their own), and the data does not appear in the URL. Taking the same "domety" search as an example, a POST request looks like this:

POST /search HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-silverlight, application/x-shockwave-flash, */*
Referer: http://www.google.cn/
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
Content-Length: 27
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; TheWorld)
Host: www.google.cn
Connection: Keep-Alive
Cookie: PREF=ID=80a06da87be9ae3c:U=f7167333e2c3b714:NW=1:TM=1261551909:LM=1261551917:S=ybYcq2wpfefs4V9g; NID=31=ojj8d-iygaetsxlgajmqsjvhcspkvijrb6omjamnrsm8lzhky_Ymfo2m4qmrkch1g0iqv9u-2hfbw7bufwvh7pgarub0rnhcju37y-fxlrugatx63jlv7cwmd6ub_o_r

hl=zh-CN&source=hp&q=domety
As you can see, the POST request line contains no data string; the data is placed in the request-data section, with the name/value pairs separated by "&".
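For comparison, here is a sketch that sends the same form data with POST. The helper name is hypothetical; it adds a `Content-Type: application/x-www-form-urlencoded` header, the usual encoding for HTML forms even though the capture above omits it, and computes `Content-Length` from the encoded body instead of writing it by hand.

```python
from urllib.parse import urlencode


def build_post(host: str, path: str, form: dict) -> bytes:
    """Form fields go in the body; the request line carries only the path."""
    body = urlencode(form).encode("ascii")  # name=value pairs joined by '&'
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # byte length of the body
        "\r\n"  # blank line: end of the headers
    )
    return head.encode("ascii") + body


request = build_post("www.google.cn", "/search",
                     {"hl": "zh-CN", "source": "hp", "q": "domety"})
print(request.decode("ascii"))
```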

3) HEAD
HEAD is like GET, except that when the server receives a HEAD request it returns only the response headers, without the response body. HEAD is very efficient when we only need to check the status of a page, because the page content is not transferred.

2. Request headers: request headers consist of keyword/value pairs, one pair per line, with the keyword and value separated by a colon (":"). Request headers give the server information about the client's request. Typical request headers are:

    1. User-Agent: the type of browser that generated the request.
    2. Accept: a list of content types the client can handle.
    3. Host: the host name being requested, which allows several domain names to share the same IP address (virtual hosting).
    4. Referer: lets the client specify the address of the resource from which the request URI was obtained, so the server can build back-link lists for logging, cache optimization, and so on.
    5. Connection: keep-alive: indicates that a persistent connection is wanted.
    6. Cookie: cookie information.
3. Blank line: the last request header is followed by a blank line (a carriage return and line feed), which tells the server that no more request headers follow.
4. Request data: request data is not used with GET (GET data is placed after the URL) but is used with POST. POST suits situations where the client needs to fill in a form. The request headers most commonly associated with request data are Content-Type and Content-Length.
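The four-part structure (request line, headers, blank line, data) is what a server-side parser relies on. A minimal sketch, assuming the whole request is already in memory, every line ends in CRLF, and the body length comes from `Content-Length` (chunked transfer coding and repeated header names are ignored); `parse_request` is a hypothetical name.

```python
def parse_request(raw: bytes):
    """Return ((method, target, version), headers, body) for one request."""
    head, sep, rest = raw.partition(b"\r\n\r\n")  # the blank line ends the headers
    if not sep:
        raise ValueError("incomplete request: no blank line after the headers")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    length = int(headers.get("content-length", "0"))  # no Content-Length: no body
    return (method, target, version), headers, rest[:length]
```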

3. HTTP response (an HTTP response consists of three parts: the status line, the message headers, and the response body)

1. Status line: its format is HTTP-Version Status-Code Reason-Phrase CRLF, for example: HTTP/1.1 200 OK (CRLF).
Here HTTP-Version is the HTTP protocol version used by the server, Status-Code is the response status code sent back by the server, and Reason-Phrase is a textual description of the status code. Common status codes and their meanings:

    • 200 OK: the client's request succeeded.
    • 400 Bad Request: the client's request has a syntax error and cannot be understood by the server.
    • 401 Unauthorized: the request is not authorized; this status code must be sent together with the WWW-Authenticate header field.
    • 403 Forbidden: the server received the request but refuses to serve it.
    • 404 Not Found: the requested resource does not exist, for example because a wrong URL was entered.
    • 500 Internal Server Error: an unexpected error occurred on the server.
    • 503 Service Unavailable: the server cannot currently handle the client's request and may recover after some time.
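Reading a status line is a matter of splitting on the first two spaces. The sketch uses the standard `http.HTTPStatus` enum to look up the registered reason phrase for a code; `parse_status_line` and `status_class` are illustrative helpers, and the class names follow the first digit of the code (1xx to 5xx).

```python
from http import HTTPStatus


def parse_status_line(line: str):
    """'HTTP/1.1 404 Not Found' -> ('HTTP/1.1', 404, 'Not Found')."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the phrase may be empty
    return version, code, reason


def status_class(code: int) -> str:
    return {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}[code // 100]


# Registered reason phrases for the codes listed above.
for code in (200, 400, 401, 403, 404, 500, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```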
2. Response headers: commonly used response headers
    1. Location: this response header field redirects the recipient to a new location. It is often used when a site changes its domain name.
    2. Server: this response header field contains information about the software the server used to handle the request; it is the counterpart of the User-Agent request header field. E.g.: Server: Apache-Coyote/1.1
    3. WWW-Authenticate: this response header field must be included in a 401 (Unauthorized) response. After receiving the 401 response, the client can send the request again with an Authorization header field for the server to verify. E.g.: WWW-Authenticate: Basic realm="Basic Auth Test!" shows that the server uses the Basic authentication scheme for the requested resource.
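The Basic scheme sends `user:password` Base64-encoded in the `Authorization` header. A minimal sketch with hypothetical helper names; the challenge parser only handles a single `realm` parameter. Base64 is an encoding, not encryption, so Basic authentication should only be used over HTTPS.

```python
import base64


def parse_www_authenticate(value: str):
    """'Basic realm="Basic Auth Test!"' -> ('Basic', 'Basic Auth Test!').

    Simplified: handles one challenge with a single realm parameter.
    """
    scheme, _, params = value.strip().partition(" ")
    params = params.strip()
    realm = None
    if params.lower().startswith("realm="):
        realm = params[len("realm="):].strip('"')
    return scheme, realm


def basic_authorization(user: str, password: str) -> str:
    """Value for the Authorization header: 'Basic ' + base64(user:password)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

For instance, `basic_authorization("Aladdin", "open sesame")` yields `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`, the example value used in the Basic authentication RFC.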
3. Response body: the content of the resource returned by the server. Example:
HTTP/1.1 200 OK
Date: Sat, Dec 2005 23:59:59 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 122

<html>
<head>
<title>Wrox Homepage</title>
</head>
<body>
<!-- body goes here -->
</body>
</html>
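A server assembles such a response from the same three parts. A sketch with a hypothetical `build_response` helper: `email.utils.formatdate(..., usegmt=True)` produces the GMT date format used by the `Date` header, and `Content-Length` is computed from the body rather than written by hand. The `now` parameter exists only to make the output reproducible.

```python
from email.utils import formatdate
from typing import Optional


def build_response(status: int, reason: str, body: bytes,
                   content_type: str = "text/html;charset=ISO-8859-1",
                   now: Optional[float] = None) -> bytes:
    """Status line, headers, blank line, then the response body."""
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        # GMT date, e.g. 'Sun, 06 Nov 1994 08:49:37 GMT'; now=None means current time
        f"Date: {formatdate(now, usegmt=True)}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```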

4. Testing HTTP with Telnet

5. Comparison of HTTP/1.0 and HTTP/1.1

1. In HTTP/1.0, each request needs a new TCP connection, and connections cannot be reused. In HTTP/1.1, a new request can be sent over the TCP connection used by the previous request, so the connection is reused. This avoids repeating the TCP three-way handshake and improves efficiency. Note: on the same TCP connection, a new request must wait until the previous request has received its response before it can be sent (HTTP/1.1 pipelining relaxes this in principle, but it is rarely enabled in practice).
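Reusing one connection only works if the client can tell where each response ends. A sketch, assuming every response carries `Content-Length` (no chunked encoding) and the bytes have already been read into one string; `split_responses` is a hypothetical name.

```python
from typing import List, Tuple


def split_responses(stream: bytes) -> List[Tuple[str, bytes]]:
    """Split back-to-back responses read from one persistent connection."""
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("truncated response headers")
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())  # where this response's body ends
        status_line = head.split(b"\r\n", 1)[0].decode("ascii")
        responses.append((status_line, rest[:length]))
        stream = rest[length:]  # the next response starts right after the body
    return responses
```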

2. Host field: HTTP/1.1 request headers contain a Host field, which is required in every HTTP/1.1 request; HTTP/1.0 has no such field. HTTP/1.0 presumably assumed that the IP address specified when a TCP connection is established belongs to a single host. E.g.:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
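Because several sites can share one IP address, an HTTP/1.1 server picks the site from the `Host` header (and, per RFC 7230, answers 400 Bad Request when an HTTP/1.1 request lacks it). A sketch of that dispatch; the `SITES` table and directory paths are made up for illustration, and IPv6 literal hosts are not handled.

```python
# Hypothetical virtual hosts sharing one IP address.
SITES = {
    "www.w3.org": "/srv/w3",
    "www.example.com": "/srv/example",
}


def resolve(host_header: str, path: str) -> str:
    """Pick the document root from the Host header, then append the path."""
    host = host_header.split(":", 1)[0].lower()  # drop an optional ":port"
    root = SITES.get(host)
    if root is None:
        raise LookupError(f"no virtual host configured for {host!r}")
    return root + path
```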

3. Status response codes
HTTP/1.1 adds the 100 (Continue) status code, which lets the client probe the server with the request headers before sending the request body, to find out whether the server is willing to receive the body, and then decide whether to send it. The client includes Expect: 100-continue in its request headers; if the server answers with 100 Continue, the client goes on to send the request body. HTTP/1.1 also added other new status codes such as 101, 203, and 205.
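The client side of `Expect: 100-continue` is a small decision: send the headers, wait briefly, then send the body only if the server answers 100 Continue (RFC 7231 also lets the client go ahead if no answer arrives in time). A sketch with hypothetical names; the `/upload` target is made up, and waiting for the interim response on a real socket is left out.

```python
from typing import Optional


def expect_continue_head(host: str, path: str, body_length: int) -> bytes:
    """Request line and headers, sent before (and without) the body."""
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {body_length}\r\n"
        "Expect: 100-continue\r\n"
        "\r\n"
    ).encode("ascii")


def next_step(status: Optional[int]) -> str:
    """Decide what to do after sending the headers.

    status is the code of the first response received, or None if the
    client gave up waiting.
    """
    if status is None or status == 100:
        return "send body"
    return f"stop: final response {status}"  # e.g. 417 Expectation Failed
```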

4. Request methods
HTTP/1.1 adds the OPTIONS, PUT, DELETE, TRACE, and CONNECT request methods.

...


5. Other:

1. To improve performance for users, modern browsers also support concurrent access: when browsing a web page, they open several connections at once to fetch the page's many images quickly, so the whole page is transferred faster. HTTP/1.1 provides this kind of persistent connection, and the proposed next-generation protocol, HTTP-NG, was designed to add support for more efficient connections, such as session control and richer content negotiation.


2. Cookies and sessions:

Cookies and sessions are both used to save state information. They are mechanisms for preserving client state, and both try to solve the problem that HTTP is stateless. A session can be implemented with cookies or with URL rewriting; a cookie-based session can be regarded as a more advanced application of cookies.
1) Comparison: cookies and sessions differ in the following ways:

    1. A cookie keeps state on the client, while a session keeps state on the server;
    2. Cookies are small pieces of text that the server stores on the local machine and that are sent to the same server with every request. Cookies were first standardized in RFC 2109 and later extended by RFC 2965 (both have since been superseded by RFC 6265). The web server sends cookies to the client in HTTP headers; on the client, the browser parses the cookies and saves them locally, and it automatically attaches them to any later request to the same server. Sessions are not defined in the HTTP protocol;
    3. A session is per user: the session variables are stored on the server, and a session ID identifies which user they belong to. The user's browser returns this ID to the server on each request; if the user has disabled cookies, the ID can instead be passed back to the server in the URL (GET);
    4. In terms of security: when you visit a site that uses sessions, only a cookie holding the session ID is created on your machine, while the data stays on the server. The server-side session mechanism is therefore considered more secure, because the stored data cannot be freely read on the client.

2) The session mechanism:
The session mechanism is a server-side mechanism: the server stores information in a hash-table-like structure. When a program needs to create a session for a client's request, the server first checks whether the request already contains a session identifier, called the session ID. If it does, a session was already created for this client, and the server retrieves it by that ID (if no session can be found, the server may create a new one). If the request contains no session ID, the server creates a session for the client and generates a session ID associated with it. The session ID should be a string that never repeats and whose pattern is hard to imitate; it is returned to the client in this response to be saved.
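The lookup-or-create logic just described can be sketched as an in-memory store. Assumptions: `SessionStore` is a hypothetical class, IDs come from Python's `secrets.token_urlsafe` so they are random rather than pattern-based, and the 30-minute inactivity timeout and `invalidate` method imitate what Java EE containers provide; this is not a servlet API.

```python
import secrets
import time
from typing import Optional


class SessionStore:
    """In-memory server-side sessions keyed by an unguessable session ID."""

    def __init__(self, timeout_seconds: float = 30 * 60):
        self.timeout = timeout_seconds  # inactivity timeout (30 minutes by default)
        self._sessions = {}  # session ID -> (last access time, data dict)

    def get_or_create(self, session_id: Optional[str], now: Optional[float] = None):
        """Return (session_id, data); new session if the ID is missing, unknown, or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id) if session_id else None
        if entry is not None and now - entry[0] <= self.timeout:
            self._sessions[session_id] = (now, entry[1])  # refresh last access
            return session_id, entry[1]
        if entry is not None:
            del self._sessions[session_id]  # expired: discard the old session
        new_id = secrets.token_urlsafe(32)  # random, so it cannot be guessed from a pattern
        data = {}
        self._sessions[new_id] = (now, data)
        return new_id, data

    def invalidate(self, session_id: str) -> None:
        """Explicitly remove a session."""
        self._sessions.pop(session_id, None)
```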

3) How sessions are implemented

1. With cookies:
The server assigns each session a unique JSESSIONID and sends it to the client in a cookie. When the client makes a new request, it carries the JSESSIONID in the Cookie header, so the server can find the session belonging to this client. The process is as follows:
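A rough Python sketch of issuing the JSESSIONID cookie and reading it back from a later request, using the standard `http.cookies.SimpleCookie`. The helper names are hypothetical; `Path=/` and `HttpOnly` are common choices for session cookies, not requirements of the scheme.

```python
from http.cookies import SimpleCookie


def set_session_cookie(session_id: str) -> str:
    """Value of the Set-Cookie header the server sends with its response."""
    cookie = SimpleCookie()
    cookie["JSESSIONID"] = session_id
    cookie["JSESSIONID"]["path"] = "/"
    cookie["JSESSIONID"]["httponly"] = True  # not readable from page scripts
    return cookie["JSESSIONID"].OutputString()


def session_id_from_request(cookie_header: str):
    """Read JSESSIONID back from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("JSESSIONID")
    return morsel.value if morsel is not None else None
```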

2. With URL rewriting:
URL rewriting means that the server appends a JSESSIONID parameter to every link in the pages it sends to the browser, so that whichever link the client clicks, the JSESSIONID is carried back to the server. If the user requests a server-side resource by typing its URL directly into the browser, no session is matched. Tomcat initially uses both cookies and URL rewriting: if it finds that the client supports cookies, it keeps using cookies and stops rewriting URLs; if cookies are disabled, it keeps using URL rewriting. When a JSP page uses sessions, remember to pass the page's links through response.encodeURL().
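URL rewriting can be sketched as a function that changes links only when no cookie came back. It follows the `;jsessionid=` path-parameter form that servlet containers such as Tomcat use, but `encode_url` and `split_jsessionid` are hypothetical Python stand-ins, not the servlet `response.encodeURL()` API itself.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_url(url: str, session_id: str, client_sent_cookie: bool) -> str:
    """Add ';jsessionid=...' to the path unless cookies already carry the ID."""
    if client_sent_cookie:
        return url  # cookies work, so links are left alone
    parts = urlsplit(url)
    path = parts.path + ";jsessionid=" + session_id  # goes before any '?query'
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def split_jsessionid(path: str):
    """Server side: strip ';jsessionid=...' from a path and return both parts."""
    base, sep, sid = path.partition(";jsessionid=")
    return base, (sid if sep else None)
```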

4) Situations in which a session becomes invalid in a Java EE project
1. Session timeout: the session expires after a configured period of inactivity, for example 30 minutes: if there is no activity within 30 minutes, the session expires. This is configured in web.xml as follows:

<session-config>
    <session-timeout>30</session-timeout> <!-- unit: minutes -->
</session-config>

2. Calling session.invalidate() explicitly removes the session.

5) HTTP extension headers related to cookies

    1. Cookie: the client returns the cookies set by the server to the server;
    2. Set-Cookie: the server sets a cookie on the client;
    3. Cookie2 (RFC 2965): the client tells the server which version of cookies it supports;
    4. Set-Cookie2 (RFC 2965): the server sets a cookie on the client. (Cookie2 and Set-Cookie2 were deprecated by RFC 6265 and are not used by modern browsers.)

6) The flow of cookies
The server sends the cookie content to the client in the Set-Cookie header of its response, and the client sends the same content back to the server in the Cookie header of subsequent requests. This is how the session is kept alive. The process is as follows:



3. How caching works
1) What is web caching
A web cache sits between web servers and clients. It saves copies of responses to requests, such as HTML pages, images, and files. When the next request for the same URL arrives, the cache answers it directly with the stored copy instead of sending the request to the origin server again. The HTTP protocol defines the related message headers so that web caches can work as well as possible.

2) Advantages of caching

    1. Lower latency: because the response comes from the cache (which is closer to the client) rather than from the origin server, it takes less time, and the web server appears faster.
    2. Lower network bandwidth consumption: reusing copies reduces the client's bandwidth consumption, which saves bandwidth costs, slows the growth of bandwidth demand, and makes the network easier to manage.

3) Cache-related HTTP extension headers

    1. Expires: when the response content expires, given in Greenwich Mean Time (GMT);
    2. Cache-Control: finer-grained control over caching;
    3. Last-Modified: when the resource in the response was last modified;
    4. ETag: a validator for the resource in the response, uniquely identifying its current version on the server;
    5. Date: the server's time when the response was generated;
    6. If-Modified-Since: sent by the client with the last-modification time of the cached resource, i.e. the earlier Last-Modified value;
    7. If-None-Match: sent by the client with the validator of the cached resource, i.e. the earlier ETag value.

4) The usual flow for the client cache to take effect
When the server answers a request, it returns the resource's Last-Modified and ETag headers in the 200 OK response; the client saves the resource in its cache and records both values. When the client needs to send the same request again, it includes the If-Modified-Since and If-None-Match headers, whose values are the Last-Modified and ETag values from the earlier response. The server uses these two headers to determine that the resource has not changed, in which case the client does not need to download it again, and the server returns a 304 (Not Modified) response. The usual flow is as follows:
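A minimal sketch of the server-side decision between 200 and 304. Assumptions: the hypothetical `conditional_status` helper receives the header values as strings, ETags are compared as exact strings (the weak comparison real servers use is omitted), and, as RFC 7232 specifies, `If-None-Match` takes precedence, so `If-Modified-Since` is only consulted when no ETag condition was sent.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(etag: str, last_modified: str,
                       if_none_match: Optional[str],
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if ("*" in tags or etag in tags) else 200
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        if parsedate_to_datetime(last_modified) <= since:
            return 304  # not modified since the client's copy
    return 200
```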

5) The web caching mechanism
In HTTP/1.1, caching aims to make many requests unnecessary and, in many other cases, to make full responses unnecessary. The former reduces the number of network round trips, and HTTP uses an expiration mechanism for it; the latter reduces network bandwidth, and HTTP uses a validation mechanism for it. HTTP defines three caching mechanisms:

    1. Freshness: allows a response to be used without rechecking it with the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives the time after which the document is stale, and the max-age directive in Cache-Control gives the maximum time the response may be cached;
    2. Validation: checks whether a cached response can still be used. For example, if a response has a Last-Modified header, the cache can send If-Modified-Since to find out whether the resource has changed, and thus decide whether to fetch it again;
    3. Invalidation: a side effect of other requests passing through the cache. For example, if a URL has a cached response and POST, PUT, or DELETE requests for that URL follow, the cached response is invalidated.
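The three mechanisms can be shown together in a toy cache. `TinyCache` is hypothetical and heavily simplified: freshness uses only `Cache-Control: max-age` (no `Expires`, `no-store`, or `Vary`), validation hands back the stored ETag for an `If-None-Match` request, and invalidation drops the entry when a POST, PUT, or DELETE for the same URL passes through.

```python
import re
from typing import Optional


class TinyCache:
    """Toy cache showing freshness, validation, and invalidation."""

    def __init__(self):
        self._entries = {}  # url -> {"body", "etag", "stored_at", "max_age"}

    @staticmethod
    def max_age(cache_control: str) -> Optional[int]:
        """Extract max-age (in seconds) from a Cache-Control value."""
        match = re.search(r"max-age=(\d+)", cache_control or "")
        return int(match.group(1)) if match else None

    def store(self, url, body, etag, cache_control, now):
        self._entries[url] = {"body": body, "etag": etag, "stored_at": now,
                              "max_age": self.max_age(cache_control) or 0}

    def lookup(self, url, now):
        """Return ('fresh', body), ('revalidate', etag) or ('miss', None)."""
        entry = self._entries.get(url)
        if entry is None:
            return "miss", None
        if now - entry["stored_at"] < entry["max_age"]:
            return "fresh", entry["body"]  # freshness: answer without asking the server
        return "revalidate", entry["etag"]  # validation: ask with If-None-Match

    def revalidated(self, url, now):
        """The server answered 304: the stored copy counts as fresh again."""
        self._entries[url]["stored_at"] = now

    def on_request(self, method, url):
        """Invalidation: a modifying request makes the stored copy unusable."""
        if method in ("POST", "PUT", "DELETE"):
            self._entries.pop(url, None)
```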

4. How resumable (breakpoint) downloads and multi-threaded downloads work

    • The HTTP GET method supports requesting only part of a resource;
    • the server answers with 206 Partial Content;
    • the Range request header specifies the requested range of the resource;
    • the Content-Range response header specifies the range returned;
    • after a connection breaks, the client requests only the part of the resource that has not been downloaded yet, instead of requesting the whole resource again; this is how resumable downloads are implemented.

Examples of chunked resource requests: e.g. 1, Range: bytes=306302- requests the resource from byte 306302 to the end; e.g. 2, Content-Range: bytes 306302-604047/604048 indicates that the response carries bytes 306302 to 604047 of the resource, which is 604048 bytes long in total. By requesting different fragments of the same resource concurrently, a client can download the resource in parallel chunks and so finish faster. The popular download tools FlashGet and Thunder (Xunlei) are basically built on this principle.
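The two header formats from these examples can be produced and parsed in a few lines. `range_header` and `parse_content_range` are hypothetical names; byte positions are zero-based and inclusive, which is why `306302-604047/604048` ends at the last byte of a 604048-byte resource.

```python
import re


def range_header(already_have: int) -> str:
    """Resume: ask for everything from the first missing byte to the end."""
    return f"bytes={already_have}-"


def parse_content_range(value: str):
    """'bytes 306302-604047/604048' -> (first, last, total); total may be None."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if not m:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))  # '*' = length unknown
    return first, last, total
```

In a real client the `Range` value would be sent as a request header (for example via `urllib.request.Request(url, headers={"Range": range_header(n)})`), and a 206 status confirms that the server honored it.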

How multi-threaded download works:

    1. The download tool opens several threads that make HTTP requests;
    2. each HTTP request asks for only one part of the resource file, e.g. Range: bytes=20000-40000, and the server responds with Content-Range: bytes 20000-40000/47000;
    3. the parts downloaded by the threads are merged into one file.
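The three steps can be sketched with a thread pool. To keep the example self-contained, the hypothetical `fetch(first, last)` callback stands in for one HTTP GET with `Range: bytes=first-last` that should return a 206 response; splitting, concurrent fetching, and in-order merging are the parts shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total: int, parts: int) -> List[Tuple[int, int]]:
    """Divide bytes 0..total-1 into `parts` inclusive (first, last) ranges."""
    if not 1 <= parts <= total:
        raise ValueError("need 1 <= parts <= total")
    size, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def parallel_download(total: int, parts: int,
                      fetch: Callable[[int, int], bytes]) -> bytes:
    """Run one fetch(first, last) per range in its own thread, then merge."""
    ranges = split_ranges(total, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        pieces = list(pool.map(lambda r: fetch(*r), ranges))  # results keep range order
    return b"".join(pieces)
```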

Note: the content above was compiled from sources on the web.


Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
