HTTP protocol and its request header analysis

Last Update:2014-11-09 Source: Internet

Author: User



As we all know, the basic protocol suite of the Internet is TCP/IP. Widely used protocols such as FTP, Archie, and Gopher are application-layer protocols built on top of TCP/IP, and each protocol serves a different kind of application.

The main protocol used by WWW servers is HTTP, the HyperText Transfer Protocol. HTTP is not limited to WWW services: it lets users reach services that use other protocols, such as FTP, Archie, SMTP, and NNTP, through a single unified interface. HTTP can also be used for name servers and distributed object management.

The earliest version, HTTP/0.9, was a simple, fast protocol for transferring various kinds of data, but it fell far short of the needs of an ever-growing variety of applications. Like later versions, HTTP/0.9 is stateless: each transaction is handled independently. A connection between client and server is established when a transaction begins and released when it ends. HTTP/0.9 defines only the Simple-Request/Simple-Response message structure. Clients cannot use content negotiation, and the server cannot indicate the media type of the entity it returns.

HTTP/1.0 grew out of the protocol Tim Berners-Lee proposed around 1990 and was documented as RFC 1945 in 1996. As it was refined and extended, HTTP/1.0 became the most important transaction-oriented application-layer protocol. It opens a connection for each request/response pair and closes it afterward. Because it is simple and easy to manage, it met most people's needs and was widely adopted. It still had drawbacks, however: slow responses to user requests, severe network congestion, security weaknesses, and so on.

HTTP/1.1, completed in 1997, is the version in common use today. It adds persistent connections, which support a streaming mode of operation. A client often needs to make several requests to the same server; indeed, most pages today consist of many components (such as multiple images). Pipelining speeds this up: the client sends several requests in succession over one connection without waiting for each response, and then receives the responses as they become ready. This greatly reduces the time spent waiting for individual responses, so pages display faster.

In addition, an HTTP/1.1 server sends its responses in the same order in which it received the requests, so the client can match each response to its request. If the connection is interrupted, the client can automatically resend the requests that have not yet been answered, so that no data are lost.

HTTP/1.1 also provides mechanisms for authentication, state management, and caching. The HTTP/1.1 caching mechanism deserves special mention: it remedies the shortcomings of HTTP/1.0 caching, is rigorous and comprehensive, and both reduces latency and saves bandwidth. HTTP/1.1 also uses content negotiation to choose the most suitable representation of content for the user.

2.1 Introduction to the HTTP Protocol

HTTP is an object-oriented application-layer protocol. Because it is simple and fast, it suits distributed hypermedia information systems. It was proposed in 1990 and has been continually improved and extended over several years of use and development. At the time of writing, the WWW used the sixth revision of HTTP/1.0, standardization of HTTP/1.1 was in progress, and HTTP-NG (Next Generation of HTTP) had been proposed.

The main features of the HTTP protocol can be summarized as follows:

1. Support client/server mode.
2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. Common request methods are GET, HEAD, and POST; each method specifies a different type of interaction between client and server.
Because HTTP is simple, HTTP server programs are small, so communication is fast.
3. Flexible: HTTP allows data objects of any type to be transferred. The type being transferred is indicated by the Content-Type header.
4. Connectionless: each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. This saves transmission time.
5. Stateless: HTTP is a stateless protocol, meaning that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be sent again, which may increase the amount of data transferred per connection. On the other hand, when the server does not need earlier information, it can respond faster.

2.2 Important Concepts in the HTTP Protocol

1. Connection: A transport-layer virtual circuit established between two applications for the purpose of communication.
2. Message: The basic unit of HTTP communication, consisting of a structured sequence of octets transmitted over a connection.
3. Request: A message from the client to the server, including the method to be applied to the resource, the identifier of the resource, and the protocol version.
4. Response: A message returned by the server, including the HTTP protocol version, the status of the request (for example, "Success" or "Not Found"), and the MIME type of the document.
5. Resource: A network data object or service that can be identified by a URI.
6. Entity: A particular representation of a data resource, or a reflection of a service resource, that may be enclosed in a request or response message. An entity consists of entity headers and the entity body.
7. Client: An application program that establishes connections for the purpose of sending requests.
8. User agent: The client that initiates a request, such as a browser, editor, or other end-user tool.
9. Server: An application program that accepts connections and sends back responses to requests.
10. Origin server: The server on which a given resource resides or is to be created.
11. Proxy: An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are handled internally or passed on, possibly with translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it.
Proxies are often used as client-side portals through network firewalls, and as helper applications for handling requests over protocols that the user agent does not implement.
12. Gateway: A server that acts as an intermediary for other servers. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
Gateways are often used as server-side portals through network firewalls, and as protocol translators for accessing resources stored on non-HTTP systems.
13. Tunnel: An intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are used when a portal is necessary and the intermediary cannot, or should not, interpret the relayed communication.
14. Cache: A program's local store of response messages.

2.3 How the HTTP protocol works

HTTP is based on a request/response model. After a client establishes a connection to the server, it sends a request consisting of a request method, a Uniform Resource Identifier (URI), and the protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. When the server receives the request, it responds with a status line, consisting of the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible body content.

Most HTTP communication is initiated by a user agent and consists of a request for a resource on some origin server. In the simplest case, this is a single connection between the user agent (UA) and the origin server (O) (see Figure 2-1).

Figure 2-1

Things become more complicated when one or more intermediaries are present in the request/response chain. There are three common kinds of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent: it receives requests for a URI in its absolute form, rewrites all or part of the message, and forwards the reformatted request toward the server identified by the URI. A gateway is a receiving agent: it acts as a layer above some other server(s) and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are used when the communication must pass through an intermediary (such as a firewall) even though the intermediary cannot understand the contents of the messages.

Figure 2-2

Figure 2-2 above shows three intermediaries (A, B, and C) between the user agent (UA) and the origin server (O). A request or response message that travels the whole chain must pass through four separate connections. This distinction matters because some HTTP communication options may apply only to the connection with the nearest non-tunnel neighbor, only to the end points of the chain, or to all connections along the chain. Although Figure 2-2 is linear, each participant may be engaged in multiple, simultaneous communications. For example, B may be receiving requests from many clients other than A, and/or forwarding requests to servers other than C, at the same time that it is handling A's request.

Any party to the communication that is not acting as a tunnel may use an internal cache to handle requests. The effect of a cache is to shorten the request/response chain whenever one of the participants along the chain has a cached response that applies to the request. Figure 2-3 shows the resulting chain when B holds a cached copy of an earlier response from O (received via C) for a request that has not been cached by UA or A.

Figure 2-3

On the Internet, HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not rule out implementing HTTP on top of other protocols on the Internet or on other networks: HTTP only presumes a reliable transport.

The above is a brief, high-level overview of how HTTP operates. The following describes its internal operation.

First, consider how information is exchanged in the HTTP-based client/server model. As Figure 2-4 shows, the exchange consists of four steps: establishing the connection, sending the request, sending the response, and closing the connection.

Figure 2-4

In the WWW, "client" and "server" are relative roles that exist only for the duration of a particular connection: a program that is a client in one connection may be a server in another. While a WWW server is running, it listens on TCP port 80 (the default WWW port) and waits for connections.

Below, we discuss how each step of this exchange is carried out under HTTP.

1. Establish a connection
A connection is established by opening a socket. The client opens a socket and connects it to the server's port; if this succeeds, it is as if a virtual file had been created. The client can then write data to this virtual file, and the data are sent out over the network.
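Python's socket.makefile() offers exactly this "virtual file" view of a connection. The following is a minimal sketch, not a production client: the helper name open_connection is hypothetical, and a local listener on a free port stands in for a real WWW server on port 80 (binding port 80 usually requires administrator privileges).

```python
import socket


def open_connection(host, port=80, timeout=5.0):
    """Connect to host:port and wrap the socket as a file-like object.

    Hypothetical helper. Returns (sock, stream); `stream` plays the role of
    the "virtual file": bytes written to it (and flushed) go out over the network.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    stream = sock.makefile("rwb")  # buffered binary read/write file object
    return sock, stream


if __name__ == "__main__":
    # Stand-in for a WWW server: listen on a free local port, since binding
    # TCP port 80 normally needs administrator privileges.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        sock, stream = open_connection("127.0.0.1", port)
        conn, _ = server.accept()  # server side of the same connection
        stream.write(b"hello")
        stream.flush()             # push buffered bytes onto the network
        print(conn.recv(5))
        stream.close()
        sock.close()
        conn.close()
```

Wrapping the socket this way lets later steps use ordinary file operations (write, flush, readline) instead of raw send and recv calls.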

2. Sending the request
After the connection is opened, the client sends a request message to the port on which the server is listening, completing the request.
The format of an HTTP/1.0 request message is:
Request message = Request line *(General header | Request header | Entity header) CRLF [Entity body]
Request line = Method SP Request-URI SP HTTP-version CRLF
Method = GET | HEAD | POST | extension-method
URL = protocol name + host name + directory and file name
The method in the request line specifies the action to perform on the specified resource. The commonly used methods are GET, HEAD, and POST. The result of GET depends on the kind of object requested, as follows:
Object | Result of GET
File | Contents of the file
Program | Output produced by running the program
Database query | Query results
HEAD--asks the server for the metainformation about an object, not the object itself.
POST--sends data from the client to the server. POST is used when the server and a CGI program need to process the data further; it is mainly used to send the contents of an HTML form to a CGI program for processing.
An example request line is:
GET http://networking.zju.edu.cn/zju/index.htm HTTP/1.0
Header information is also called metainformation (information about information); it can be used to make requests or responses conditional.
Request headers--tell the server how to interpret the request, mainly by listing the data types, compression methods, and languages the client can accept.
Entity headers--describe the entity's type, length, compression method, last-modification time, expiration time, and so on.
Entity--the request or response object itself.
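The following Python sketch assembles a request message according to the grammar above: a request line, one line per header, an empty line (CRLF), then the optional entity body. The function build_request, the /cgi-bin/form path, and the header values are illustrative assumptions, not part of any library.

```python
CRLF = "\r\n"


def build_request(method, uri, headers, body=b"", version="HTTP/1.0"):
    """Assemble a request message as bytes.

    Layout: request line, one line per header, an empty line, then the
    optional entity body. Content-Length is added when a body is present
    (an assumption of this sketch; a caller-supplied one would be duplicated).
    """
    lines = [f"{method} {uri} {version}"]             # request line
    for name, value in headers.items():                # general/request/entity headers
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")   # HTTP/1.0 POST bodies need a length
    head = CRLF.join(lines) + CRLF + CRLF              # empty line ends the header section
    return head.encode("latin-1") + body


if __name__ == "__main__":
    print(build_request("GET", "http://networking.zju.edu.cn/zju/index.htm",
                        {"Accept": "text/html"}).decode("latin-1"))
    print(build_request("POST", "/cgi-bin/form",
                        {"Content-Type": "application/x-www-form-urlencoded"},
                        body=b"name=Fred"))
```

Building the message as text and encoding it once keeps the CRLF layout visible; a real client would also validate header names and values.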

3. Send a response
The server sends a response message to the client after it has finished processing the client's request.
The format of an HTTP/1.0 response message is:
Response message = Status line *(General header | Response header | Entity header) CRLF [Entity body]
Status line = HTTP-version SP Status-code SP Reason-phrase CRLF
The status code indicates the type of response:
1XX Reserved for future use
2XX Success: the request was successfully received, understood, and accepted
3XX Redirection: further action is needed to complete the request
4XX Client error
5XX Server error
Response headers carry information such as the name of the server software, a notice that the requested URL requires authentication, and when the requested resource will be available.
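A client reading a response first splits the status line and then looks at the first digit of the status code to decide how to proceed. A minimal sketch, assuming well-formed input; parse_status_line and status_class are hypothetical names:

```python
STATUS_CLASSES = {
    1: "informational (reserved in HTTP/1.0)",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP-version SP Status-code SP Reason-phrase' into a tuple."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) == 2:  # tolerate an empty reason phrase
        parts.append("")
    if len(parts) != 3:
        raise ValueError(f"malformed status line: {line!r}")
    version, code, reason = parts
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return version, int(code), reason


def status_class(code):
    """The first digit of the status code selects the response class."""
    return STATUS_CLASSES.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.0 404 Not Found\r\n")
print(version, code, reason, "->", status_class(code))  # HTTP/1.0 404 Not Found -> client error
```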

4. Close the connection
Either the client or the server can end the TCP/IP session by closing its socket.
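Putting the four steps together, here is a minimal, self-contained sketch of one HTTP/1.0 exchange. It assumes a toy single-request server running in a background thread on a free local port (standing in for a real WWW server on port 80); serve_once and fetch are hypothetical names. Because HTTP/1.0 uses one request per connection, the client simply reads until the server closes its socket.

```python
import socket
import threading


def serve_once(server):
    """Toy stand-in for a WWW server: answer one request, then close."""
    conn, _ = server.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the empty line
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"hello\n"
        head = ("HTTP/1.0 200 OK\r\n"
                "Content-Type: text/plain\r\n"
                f"Content-Length: {len(body)}\r\n\r\n")
        conn.sendall(head.encode("ascii") + body)  # 3. send the response
    # 4. leaving the `with` block closes the server's end of the connection


def fetch(host, port, path):
    """Client side of the four steps, using HTTP/1.0 (one request per connection)."""
    with socket.create_connection((host, port), timeout=5) as sock:  # 1. connect
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))                         # 2. send the request
        response = b""
        while chunk := sock.recv(4096):  # read until the server closes
            response += chunk
    return response  # 4. the `with` block has closed the client's socket


if __name__ == "__main__":
    with socket.create_server(("127.0.0.1", 0)) as server:
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        print(fetch("127.0.0.1", server.getsockname()[1], "/index.htm").decode())
        worker.join()
```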

Part Two: Analysis of the HTTP Request Header Format

The request header of the HTTP protocol is divided into 10 parts.

1. From:
This field gives, in Internet mail address form, the name of the user on whose behalf the request is made. It may be used for logging and as an unsafe form of access protection. The field means that the request is being performed on behalf of the given user, who accepts responsibility for the method performed.
The Internet mail address in this field need not correspond to the host that issued the request. For example, when a request passes through a gateway, the address of the original issuer should be used.
If possible, the address should be a valid mail address, whether or not it is actually an Internet mail address.

2. Accept:
This field contains a comma-separated list of representation schemes (content types) that will be accepted in the response to this request. Following RFC 822, the field may be folded onto several lines, and it may also appear more than once; multiple occurrences are treated as if all their entries had been given in a single field. Each entry in the list has the following form:
<field> = Accept: <entry> *[, <entry>]
<entry> = <content type> *[; <param>]
<param> = <attr> = <float>
<attr> = q | mxs | mxb
<float> = <ANSI-C floating point text representation>
Note that in this syntax, semicolons bind more tightly than commas; this follows the convention of MIME (Multipurpose Internet Mail Extensions).
If no Accept field is present, plain text and HTML bodies are assumed to be acceptable.
Example
Accept: text/plain, text/html
Accept: text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c
To save time, and to allow clients to accept content types they may not know about, an asterisk "*" may be used in place of the second half of the content-type value, or of both halves. This applies only to the Accept field, not, of course, to the Content-Type field.
Example
Accept: */*; q=0.1
Accept: audio/*; q=0.2
Accept: audio/basic; q=1
This example can be read as: if you have basic audio, send it; otherwise send me some other audio; failing that, just give me whatever you have.
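A server choosing among several representations can apply these rules mechanically. The sketch below rests on stated assumptions: it treats several Accept lines as one comma-separated field (as described above), honors only the q parameter (mxs, mxb, and other parameters are ignored), and lets the most specific matching range decide a type's q-value, which is the rule later standardized for HTTP/1.1. The names parse_accept, quality, and choose are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Turn 'type/subtype; q=x, ...' into (media_range, q) pairs (q defaults to 1)."""
    ranges = []
    for entry in header.split(","):
        parts = [p.strip() for p in entry.split(";")]  # ';' binds tighter than ','
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media_range:
            ranges.append((media_range, q))
    return ranges


def quality(content_type: str, ranges: List[Tuple[str, float]]) -> float:
    """q-value of content_type under the most specific matching range (0 if none)."""
    ctype, _, csub = content_type.lower().partition("/")
    best_rank, best_q = -1, 0.0
    for media_range, q in ranges:
        rtype, _, rsub = media_range.partition("/")
        if (rtype, rsub) == ("*", "*"):
            rank = 0
        elif rtype == ctype and rsub == "*":
            rank = 1
        elif (rtype, rsub) == (ctype, csub):
            rank = 2
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def choose(available: List[str], header: str) -> Optional[str]:
    """Pick the available type with the highest q-value; None if nothing is acceptable."""
    ranges = parse_accept(header)
    best = max(available, key=lambda t: quality(t, ranges), default=None)
    if best is None or quality(best, ranges) == 0.0:
        return None
    return best


ACCEPT = "*/*; q=0.1, audio/*; q=0.2, audio/basic; q=1"
print(choose(["audio/basic", "audio/x-wav", "text/html"], ACCEPT))  # audio/basic
print(choose(["audio/x-wav", "text/html"], ACCEPT))                 # audio/x-wav
print(choose(["text/html"], ACCEPT))                                # text/html
```

The three calls reproduce the reading given above: basic audio if possible, otherwise other audio, otherwise anything.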
Type parameters
Parameters of a content type are particularly important for describing resolution, color depth, and so on. They allow a client to state its device's resolution in the Accept field, which may let the server save considerable bandwidth by reducing an image's resolution before transmission, or save the user time by choosing a black-and-white image instead of sending a color image that the client must convert to monochrome.
These parameters are to be specified in detail when a type is registered. Suggested parameters include the following:
dpi: dots (pixels) per inch [cm?!]
pxmax: maximum width in pixels (image or video)
pymax: maximum height in pixels
bits: bits per sample (sound) or per pixel (graphics)
mchrome: grayscale or black and white (takes no value)
sps: samples (sound) or frames (video) per second
length: total size of the object in bytes [bits?]

3. Accept-Encoding:
Like Accept, but lists the content encodings that are acceptable in the response.
<field> = Accept-Encoding: <entry> *[, <entry>]
<entry> = <content transfer encoding> *[, <param>]
Example:
Accept-Encoding: x-compress; x-zip

4. Accept-Language:
Like Accept, but lists the preferred language values for the response. A response in a language not listed is not illegal.

5. User-Agent:
If present, this field identifies the software program used by the original client. It is used for statistics and for tracing protocol violations. The first whitespace-delimited word must be the software product name, with an optional slash and version designator. Other products that make up the user agent may follow as separate words.
<field> = User-Agent: <product>+
<product> = <word> [/<version>]
<version> = <word>
Example:
User-Agent: lii-cello/1.0 libwww/2.5
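The grammar above is simple enough to parse by splitting on whitespace and then on the first slash. A minimal sketch (parse_user_agent is a hypothetical name); note that it does not handle the parenthesized comments that real browsers send, such as "(compatible; MSIE 5.00; Windows 98)" in the log later in this article.

```python
from typing import List, Optional, Tuple


def parse_user_agent(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a User-Agent value into (product, version) pairs.

    Follows the grammar above: products are separated by whitespace, and an
    optional "/" separates a product name from its version. Parenthesized
    comments are not handled by this sketch.
    """
    products = []
    for word in value.split():
        name, slash, version = word.partition("/")
        products.append((name, version if slash else None))
    return products


products = parse_user_agent("lii-cello/1.0 libwww/2.5")
print(products)        # [('lii-cello', '1.0'), ('libwww', '2.5')]
print(products[0][0])  # lii-cello (the first word names the main product)
```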

6. Referer:
This optional header field lets the client specify, for the server's benefit, the address (URI) of the document, or element within a document, from which the URI in the request was obtained.
This lets a server generate lists of back-links to documents, and lets bad links be traced for maintenance.
If a partial URI is given, it should be interpreted relative to the URI of the request object.
Example:
Referer: http://www.w3.org/hypertext/datasources/overview.html
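Resolving a partial Referer against the request URI is standard relative-URI resolution, which Python's urllib.parse.urljoin performs. The sketch below also keeps a simple back-link table of the kind described above; resolve_referer, record, and backlinks are hypothetical names, and the relative path "../www/overview.html" is only an example.

```python
from collections import defaultdict
from urllib.parse import urljoin

# Hypothetical back-link table: target URI -> set of referring URIs.
backlinks = defaultdict(set)


def resolve_referer(request_uri: str, referer: str) -> str:
    """Resolve a possibly partial Referer value against the request URI.

    Absolute Referer values are returned unchanged by urljoin.
    """
    return urljoin(request_uri, referer)


def record(request_uri: str, referer: str) -> None:
    """Remember that `referer` links to `request_uri` (for back-link lists)."""
    backlinks[request_uri].add(resolve_referer(request_uri, referer))


request_uri = "http://www.w3.org/hypertext/datasources/overview.html"
print(resolve_referer(request_uri, "../www/overview.html"))
# http://www.w3.org/hypertext/www/overview.html
```

A server that logs Referer values this way can later list, for each document, which pages link to it and spot links that point to documents that no longer exist.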

7. Authorization:
If this line is present, it contains authorization information. Its format is to be specified, and it must be extensible: the first word names the authorization scheme in use.
Basic
Specification of the scheme currently implemented by AL, Sep 1993.
PGP/PEM encryption (Pretty Good Privacy / Privacy-Enhanced Mail)
People at NCSA are designing a PGP/PEM-based protection system.
User/password scheme
Authorization: user Fred:mypassword
The scheme name is "user". The second word is a user name, with an optional password separated by a colon, as in the URL syntax for FTP. Without a password this provides very weak security; with a password it provides the same low level of security as unencrypted FTP, Telnet, and so on.
Kerberos
Authorization: kerberos <Kerberos authentication parameters>
The format of the Kerberos authentication parameters is to be specified.
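The "user" scheme above was an early proposal. The Basic scheme that was later standardized (RFC 1945, and today RFC 7617) instead sends the base64 encoding of "user:password". The sketch below builds and decodes such a header; basic_authorization and parse_basic are hypothetical names. Base64 is an encoding, not encryption, so Basic offers roughly the same low level of security as the cleartext "user" scheme unless the connection itself is protected.

```python
import base64
from typing import Tuple


def basic_authorization(user: str, password: str) -> str:
    """Value for an Authorization header using the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic(value: str) -> Tuple[str, str]:
    """Recover (user, password) from a Basic credential; anyone on the path can do this."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header_value = basic_authorization("Fred", "mypassword")
print("Authorization:", header_value)
print(parse_basic(header_value))  # ('Fred', 'mypassword')
```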

8. ChargeTo:
If this line is present, it contains account information for the costs of applying the requested method. The format is TBS (to be specified).
The format of this field must be extensible: the first word starts with a namespace specification. Namespaces are to be registered.
The format of the rest of the line is system-dependent, but it is recommended that it include a maximum cost and a cost unit for the transaction, as confirmed by the user.
If-Modified-Since: date
This request header is used with the GET method to make it conditional: if the requested document has not changed since the given date, the document is not sent; instead, the server returns a 304 Not Modified response.
The format of this field is the same as that of the Date field.
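On the server side, handling If-Modified-Since comes down to comparing two dates. A minimal sketch, assuming the server knows each document's last-modification time as a timezone-aware datetime; http_date and conditional_get_status are hypothetical names, and only the standard library's email.utils date helpers are used:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(dt: datetime) -> str:
    """Format an aware datetime the way HTTP dates are written (always GMT)."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str] = None) -> int:
    """Return 304 if the document is unchanged since the given date, else 200.

    `last_modified` must be timezone-aware. A missing or unparsable
    If-Modified-Since value means the full document is sent (200).
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):  # older Pythons raise TypeError for bad dates
        return 200
    if since is None:
        return 200
    if since.tzinfo is None:  # '-0000' yields a naive datetime; treat it as GMT
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(2002, 3, 14, 6, 52, 16, tzinfo=timezone.utc)
print(http_date(modified))                                                # Thu, 14 Mar 2002 06:52:16 GMT
print(conditional_get_status(modified, "Thu, 14 Mar 2002 06:52:16 GMT"))  # 304
print(conditional_get_status(modified, "Wed, 13 Mar 2002 06:52:16 GMT"))  # 200
```

Ignoring an invalid date and sending the whole document is the safe choice: the client still gets a correct response, only without the bandwidth saving.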

9. Pragma:
The syntax is the same as for other multivalued HTTP fields such as Accept: a comma-separated list of entries, whose optional parameters are separated by semicolons.
Pragma directives should be understood by the servers to which they are relevant, such as a proxy server. Currently only one pragma is defined: no-cache
When present, the proxy should not return a document from its cache, even if the cached copy has not yet expired; it should always request the document from the actual server that holds it.
Pragmas should be passed on by proxies even when they have meaning to the proxy itself. This is necessary when a request passes through several proxies, since the pragma must take effect at all of them.

The following is the log recorded when using JetCar to download NetVampire (annotations follow "//"):

Thu Mar 14 14:36:56 2002 Connecting 202.113.29.120 [IP=202.113.29.120:80]
Connecting to the host and resolving its IP address
Thu Mar 14 14:36:57 2002 Connected.
Thu Mar 14 14:36:57 2002 GET /index.dhtml?op=download&ino=2941&type=file HTTP/1.1 //request line: fetch the file with the GET method, using HTTP/1.1
Thu Mar 14 14:36:57 2002 Host: 202.113.29.120 //host name
Thu Mar 14 14:36:57 2002 Accept: */* //accepted data types (any)
Thu Mar 14 14:36:57 2002 Referer: http://202.113.29.120 //the URL the request came from
Thu Mar 14 14:36:57 2002 User-Agent: Mozilla/4.0 (compatible; MSIE 5.00; Windows 98) //client identity
Thu Mar 14 14:36:57 2002 Pragma: no-cache //HTTP/1.0 form of the no-cache directive, for compatibility with older servers and proxies
Thu Mar 14 14:36:57 2002 Cache-Control: no-cache //do not use a cached copy
Thu Mar 14 14:36:57 2002 Connection: close //non-persistent connection
The following are the response header fields:
Thu Mar 14 14:36:58 2002 HTTP/1.1 302 Found
The server replies with HTTP/1.1 and status code 302 (Found): the requested resource is temporarily available at the URL given in the Location field
Thu Mar 14 14:36:58 2002 Date: Thu, 14 Mar 2002 06:52:16 GMT //current time at the server, in GMT
Thu Mar 14 14:36:58 2002 Server: Apache/1.3.19 (Unix) PHP/4.0.4pl1 //server software
Thu Mar 14 14:36:58 2002 X-Powered-By: PHP/4.0.4pl1
Thu Mar 14 14:36:58 2002 Set-Cookie: PHPSESSID=6cf938f3c6ce551971c787ac8b3c0f5b; path=/
Thu Mar 14 14:36:58 2002 Expires: Thu, 19 Nov 1981 08:52:00 GMT //expiration time of the document (a date in the past, so it is never treated as fresh)
Thu Mar 14 14:36:58 2002 Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Thu Mar 14 14:36:58 2002 Pragma: no-cache
Thu Mar 14 14:36:58 2002 Content-Disposition: inline; filename=netvampire33.zip
Thu Mar 14 14:36:58 2002 Location: ftp://202.113.29.120/pub/dos_windows/internet/client/download/net Vampire/3.3/ Netvampire33.zip
Thu Mar 14 14:36:58 2002 Connection: close
Thu Mar 14 14:36:58 2002 Transfer-Encoding: chunked
Thu Mar 14 14:36:58 2002 Content-Type: text/html
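A download manager like the one in this log must notice the 302 status and follow the Location field. The sketch below parses a response head and extracts the redirect target. The function names are hypothetical, and the Location URL in the example (ftp://ftp.example.org/pub/file.zip) is a placeholder rather than the one in the log.

```python
from typing import Dict, Optional, Tuple


def parse_response_head(raw: str) -> Tuple[int, str, Dict[str, str]]:
    """Split a response head into (status code, reason phrase, headers).

    Field names are case-insensitive, so they are stored in lower case.
    Repeated fields are joined with ", " (a simplification that is wrong for
    Set-Cookie, whose values must be kept separate).
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    _version, code, reason = (lines[0].split(" ", 2) + [""])[:3]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            break                                # empty line ends the head
        name, _, value = line.partition(":")     # split at the first colon only
        key, value = name.strip().lower(), value.strip()
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return int(code), reason, headers


def redirect_target(code: int, headers: Dict[str, str]) -> Optional[str]:
    """For a 3xx response with a Location field, return where to go next."""
    if 300 <= code < 400:
        return headers.get("location")
    return None


RAW = ("HTTP/1.1 302 Found\r\n"
       "Date: Thu, 14 Mar 2002 06:52:16 GMT\r\n"
       "Location: ftp://ftp.example.org/pub/file.zip\r\n"  # placeholder URL
       "Connection: close\r\n"
       "Content-Type: text/html\r\n"
       "\r\n")
code, reason, headers = parse_response_head(RAW)
print(code, reason, redirect_target(code, headers))  # 302 Found ftp://ftp.example.org/pub/file.zip
```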

Note: the status codes returned by the server
When the server responds, its status line contains the HTTP version, the status code, and a short explanation of the code (the reason phrase). The five classes of status codes are listed below:
① Informational
100 Continue
101 Switching Protocols
② Success
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
③ Redirection
300 Multiple Choices
301 Moved Permanently
302 Found (Moved Temporarily in HTTP/1.0)
303 See Other
304 Not Modified
305 Use Proxy
④ Client Error
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
⑤ Server Error
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
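Python's standard library ships these codes with their standard reason phrases in http.HTTPStatus, which is handy for checking a status line. A small sketch; describe is a hypothetical helper, and the phrases are those of the current HTTP registry, which may differ in wording from older translations.

```python
from http import HTTPStatus

# Class names for the first digit of a status code.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' for a registered code."""
    status = HTTPStatus(code)  # raises ValueError for unregistered codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (100, 200, 302, 304, 404, 502):
    print(describe(code))
```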


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
