The Core of the WWW: The HTTP Protocol
The basic protocol suite of the Internet is TCP/IP. Widely used services such as FTP, Archie, and Gopher are application-layer protocols built on TCP/IP, and different protocols serve different applications. The main protocol used by WWW servers is HTTP, the Hypertext Transfer Protocol. Because HTTP is not limited to the WWW and can carry other services as well, it lets users reach services that use different protocols, such as FTP, Archie, SMTP, and NNTP, through a single unified interface. HTTP can also be used for name servers and distributed object management.
2.1 Introduction to the HTTP Protocol

HTTP is an object-oriented, application-layer protocol that, being simple and fast, is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended through several years of use and development. The version currently used on the WWW is the sixth revision of HTTP/1.0; standardization of HTTP/1.1 is in progress, and HTTP-NG (Next Generation of HTTP) has been proposed.

The main features of HTTP can be summarized as follows:
1. Client/server model: HTTP supports the client/server mode of operation.
2. Simple and fast: When a client requests a service from a server, it only needs to send the request method and the path. The commonly used request methods are GET, HEAD, and POST; each method specifies a different type of interaction between client and server. Because the protocol is simple, HTTP server programs are small, so communication is fast.
3. Flexible: HTTP allows data objects of any type to be transferred. The type being transferred is marked by Content-Type.
4. Connectionless: Each connection is limited to handling a single request. After the server has processed the client's request and received the client's acknowledgment, it closes the connection. This saves transmission time.
5. Stateless: HTTP is a stateless protocol, meaning it keeps no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.
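Feature 3 says the type of the transferred object is marked by Content-Type. As a minimal sketch, assuming a server that serves files and picks the type from the file-name extension, Python's standard `mimetypes` module can supply the value. The helper names `content_type_for` and `content_type_header` and the `application/octet-stream` fallback are illustrative choices, not part of the protocol description above.

```python
import mimetypes

# Fallback for data whose type cannot be guessed; this sketch assumes the
# generic "arbitrary bytes" type rather than refusing to send the object.
FALLBACK_TYPE = "application/octet-stream"


def content_type_for(path: str) -> str:
    """Guess a Content-Type value from the file name's extension."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or FALLBACK_TYPE


def content_type_header(path: str) -> str:
    """Header line telling the receiver how to interpret the transferred object."""
    return f"Content-Type: {content_type_for(path)}\r\n"


if __name__ == "__main__":
    for name in ("index.html", "logo.png", "notes.unknownext"):
        print(repr(content_type_header(name)))
```

Because the receiver reads the type from this header rather than from the bytes themselves, the same transfer mechanism works for HTML, images, or any other kind of object.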
2.2 Important Concepts in HTTP

1. Connection: A transport-layer virtual circuit established between two programs for the purpose of communication.
2. Message: The basic unit of HTTP communication, consisting of a structured sequence of octets transmitted over a connection.
3. Request: A message from the client to the server, containing the method to be applied to the resource, the identifier of the resource, and the protocol version.
4. Response: A message returned by the server, containing the HTTP version, the status of the request (for example, "succeeded" or "not found"), and the MIME type of the document.
5. Resource: A network data object or service that can be identified by a URI.
6. Entity: A particular representation of a data resource, or a reply from a service resource, that may be enclosed in a request or response message. An entity consists of entity header information and the entity's content.
7. Client: A program that establishes connections for the purpose of sending requests.
8. User agent: The client that initiates a request, such as a browser, an editor, or another end-user tool.
9. Server: A program that accepts connections and sends back responses to requests.
10. Origin server: The server on which a given resource resides or is to be created.
11. Proxy: An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are either handled internally or passed on, possibly with translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. Proxies are often used as client-side portals through firewalls, and as helper applications for handling requests via protocols the user agent does not implement.
12. Gateway: A server that acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource, and the requesting client may not be aware that it is dealing with a gateway. Gateways are often used as server-side portals through firewalls, and as protocol translators for accessing resources stored on non-HTTP systems.
13. Tunnel: An intermediary program that acts as a relay between two connections. Once active, a tunnel is not considered part of the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are used when a portal is necessary and the intermediary cannot interpret the relayed traffic.
14. Cache: A program's local store of response messages.
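To make the message-related terms concrete, the following Python sketch models requests, responses, and entities as plain data, with a method that flattens each message into the octet sequence sent over a connection. The class names, the use of a `dict` for headers (which cannot hold a repeated header name), and the merging of entity headers after the message headers are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Entity headers (meta-information about the content) plus the content."""
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def _to_octets(start_line: str, headers: dict[str, str], entity: Entity | None) -> bytes:
    # Start line, one "Name: value" line per header, an empty line, then the body.
    all_headers = dict(headers)
    body = b""
    if entity is not None:
        all_headers.update(entity.headers)
        body = entity.body
    lines = [start_line] + [f"{name}: {value}" for name, value in all_headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1") + body


@dataclass
class Request:
    """Client-to-server message: method, resource identifier, protocol version."""
    method: str
    uri: str
    version: str = "HTTP/1.0"
    headers: dict[str, str] = field(default_factory=dict)  # request headers
    entity: Entity | None = None

    def to_bytes(self) -> bytes:
        return _to_octets(f"{self.method} {self.uri} {self.version}", self.headers, self.entity)


@dataclass
class Response:
    """Server-to-client message: protocol version, status, optional entity."""
    status: int
    reason: str
    version: str = "HTTP/1.0"
    headers: dict[str, str] = field(default_factory=dict)  # response headers
    entity: Entity | None = None

    def to_bytes(self) -> bytes:
        return _to_octets(f"{self.version} {self.status} {self.reason}", self.headers, self.entity)


if __name__ == "__main__":
    print(Request("GET", "/zju/index.htm").to_bytes())
    print(Response(404, "Not Found", entity=Entity({"Content-Type": "text/html"}, b"<h1>missing</h1>")).to_bytes())
```

Keeping the entity separate from the message mirrors the definitions: the same entity (headers plus content) can travel inside either a request or a response.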
2.3 How HTTP Works

HTTP is based on the request/response paradigm. After a client establishes a connection with a server, it sends the server a request consisting of the request method, the Uniform Resource Identifier (URI), and the protocol version, followed by MIME-like information containing request modifiers, client information, and possible body content. After receiving the request, the server replies with a response: a status line containing the protocol version of the message and a success or error code, followed by MIME-like information containing server information, entity meta-information, and possible body content.

Most HTTP communication is initiated by a user agent and consists of a request for a resource on an origin server. In the simplest case, this can be a single connection between the user agent (UA) and the origin server (O) (see Figure 2-1).
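Because requests and responses share one layout (a start line, MIME-style header lines, an empty line, then optional content), a single parser can split either kind of message. The sketch below assumes the whole message is already in memory and that each header name appears only once; `parse_message` is a hypothetical helper, not a library function.

```python
from __future__ import annotations


def parse_message(raw: bytes) -> tuple[list[str], dict[str, str], bytes]:
    """Split a raw HTTP/1.0 message into start-line fields, headers, and content."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line ends the header section")
    start_line, *header_lines = head.decode("latin-1").split("\r\n")
    # Request line: method, URI, version.  Status line: version, code, reason.
    # The reason phrase may contain spaces, so split at most twice.
    fields = start_line.split(" ", 2)
    if len(fields) != 3:
        raise ValueError(f"malformed start line: {start_line!r}")
    headers: dict[str, str] = {}
    for line in header_lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return fields, headers, body


if __name__ == "__main__":
    print(parse_message(b"GET /zju/index.htm HTTP/1.0\r\nAccept: text/html\r\n\r\n"))
    print(parse_message(b"HTTP/1.0 404 Not Found\r\nServer: demo\r\n\r\nmissing"))
```

Only the interpretation of the three start-line fields differs between the two message kinds; the header and content handling is shared.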
When one or more intermediaries are present in the request/response chain, the situation becomes more complex. There are three kinds of intermediary: proxy, gateway, and tunnel. A proxy accepts requests for URIs in absolute form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server(s) and, if necessary, translates requests into the protocol of the underlying server. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication must pass through an intermediary (such as a firewall) or when the intermediary cannot recognize the contents of the messages.

Figure 2-2

Figure 2-2 shows three intermediaries (A, B, and C) between the user agent (UA) and the origin server (O). A request or response message that travels the whole chain must pass through four separate connections. This distinction matters because some HTTP communication options may apply only to the connection with the nearest non-tunnel neighbor, only to the end-points of the chain, or to all connections along the chain. Although Figure 2-2 is linear, each participant may be engaged in multiple, simultaneous communications. For example, B may be receiving requests from many clients other than A, and/or forwarding requests to servers other than C, at the same time that it is handling A's request.

Any party to the communication that is not acting as a tunnel may use an internal cache to handle requests. The effect of a cache is to shorten the request/response chain when one of the participants along the chain has a cached response applicable to the request. Figure 2-3 shows the resulting chain for a request that is not cached by UA or A, when B has a cached copy of an earlier response from O (via C).
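The effect of intermediaries and caching on the chain can be sketched as a walk from the user agent toward the origin server that stops at the first non-tunnel participant holding a cached response. This is a simplified model under stated assumptions: `Participant` and `request_path` are hypothetical names, only the request direction is modeled, and cache freshness and validation are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Participant:
    """One program on the request/response chain (user agent or intermediary)."""
    name: str
    is_tunnel: bool = False
    cache: dict[str, bytes] = field(default_factory=dict)  # URI -> cached response


def request_path(chain: list[Participant], origin: str, uri: str) -> list[tuple[str, str]]:
    """Connections a request for `uri` crosses, starting at chain[0] (the user agent).

    The walk stops at the first non-tunnel participant holding a cached response;
    otherwise the request travels all the way to the origin server.
    """
    connections: list[tuple[str, str]] = []
    for i, participant in enumerate(chain):
        # A tunnel relays bytes blindly, so it never answers from a cache.
        if not participant.is_tunnel and uri in participant.cache:
            return connections
        next_hop = chain[i + 1].name if i + 1 < len(chain) else origin
        connections.append((participant.name, next_hop))
    return connections


if __name__ == "__main__":
    chain = [Participant("UA"), Participant("A"), Participant("B"), Participant("C")]
    print(request_path(chain, "O", "/index.htm"))  # no caches: UA-A, A-B, B-C, C-O
    chain[2].cache["/index.htm"] = b"earlier response from O via C"
    print(request_path(chain, "O", "/index.htm"))  # B answers: UA-A, A-B
```

With no caches, the chain of Figure 2-2 yields four connections; once B holds a cached copy, the request is answered after two, which is the shortened chain described for Figure 2-3.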
Figure 2-3

On the Internet, HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not preclude HTTP from being implemented on top of other protocols on the Internet or on other networks; HTTP only presumes a reliable transport.

The above is a brief overview of how HTTP operates as a whole; what follows describes its internal operation. First, consider the information exchange between client and server under HTTP. As Figure 2-4 shows, it consists of four steps: establishing the connection, sending the request, sending the response, and closing the connection.

Figure 2-4

On the WWW, "client" and "server" are relative concepts that exist only for the duration of a particular connection: a program that is a client on one connection may be a server on another. While a WWW server is running, it listens on TCP port 80 (the default WWW port), waiting for connections.

Below, we discuss how this client/server information exchange is carried out under HTTP.

1. Establishing the connection: The connection is established through a socket. The client opens a socket and binds it to a port; if this succeeds, it is equivalent to creating a virtual file. The client can then write data to this virtual file, and the data is sent out over the network.
2. Sending the request: Once the connection is open, the client sends its request message to the port on which the server is listening, completing the request.
The format of an HTTP/1.0 request message is:

    Request message = Request-Line *(General-Header | Request-Header | Entity-Header) CRLF [Entity-Body]
    Request-Line    = Method SP Request-URL SP HTTP-Version CRLF
    Method          = "GET" | "HEAD" | "POST" | extension-method
    Request-URL     = protocol name + host name + directory and file name

Here *( ) means zero or more occurrences, [ ] marks an optional part, SP is a single space, and CRLF is a carriage return followed by a line feed.

The method in the request line indicates the action to be performed on the specified resource. The commonly used methods are GET, HEAD, and POST.

GET: the result depends on the kind of object requested:

    Object            Result of GET
    File              Contents of the file
    Program           Output of running the program
    Database query    Query results

HEAD: asks the server for the meta-information of an object, not the object itself.

POST: transmits data from the client to the server. It is used when the data must be processed further by the server together with a CGI program; its main use is sending the contents of an HTML form to a CGI program for processing.

An example request line is:

    GET http://networking.zju.edu.cn/zju/index.htm HTTP/1.0

Header information is also called meta-information, that is, information about information; it can be used to make a request or response conditional.

Request header: tells the server how to interpret the request, mainly the data types the client can accept, the compression methods, and the languages.
Entity header: describes the entity, including its type, length, compression method, last-modification time, and expiration time.
Entity: the request or response object itself.

3. Sending the response: After it has finished processing the client's request, the server sends a response message to the client.
The format of an HTTP/1.0 response message is:

    Response message = Status-Line *(General-Header | Response-Header | Entity-Header) CRLF [Entity-Body]
    Status-Line      = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

The first digit of the status code gives the class of the response:

    1xx  Reserved for future use
    2xx  Success: the request was successfully received, understood, and accepted
    3xx  Redirection: further action is needed to complete the request
    4xx  Client error
    5xx  Server error

Response headers carry information such as the name of the server software, a notice that the requested URL requires authentication, and when the requested resource will be available.

4. Closing the connection: Either the client or the server can end the TCP/IP session by closing its socket.
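The four steps can be sketched end to end with Python's standard `socket` and `threading` modules. This is a minimal sketch, not a usable server: it assumes a hypothetical in-memory `RESOURCES` table in place of files, listens on a free port on 127.0.0.1 rather than on port 80 (which usually requires privileges), handles a single connection per run, and supports only GET and HEAD.

```python
import socket
import threading

# Hypothetical stand-in for the files an origin server would hold.
RESOURCES = {"/zju/index.htm": b"<html><body>Hello</body></html>"}


def handle(conn: socket.socket) -> None:
    """Steps 3 and 4 on the server: answer one request, then close the socket."""
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # read up to the empty line ending the headers
            chunk = conn.recv(1024)
            if not chunk:
                return
            data += chunk
        request_line = data.split(b"\r\n", 1)[0].decode("latin-1")
        method, uri, _version = request_line.split(" ", 2)
        body = RESOURCES.get(uri, b"")
        if method not in ("GET", "HEAD"):
            status, body = "501 Not Implemented", b""
        elif uri not in RESOURCES:
            status = "404 Not Found"
        else:
            status = "200 OK"
        head = (f"HTTP/1.0 {status}\r\n"
                "Content-Type: text/html\r\n"
                f"Content-Length: {len(body)}\r\n\r\n").encode("latin-1")
        # HEAD gets the meta-information only, never the object itself.
        conn.sendall(head if method == "HEAD" else head + body)


def fetch(port: int, method: str, uri: str) -> bytes:
    """Steps 1-4 on the client: connect, send the request, read, close."""
    # Step 1: establish the TCP connection.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        # Step 2: request line followed by the empty line that ends the headers.
        sock.sendall(f"{method} {uri} HTTP/1.0\r\n\r\n".encode("latin-1"))
        # Step 3: read the response until the server closes its end.
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    # Step 4: leaving the `with` block closes the client's socket as well.
    return b"".join(chunks)


def demo(method: str, uri: str) -> bytes:
    """Run a one-shot server on a free local port and fetch one resource from it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        listener.listen(1)
        listener.settimeout(5)
        port = listener.getsockname()[1]

        def serve_once() -> None:
            conn, _addr = listener.accept()  # server side of step 1
            handle(conn)

        server = threading.Thread(target=serve_once)
        server.start()
        response = fetch(port, method, uri)
        server.join()
    return response


if __name__ == "__main__":
    print(demo("GET", "/zju/index.htm"))
    print(demo("HEAD", "/zju/index.htm"))
```

Because the server closes its socket after a single response, the client can simply read until the end of the stream, which matches the one-request-per-connection behavior described for HTTP/1.0. A HEAD request returns the same status line and headers as GET but no body.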