The HTTP Protocol and an Analysis of Its Principles

Source: Internet
Author: User

Baidu Encyclopedia:

HyperText Transfer Protocol (HTTP) is the most widely used network protocol on the Internet, and all WWW documents must conform to it. HTTP was originally designed as a way to publish and receive HTML pages. In the 1960s the American Ted Nelson conceived a method of processing text information by computer and called it hypertext, which became the foundation on which the HTTP standard was later built. The World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) coordinated the work that produced a series of RFCs, among them the well-known RFC 2616, which defines HTTP/1.1.

HTTP is the transfer protocol used to deliver hypertext from a WWW server to a local browser. It makes the browser more efficient and reduces network traffic. It not only ensures that hypertext documents are transferred correctly and quickly, but also determines which part of a document is transferred and which part of the content is displayed first (for example, text before graphics).

HTTP is an application-layer protocol made up of requests and responses, and it follows the standard client-server model. HTTP is a stateless protocol.

Technical Architecture

HTTP is a standard for requests and responses between a client and a server, normally carried over TCP. The client is the end user and the server is the website. Using a Web browser, Web crawler, or other tool, the client initiates an HTTP request to a specified port on the server (port 80 by default). This client is called the user agent. The responding server stores resources such as HTML files and images, and is called the origin server. Between the user agent and the origin server there may be several intermediaries, such as proxies, gateways, or tunnels.

Although TCP/IP is the most widely used protocol suite on the Internet, HTTP does not require it or the layers it is built on. In fact, HTTP can be implemented on top of any other Internet protocol or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that offers that guarantee can carry it.

Typically, an HTTP client starts a request by establishing a TCP connection to a specified port on the server (port 80 by default). The HTTP server listens on that port for requests from clients. Once it receives a request, the server sends back a status line, such as "HTTP/1.1 200 OK", and a response message whose body may be the requested file, an error message, or other information.

HTTP uses TCP rather than UDP because loading a web page involves transmitting a lot of data, and TCP provides transmission control, delivers data in order, and corrects errors.

Resources requested over HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs) or, more specifically, URLs. HTTP is the transfer protocol that carries hypertext from a WWW server to the local browser.
HTTP is the application-layer communication protocol between a client browser (or other program) and a Web server. Web servers on the Internet store hypertext information, and clients retrieve the hypertext they want through HTTP. HTTP carries both commands and the transferred information, so it can be used not only for Web access but also for communication between other Internet or intranet application systems, integrating all kinds of application resources behind hypermedia access.

The website address entered in the browser's address bar is called a URL (Uniform Resource Locator). Just as every household has a street address, every web page has an Internet address. When you type a URL into the browser's address box or click a hyperlink, the URL determines the address to visit. The browser uses HTTP to fetch the page's code from the Web server and renders it as a web page.

HTTP is short for HyperText Transfer Protocol and is used to transmit WWW data. For details, see RFC 2616. HTTP uses a request/response model. The client sends the server a request whose header contains the request method, the URL, the protocol version, and a MIME-like message structure holding request modifiers, client information, and content. The server answers with a status line that contains the protocol version of the message and a success or error code, followed by server information, entity metainformation, and possibly entity content.
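As a rough illustration of the request described above (method, URL, protocol version, then header fields), the following Python sketch assembles the text of a request with no body. The function name `build_request`, the host `www.example.com`, and the path `/index.html` are assumptions made for the example and do not come from the article.

```python
CRLF = "\r\n"


def build_request(method, target, headers, version="HTTP/1.1"):
    """Return the text of an HTTP request message that has no body.

    Hypothetical helper for illustration only.
    """
    # Request line: method, request target (URL or path), protocol version.
    lines = [f"{method} {target} {version}"]
    # One "Name: value" line per header field.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line ends the header section.
    return CRLF.join(lines) + CRLF + CRLF


request = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "Accept": "*/*"},
)
print(repr(request))
```

A real client would then write these bytes to a connection and read back the status line and response headers.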
In general, HTTP messages comprise request messages sent from the client to the server and response messages sent from the server to the client. Both kinds of message consist of a start line, zero or more header fields, an empty line marking the end of the header fields, and an optional message body.

HTTP header fields fall into four groups: general headers, request headers, response headers, and entity headers. Each header field consists of a field name, a colon (:), and a field value. Field names are case-insensitive. Any amount of whitespace may precede the field value. A header field can be extended over multiple lines by starting each additional line with at least one space or tab character.

HTTP Request Response Model

In HTTP the client always initiates the request and the server returns the response.

This limits how HTTP can be used: the server cannot push a message to the client unless the client has first sent a request.

HTTP is a stateless protocol: a request from a client has no connection to any earlier request from the same client.
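The message layout described earlier (a start line, header fields, an empty line, and an optional body) can be made concrete with a small parser. This is a minimal sketch, not a complete HTTP parser: the function name `parse_message` and the `X-Note` header are invented for the example, and repeated header fields are not merged.

```python
def parse_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    last_name = None
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and last_name is not None:
            # A line starting with a space or tab continues the previous field.
            headers[last_name] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        last_name = name.strip().lower()
        # Whitespace before the field value is not significant.
        headers[last_name] = value.strip()
    return start_line, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type:   text/html\r\n"
    "X-Note: first part\r\n"
    "  second part\r\n"
    "\r\n"
    "<html></html>"
)
start_line, headers, body = parse_message(raw)
print(start_line, headers, body)
```

Storing the names in lower case is one simple way to honor case-insensitivity; a production parser would also need to handle malformed lines.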

General header fields are those that both request and response messages support. They include Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Extending the general header fields requires support from both communicating parties; an unsupported general header field is normally treated as an entity header field. The following describes several general headers used in UPnP messages.

1. Cache-Control header field

Cache-Control specifies the caching behavior that requests and responses must follow. Setting Cache-Control in a request or response does not alter the caching behavior applied to the other message. The cache directives in a request are no-cache, no-store, max-age, max-stale, min-fresh, and only-if-cached; the directives in a response are public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, and max-age. The directives have the following meanings:

public: the response may be stored by any cache.
private: all or part of the response is intended for a single user and must not be stored by a shared cache. This lets the server mark the part of the response that is specific to one user; that response is not valid for requests from other users.
no-cache: a cached copy must not be used to satisfy the request without first revalidating it with the origin server.
no-store: prevents sensitive information from being released inadvertently. When sent in a request, neither the request nor its response may be stored.
max-age: the client is willing to accept a response whose age is no greater than the specified time in seconds.
min-fresh: the client is willing to accept a response that will remain fresh for at least the specified number of seconds.
max-stale: the client is willing to accept a response that has passed its expiration time.
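To show how the Cache-Control directives listed above appear in practice, here is a small Python sketch that splits a header value into its directives. It is deliberately simplified: it assumes no directive argument contains a comma, which is not true of every legal value, and the function names are hypothetical.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}.

    Simplified sketch: assumes no argument contains a comma.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if sep:
            arg = arg.strip().strip('"')
            # Numeric arguments such as max-age are given in seconds.
            directives[name] = int(arg) if arg.isdigit() else arg
        else:
            directives[name] = None
    return directives


def storable_by_shared_cache(directives):
    """Return False when no-store or private forbids shared caching."""
    return "no-store" not in directives and "private" not in directives


parsed = parse_cache_control("public, max-age=3600")
print(parsed, storable_by_shared_cache(parsed))
```

A cache would consult a structure like this before deciding whether to store a response or serve a stored one.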
If max-stale is given a value, the client accepts a response that has been expired for no more than that number of seconds.

HTTP Keep-Alive

The Keep-Alive feature keeps the connection from client to server open, so that a subsequent request to the server does not need to establish or re-establish a connection. Most Web servers on the market, including iPlanet, IIS, and Apache, support HTTP Keep-Alive. The feature is usually helpful for sites that serve static content. For heavily loaded sites there is a drawback: keeping connections open benefits the client, but it also costs performance, because resources that could have been released stay in use while the connection is idle. When the Web server and the application server run on the same machine, the effect of Keep-Alive on resource usage is especially noticeable.

The KeepAliveTime value controls how often TCP/IP tries to verify that an idle connection is still intact. If there is no activity for this length of time, a keep-alive signal is sent. If the network is working and the receiver is active, the receiver responds. If you need to detect a lost receiver quickly, consider lowering this value. If long-idle connections are common and lost receivers are rare, consider raising it to reduce overhead. By default, Windows sends a keep-alive message when an idle connection has shown no activity for 7200000 milliseconds (2 hours). A value of 1800000 milliseconds is often preferred, so that half-closed connections are detected within 30 minutes.

The KeepAliveInterval value defines how often TCP/IP resends the keep-alive signal when no response to a keep-alive message has been received from the receiver.
When keep-alive signals have been sent without a response more times than the value of TcpMaxDataRetransmissions, the connection is dropped. If long response times are expected, consider raising KeepAliveInterval to reduce overhead. To shorten the time spent verifying whether the receiver has been lost, consider lowering this value or the TcpMaxDataRetransmissions value. By default, Windows waits 1000 milliseconds (1 second) for a response before resending a keep-alive message. KeepAliveTime can be set as needed, for example to 10 minutes; be sure to convert the value to milliseconds. XXX stands for the interval value.

2. Date header field

The Date header field gives the time at which the message was sent. The time format is defined by RFC 822. For example: Date: Mon, 31 Dec 2001 04:25:57 GMT. The time in Date is expressed in Greenwich Mean Time, so converting it to local time requires knowing the user's time zone.

3. Pragma header field

The Pragma header field carries implementation-specific directives. The most common is Pragma: no-cache, which in HTTP/1.1 has the same meaning as Cache-Control: no-cache.

The first line of a request message has the following format:

Method SP Request-URI SP HTTP-Version CRLF

Method is the method to be applied to Request-URI. It is case-sensitive and is one of OPTIONS, GET, HEAD, POST, PUT, DELETE, and TRACE. GET and HEAD should be supported by all general-purpose Web servers; implementing the other methods is optional. GET retrieves the information identified by Request-URI. HEAD also retrieves the information identified by Request-URI, but the response carries no message body. POST asks the server to accept the entity enclosed in the request; it can be used to submit forms or to send messages to newsgroups, bulletin boards, mailing lists, and databases. SP denotes a space. Request-URI follows the URI format.
When Request-URI is an asterisk (*), the request applies to the server itself rather than to a particular resource. HTTP-Version is the HTTP version in use, for example HTTP/1.1. CRLF denotes a carriage return followed by a line feed.

Request header fields let the client pass the server additional information about the request or about the client itself. They may include Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, From, Host, If-Modified-Since, If-Match, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, and User-Agent. Extending the request header fields requires support from both communicating parties; an unsupported request header field is normally treated as an entity header field.

Typical request message:

    Host: download.*******.de
    Accept: */*
    Pragma: no-cache
    Cache-Control: no-cache
    User-Agent: Mozilla/4.04 [en] (Win95; I; Nav)
    Range: bytes=554554-

The first line of the request (not reproduced above) states that the HTTP client, which may be a browser or a download program, fetches the file at the given URL with the GET method. Host, Accept, User-Agent, and Range are request header fields, while Pragma and Cache-Control are general header fields.

1. Host header field

The Host header field specifies the Internet host and port number of the requested resource, and must represent the location of the origin server or gateway given by the requested URL. An HTTP/1.1 request must contain a Host header field; otherwise the server returns status code 400.

2. Referer header field

The Referer header field lets the client give the address of the resource from which the Request-URI was obtained. This allows the server to build lists of back-links for logging and cache optimization, and to trace obsolete or mistyped links for maintenance. If the Request-URI was obtained from a source that has no URI of its own, Referer must not be sent.
If a partial URI is given in Referer, it is interpreted relative to the Request-URI.

3. Range header field

The Range header field requests one or more sub-ranges of an entity. For example:

    the first 500 bytes: bytes=0-499
    the second 500 bytes: bytes=500-999
    the last 500 bytes: bytes=-500
    everything from byte 500 onward: bytes=500-
    the first and last bytes: bytes=0-0,-1
    several ranges at once: bytes=500-600,601-999

The server may ignore this header. If an unconditional GET carries a Range header and the server honors it, the response is returned with status code 206 (Partial Content) instead of 200 (OK).

4. User-Agent header field

The User-Agent header field contains information about the user agent that issued the request.

The first line of a response message has the following format:

HTTP-Version SP Status-Code SP Reason-Phrase CRLF

HTTP-Version is the HTTP version in use, for example HTTP/1.1. Status-Code is a three-digit result code. Reason-Phrase gives a short text description of the Status-Code. Status-Code is intended mainly for automated processing, while Reason-Phrase is intended for human readers. The first digit of Status-Code defines the class of the response; the last two digits have no categorizing role. The first digit can take five values:

    1xx: informational; the request was received and processing continues.
    2xx: success; the action was successfully received, understood, and accepted.
    3xx: redirection; further action must be taken to complete the request.
    4xx: client error; the request contains bad syntax or cannot be fulfilled.
    5xx: server error; the server failed to fulfill an apparently valid request.
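The status line format and the five response classes above can be sketched in a few lines of Python. The names `parse_status_line` and `status_class` are hypothetical helpers written for this illustration.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase' into its parts."""
    # Split on the first two spaces only; the reason phrase may contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"status code must be three digits: {code!r}")
    return version, int(code), reason


def status_class(code):
    """Map a status code to its class using only the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason, status_class(code))
```

Because only the first digit determines the class, a client can react sensibly to a status code it has never seen before.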
Response header fields let the server pass additional information that cannot be placed in the status line. They mainly describe the server and give further information about the Request-URI. The response header fields include Age, Location, Proxy-Authenticate, Public, Retry-After, Server, Vary, Warning, and WWW-Authenticate. Extending the response header fields requires support from both communicating parties; an unsupported response header field is normally treated as an entity header field.

Typical response message:

    HTTP/1.0 200 OK
    Date: Mon, 31 Dec 2001 04:25:57 GMT
    Server: Apache/1.3.14 (Unix)
    Content-type: text/html
    Last-modified: Tue, 17 Apr 2001 06:46:28 GMT
    Etag: "a030f020ac7c01:1e9f"
    Content-length: %25%
    content-range: bytes55 ******/40279980

The first line is the status line with which the HTTP server answers the GET request. Server is a response header field, Date is a general header field, and the Content-* and Last-modified fields are entity header fields.

1. Location response header

The Location response header redirects the recipient to a new URI.

2. Server response header

The Server response header contains information about the software the origin server uses to handle the request. The field can hold several product identifiers and comments; product identifiers are generally listed in order of importance.

Entity information

Both request and response messages can carry entity information, which generally consists of entity header fields and an entity body. The entity header fields contain metainformation about the entity. They include Allow, Content-Base, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, Etag, Expires, Last-Modified, and extension-header.
extension-header allows the client to define new entity header fields, although the recipient may not recognize them. An entity can be an encoded byte stream; its encoding is defined by Content-Encoding or Content-Type, and its length by Content-Length or Content-Range.

1. Content-Type entity header

Content-Type indicates to the recipient the media type of the entity or, for the HEAD method, the media type that would have been sent had the request been a GET.

2. Content-Range entity header

Content-Range specifies where a partial entity fits within the full entity, and also gives the length of the full entity. When the server returns a partial response to the client, it must describe both the range the response covers and the full entity length. The general format is:

Content-Range: bytes-unit SP first-byte-pos-last-byte-pos/entity-le

For example, the header for transferring the first 500 bytes is Content-Range: bytes 0-499/1234. If an HTTP message carries a single range (for example, a response to a request for one range, or for a set of overlapping ranges), Content-Range states the range transferred and Content-Length states the number of bytes actually transferred.

3. Last-Modified entity header

Last-Modified gives the time at which the content stored on the server was last modified.

In the WWW, "client" and "server" are relative concepts that exist only for the duration of a particular connection: the client in one connection may act as the server in another. Information exchange in the HTTP-based client/server model consists of four steps: establishing a connection, sending the request, sending the response, and closing the connection.
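The four steps named above (establish a connection, send the request, send the response, close the connection) can be walked through without touching a network. The sketch below is only an approximation: `socket.socketpair()` stands in for a real TCP connection, and the function names, the host `example.com`, and the fixed five-byte body are all assumptions made for the example.

```python
import socket
import threading


def serve_once(conn):
    """Toy server side: read one request, answer it, close the connection."""
    data = b""
    while b"\r\n\r\n" not in data:  # read until the end of the header section
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    body = b"hello"
    # Step 3: the server sends the response (status line, headers, body).
    conn.sendall(
        b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\nConnection: close\r\n\r\n" + body
    )
    conn.close()


def run_transaction():
    # Step 1: establish a connection (a socket pair replaces TCP connect here).
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    # Step 2: the client sends the request.
    client.sendall(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
    # Step 3 (client side): read until the server closes its end.
    response = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        response += chunk
    # Step 4: close the connection.
    client.close()
    worker.join()
    return response


response = run_transaction()
print(response.decode("ascii"))
```

With a real server the client would instead call `socket.create_connection((host, 80))`, but the order of the four steps stays the same.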
HTTP is based on the request/response paradigm. After establishing a connection with the server, a client sends a request consisting of the request method, the Uniform Resource Identifier, and the protocol version, followed by MIME-like information containing request modifiers, client information, and possibly content. On receiving the request, the server responds with a status line, containing the protocol version of the message and a success or error code, followed by MIME-like information containing server information, entity information, and possibly content.

Put simply, every server, besides holding HTML files, runs an HTTP daemon that answers user requests. Your browser is an HTTP client that sends requests to the server. When you enter a starting file in the browser or click a hyperlink, the browser sends an HTTP request to the IP address designated by the URL. The daemon receives the request and, after the necessary processing, sends back the requested file. Along the way, the data sent and received over the network is divided into one or more packets, each containing the data to be transmitted and control information that tells the network how to handle the packet. TCP/IP determines the format of each packet. Unless told in advance, you might never notice that the information is split into many small pieces for transmission and then reassembled.

Most HTTP communication is initiated by a user agent and consists of a request for a resource on an origin server. In the simplest case this takes place over a single connection between the user agent (UA) and the origin server (O). The situation becomes more complex when one or more intermediaries are present in the request/response chain. There are three kinds of intermediary: proxy, gateway, and tunnel.
A proxy is a forwarding agent: it receives requests for a URI in its absolute form, rewrites all or part of the message, and forwards the reformatted request toward the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are used when communication has to pass through an intermediary (such as a firewall), even one that cannot understand the contents of the messages.

HTTP messages consist of requests from the client to the server and responses from the server to the client. A request message has the following format: request line, general headers, request headers, entity headers, message body. The request line begins with the method field, followed by the URL field and the HTTP protocol version field, and ends with CRLF. SP is the separator. No CR or LF is allowed except in the final CRLF sequence. For details of the general headers, request headers, and entity headers, refer to the relevant documents.

A response message has the following format: status line, general headers, response headers, entity headers, message body. The status code is a three-digit number indicating whether the request was understood and satisfied. The reason phrase is a short textual description of the status code. The status code is intended for automated processing and the reason phrase for human users; the client is not required to examine or display the reason phrase. For details of the general headers, response headers, and entity headers, refer to the relevant documents.

One HTTP operation is called a transaction, and it can be divided into four steps. First, the client and the server establish a connection; clicking a hyperlink is enough to start HTTP working.
Second, once the connection is established, the client sends the server a request consisting of the Uniform Resource Identifier (URL) and the protocol version, followed by MIME-like information containing request modifiers, client information, and possibly content. Third, on receiving the request, the server sends a response consisting of a status line, with the protocol version of the message and a success or error code, followed by MIME-like information containing server information, entity information, and possibly content. Fourth, the client receives the information returned by the server and displays it on the user's screen through the browser, and then the client disconnects from the server.

If an error occurs in any of these steps, an error message is returned to the client and shown on the display. For the user, all of this is handled by HTTP: the user only has to click and wait for the information to appear.

Most HTTP communication is initiated by a user agent and consists of a request for a resource on an origin server. In the simplest case it takes place over a single connection between the user agent and the server. On the Internet, HTTP communication usually runs over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not preclude HTTP from being implemented on top of other protocols on the Internet or on other networks; HTTP only presumes a reliable transport.

The process is like ordering by phone: we call the seller and say what kind of product we need, and the seller tells us which products are in stock and which are sold out. Here the call travels over a telephone line (HTTP travels over TCP/IP). We could just as well use fax, as long as the seller has a fax machine. For more information, see: Http://blog.csdn.net/lmh12506/article/details/7794512
