What is HTTP:
HTTP (Hypertext Transfer Protocol) is one of the most widely used network protocols on the Internet. All WWW documents must conform to this standard, which was originally designed to provide a way to publish and receive HTML pages. HTTP defines how messages are formatted and transmitted, and how servers and browsers should respond to the various commands.
HTTP is the application-layer communication protocol between a client (a browser or another program) and a Web server. Hypertext information is stored on Web servers on the Internet, and the client retrieves the hypertext information it wants to access over HTTP. HTTP carries commands and transfer information, and can be used not only for Web access but also for communication between other Internet/intranet application systems, integrating hypermedia access to a variety of application resources.
For more information on the HTTP protocol, please refer to RFC 2616.
Key features of the HTTP protocol
1. Support client/server mode.
2. Simple and fast: When a client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, an HTTP server program can be small, so communication is fast.
3. Flexible: HTTP allows any type of data object to be transferred. The type being transmitted is marked by Content-Type.
4. Connectionless: Connectionless here means that each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, the connection is closed. This saves transmission time. (This describes HTTP/1.0 behavior; HTTP/1.1 connections are persistent by default.)
5. Stateless: HTTP is a stateless protocol. Stateless means that the protocol keeps no memory of earlier transactions. Because of this, if earlier information is needed for subsequent processing, it must be retransmitted, which may increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need the earlier information.
Technical framework:
HTTP is a standard for requests and responses between a client and a server, usually carried over TCP. The client is the end user and the server is the Web site. Using a Web browser, crawler, or other tool, the client initiates an HTTP request to a specified port on the server (port 80 by default). This client is called the user agent. The responding server stores resources such as HTML files and images, and is called the origin server. There may be multiple intermediaries, such as proxies, gateways, or tunnels, between the user agent and the origin server. Although TCP/IP is the most widely used protocol suite on the Internet, HTTP does not require it or the layers it supports. In fact, HTTP can be implemented on top of any other Internet protocol, or on other networks. HTTP only assumes that its underlying protocol provides reliable transport, so any protocol that offers such a guarantee can be used.
Typically, an HTTP client initiates a request by establishing a TCP connection to a specified port on the server (port 80 by default). The HTTP server listens on that port for requests sent by clients. Once a request is received, the server sends back to the client a status line, such as "HTTP/1.1 200 OK", and a response message, whose body may be the requested file, an error message, or some other information. HTTP uses TCP rather than UDP because a Web page may carry a lot of data, and TCP provides transmission control, delivers the data in order, and corrects errors.
Resources requested through HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs) or, more specifically, URLs.
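To make the role of the URL concrete, here is a minimal Python sketch that splits a URL into the pieces a client needs before it can connect: the scheme, the host, the port, and the request target. The function name describe_url and the example URLs are made up for illustration; port 443 for HTTPS is the usual default and is not taken from the text above.

```python
from urllib.parse import urlsplit

# Default ports per scheme (80 is the HTTP default mentioned in the text).
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url):
    """Split a URL into the pieces an HTTP client needs to make a request."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # The request target is the path plus the optional query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.hostname, port, target

print(describe_url("http://www.example.com/index.html?lang=en"))
# ('http', 'www.example.com', 80, '/index.html?lang=en')
```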
Work Flow:
Because HTTP runs on top of TCP, a connection-oriented, end-to-end transport-layer protocol, a TCP connection must be established before any HTTP data is transmitted. This is why the TCP "three-way handshake" comes up when discussing the connection process.
On the Web, HTTP uses TCP rather than UDP because a Web page may carry a lot of data whose integrity must be ensured. TCP provides transmission control, in-order delivery, and error correction.
An HTTP operation is called a transaction, and its working process can be divided into four steps:
1. The client and the server establish a connection. (For example, when the user clicks a hyperlink, HTTP starts its work.)
2. After the connection is established, the client sends a request to the server.
3. After receiving the request, the server returns the corresponding response.
4. The client receives the information returned by the server, the browser displays it on the user's screen, and the client then disconnects from the server.
The connection established in step 1 is a TCP connection. The core working process (that is, leaving out connection establishment) is as follows:
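The four steps of a transaction can be sketched with Python's standard library. This is only an illustration under some assumptions: the HelloHandler class and its tiny HTML page are invented for the example, and the server runs on the local machine on a port chosen by the operating system rather than on port 80.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical origin server that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def run_transaction():
    # Port 0 asks the operating system for any free port (not the default 80).
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.connect()                      # step 1: establish the TCP connection
        conn.request("GET", "/index.html")  # step 2: send the request
        response = conn.getresponse()       # step 3: the server responds
        status, body = response.status, response.read()
        conn.close()                        # step 4: the client disconnects
        return status, body
    finally:
        server.shutdown()
        server.server_close()

print(run_transaction())
```

A browser would render the returned body in step 4; here it is simply printed.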
About the HTTP protocol
HTTP uses a request/response model. The client sends the server a request containing the request method, URL, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possibly body content. The server responds with a status line, which includes the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possibly entity body content.
HTTP messages are either request messages from client to server or response messages from server to client. Both types consist of a start line, zero or more header fields, a blank line that marks the end of the header fields, and an optional message body. HTTP header fields fall into four groups: general headers, request headers, response headers, and entity headers. Each header field consists of three parts: a field name, a colon (:), and a field value. Field names are case-insensitive. Any amount of whitespace may precede the field value, and a header field can be extended over multiple lines by starting each additional line with at least one space or tab.
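The header rules just described (case-insensitive names, optional whitespace before the value, continuation lines that begin with a space or tab) can be sketched as a small parser. The function name parse_header_block and the X-Note header are invented for this example, and the sketch keeps only the last value when a field name is repeated.

```python
def parse_header_block(text):
    """Parse header lines into a dict keyed by lower-cased field name."""
    unfolded = []
    for line in text.split("\r\n"):
        if not line:
            break  # blank line: end of the header fields
        if line[0] in " \t" and unfolded:
            # Continuation line: append it to the previous field.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    headers = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return headers

raw = "Host: www.example.com\r\nX-Note:   first part\r\n\tsecond part\r\n\r\nbody"
print(parse_header_block(raw))
# {'host': 'www.example.com', 'x-note': 'first part second part'}
```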
General header fields:
The general header fields are those that apply to both request and response messages: Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Extending the general header fields requires that both communicating parties support the extension; an unrecognized general header field is generally treated as an entity header field. The following is a brief introduction to several general header fields commonly used in UPnP messages:
1. Cache-Control header field
Cache-Control specifies the caching directives that requests and responses must follow.
2. Date header field
The Date header field gives the time at which the message was sent; the time format is the one defined by RFC 822.
3. Pragma header field
The Pragma header field carries implementation-specific directives, most commonly Pragma: no-cache.
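As a small illustration of these three fields, the following Python sketch builds them for a message sent at a given Unix timestamp. The helper name general_headers is an assumption of this example; formatdate from the standard library produces the RFC 822 style date with a GMT suffix.

```python
from email.utils import formatdate, parsedate_to_datetime

def general_headers(timestamp):
    """Build a few general header fields for a message sent at `timestamp`."""
    return {
        "Date": formatdate(timestamp, usegmt=True),
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",
    }

headers = general_headers(0)
print(headers["Date"])  # Thu, 01 Jan 1970 00:00:00 GMT

# The same date string can be parsed back into a datetime object.
sent = parsedate_to_datetime(headers["Date"])
print(sent.year)  # 1970
```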
Request message
The first line of the request message has the following format:
Method SP Request-URI SP HTTP-Version CRLF
Method indicates the method to be performed on the Request-URI. This field is case-sensitive; the methods include OPTIONS, GET, HEAD, POST, PUT, DELETE, and TRACE.
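A request line in this format (method, Request-URI, and HTTP version separated by single spaces and ended by CRLF) can be taken apart with a few lines of Python. This is a minimal sketch; the function name parse_request_line is an assumption, and it checks the method against the RFC 2616 names with a case-sensitive comparison.

```python
METHODS = {"OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE"}

def parse_request_line(line):
    """Split 'Method SP Request-URI SP HTTP-Version CRLF' into its three parts."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    method, uri, version = line[:-2].split(" ")
    if method not in METHODS:  # case-sensitive comparison: 'get' is rejected
        raise ValueError("unknown method: " + method)
    return method, uri, version

print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
# ('GET', '/index.html', 'HTTP/1.1')
```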
1. Host header field
The Host header field specifies the Internet host and port number of the requested resource; it must represent the location of the origin server or gateway given by the requested URL.
2. Referer header field
The Referer header field lets the client specify the address of the resource from which the Request-URI was obtained. This allows the server to generate lists of back-links, which can be used for logging, cache optimization, and so on.
3. Range header field
The Range header field requests one or more sub-ranges of an entity.
4. User-Agent header field
The User-Agent header field contains information about the user agent that made the request.
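The four request headers can be seen together in a small request builder. This is a sketch only: build_request, the agent string example-client/0.1, and the example host are hypothetical, and the Range header is limited to a single byte range.

```python
def build_request(host, path, referer=None, byte_range=None,
                  agent="example-client/0.1"):
    """Assemble a GET request using the four request headers described above."""
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host,
        "User-Agent: " + agent,
    ]
    if referer:
        lines.append("Referer: " + referer)
    if byte_range:
        first, last = byte_range
        lines.append("Range: bytes=%d-%d" % (first, last))
    # A blank line (CRLF CRLF) marks the end of the header fields.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("www.example.com", "/a.html",
                        referer="http://www.example.com/index.html",
                        byte_range=(0, 499))
print(request.decode("ascii"))
```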
Response message
The first line of the response message has the following format:
HTTP-Version SP Status-Code SP Reason-Phrase CRLF
HTTP-Version is the supported HTTP version, for example HTTP/1.1. Status-Code is a three-digit result code. Reason-Phrase gives a short text description of the Status-Code.
The first digit of the Status-Code defines the class of the response; the last two digits have no categorization role. The first digit can take 5 different values:
1XX: Informational: the request was received and processing continues
2XX: Success: the action was successfully received, understood, and accepted
3XX: Redirection: further action must be taken to complete the request
4XX: Client error: the request contains bad syntax or cannot be fulfilled
5XX: Server error: the server failed to fulfill an apparently valid request
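The status line and the five classes can be illustrated with a short Python sketch that splits a status line and looks up the class from the first digit. The names STATUS_CLASSES and parse_status_line are assumptions of this example.

```python
STATUS_CLASSES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}

def parse_status_line(line):
    """Split a status line and classify it by the first digit of the code."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason, STATUS_CLASSES[code[0]]

print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
# ('HTTP/1.1', 404, 'Not Found', 'client error')
```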
1. Location response header
The Location response header is used to redirect the recipient to a new URI.
2. Server response header
The Server response header contains information about the software used by the origin server to handle the request.
Entity Information
Both request and response messages can carry an entity, which generally consists of entity header fields and an entity body. The entity header fields contain meta-information about the entity: Allow, Content-Base, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, ETag, Expires, Last-Modified, and extension-header. The extension-header mechanism allows new entity headers to be defined, but these fields may not be recognized by the recipient.
1. Content-Type entity header
The Content-Type entity header indicates to the recipient the media type of the entity body sent or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
2. Content-Range entity header
The Content-Range entity header is sent with a partial entity body to specify where in the full entity the part should be inserted; it also indicates the total length of the full entity.
3. Last-Modified entity header
The Last-Modified entity header specifies the time at which the content stored on the server was last modified.
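The way Content-Range gives both the insertion position and the total length can be shown with a small sketch. The helper names parse_content_range and apply_part are invented for this example, and only the "bytes first-last/total" form with a known total is handled.

```python
import re

def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if match is None:
        raise ValueError("unsupported Content-Range: " + value)
    first, last, total = (int(group) for group in match.groups())
    return first, last, total

def apply_part(buffer, value, part):
    """Place `part` into `buffer` at the offset named by the Content-Range value."""
    first, last, total = parse_content_range(value)
    if len(part) != last - first + 1 or len(buffer) != total:
        raise ValueError("part does not match the declared range")
    buffer[first:last + 1] = part
    return buffer

entity = bytearray(10)  # the full entity is 10 bytes long
apply_part(entity, "bytes 2-4/10", b"abc")
print(bytes(entity))
```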
Message format:
HTTP messages consist of requests from the client to the server and responses from the server to the client. The request message format is as follows:
Request line-General Information header-Request header-Entity header-message body
The request line starts with the Method field, followed by the URL field and the HTTP protocol version field, and ends with CRLF. The fields are separated by SP (space). No CR or LF characters are allowed except in the final CRLF sequence. For the specific contents of the general information headers, request headers, and entity headers, refer to the relevant documents.
A detailed description of the request message follows:
1. Request Line
Method field + URL + HTTP protocol version
2. General Information Header
Cache-Control header field: specifies the caching directives that requests and responses must follow.
Keep-Alive: indicates that the connection stays open (remains valid) after the request completes.
3. Request Header
Host header field
Referer header field: lets the client specify the address of the resource from which the request URL was obtained.
User-Agent header field: information about the requesting user agent. (Here you can see, for example, the engine information of some client browsers.)
4. Message body
The response message format is as follows:
Status line-General information header-Response header-Entity header-Message body
The status code consists of three digits that indicate whether the request was understood and satisfied. The reason phrase is a short textual description of the status code. The status code is intended for automated processing, while the reason phrase is intended for the human user; the client is not required to examine or display the reason phrase. For the specific contents of the general information headers, response headers, and entity headers, refer to the relevant documents.
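The response layout (status line, headers, blank line, message body) can be made concrete by splitting a raw message in Python. The function split_message and the sample response are made up for this sketch; the sample contains one general header (Date), one response header (Server), and two entity headers.

```python
def split_message(raw):
    """Split a raw HTTP message into (start line, header lines, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    return start_line, header_lines, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Date: Thu, 01 Jan 1970 00:00:00 GMT\r\n"
       b"Server: example-server\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")

start_line, header_lines, body = split_message(raw)
print(start_line)
print(header_lines)
print(body)
```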
Request message Related:
Request line-Request method
GET: requests the resource identified by the Request-URI
POST: appends new data to the resource identified by the Request-URI
HEAD: requests only the response message headers for the resource identified by the Request-URI
PUT: asks the server to store a resource and use the Request-URI as its identifier
DELETE: asks the server to delete the resource identified by the Request-URI
TRACE: asks the server to echo back the request it received, mainly for testing or diagnostics
CONNECT: reserved for use with proxies that can switch to acting as a tunnel
OPTIONS: queries the server's capabilities, or the options and requirements associated with a resource
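To show how several of these methods differ, here is a minimal sketch of an in-memory resource store that dispatches on the method name. The ResourceStore class is hypothetical, handles only PUT, GET, HEAD, and DELETE, and the status codes it returns are one reasonable choice rather than the only valid one.

```python
class ResourceStore:
    """Hypothetical in-memory origin server keyed by Request-URI."""

    def __init__(self):
        self.resources = {}

    def handle(self, method, uri, body=b""):
        """Return (status code, response body) for one request."""
        if method == "PUT":
            created = uri not in self.resources
            self.resources[uri] = body  # store under the Request-URI
            return (201 if created else 200), b""
        if uri not in self.resources:
            return 404, b""
        if method == "GET":
            return 200, self.resources[uri]
        if method == "HEAD":
            return 200, b""  # same status as GET, but no message body
        if method == "DELETE":
            del self.resources[uri]
            return 204, b""
        return 405, b""  # any other method is not supported in this sketch

store = ResourceStore()
print(store.handle("PUT", "/note.txt", b"hi"))
print(store.handle("GET", "/note.txt"))
print(store.handle("HEAD", "/note.txt"))
print(store.handle("DELETE", "/note.txt"))
print(store.handle("GET", "/note.txt"))
```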
Response message Related:
Response Line-Status code
The HTTP status code is used by the Web server to tell the client what has happened.
The first line of the HTTP response carries a three-digit status code and a status message. The three-digit code is easy for programs to process, and the status message is easier for people to understand.
The HTTP status code is divided into five categories:
100-199: informational codes that tell the client how it should proceed.
200-299: indicate that the request succeeded.
300-399: used for resources that have moved; the response often includes a Location header giving the new address.
400-499: indicate client-side errors.
500-599: indicate server-side errors.
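Python's standard library already knows the registered status codes and their reason phrases through http.HTTPStatus, which gives a quick way to look up a code and its category. The helper name describe is an assumption of this sketch.

```python
from http import HTTPStatus

def describe(code):
    """Return the code, its standard reason phrase, and its category."""
    status = HTTPStatus(code)
    return "%d %s (%dxx)" % (status.value, status.phrase, code // 100)

print(describe(200))  # 200 OK (2xx)
print(describe(404))  # 404 Not Found (4xx)
```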
A list of common HTTP status codes is provided below.
1. Temporary response (100-199)
Status codes that indicate a provisional response and require the requester to continue the operation.
Status Code | Reason Phrase | Meaning
100 | Continue | The initial part of the request has been received; the client should continue sending the rest.
101 | Switching Protocols |
2. Success (200-299)
Status codes that indicate the request was processed successfully.
Status Code | Reason Phrase | Meaning
200 | OK | The request succeeded.
201 | Created | Used for requests that create a server object, such as PUT. The entity body of the response should contain the URLs that reference the created resource. The server must have created the object before sending this status code.
202 | Accepted | The request has been accepted, but the server has not yet acted on it. The request may or may not eventually be carried out. For an asynchronous operation like this, there is no facility for re-sending a status code later. The purpose of the 202 response is to let the server accept a request for some other process, such as a batch job that runs only once a day, without requiring the client to stay connected until that process completes.
203 | Non-Authoritative Information | The information in the entity headers does not come from the origin server but from a copy of the resource.
204 | No Content | The response contains a status line and headers, but no entity body.
205 | Reset Content | Tells the browser to clear all HTML form elements on the current page.
206 | Partial Content | The server successfully processed a partial GET request.
3. Redirect (300-399)
Further action is required to complete the request. Typically, these status codes are used for redirection. Google recommends using no more than 5 redirects per request. You can use Webmaster Tools to see whether Googlebot is having trouble crawling redirected pages: the Web crawl page under Diagnostics lists URLs that Googlebot could not crawl because of redirect errors.
Status Code | Reason Phrase | Meaning
300 | Multiple Choices | Returned when the client requests a URL that actually refers to multiple resources, such as English and French versions of an HTML document on the server. The response includes a list of options so that the user can choose the one he or she wants.
301 | Moved Permanently | The requested resource has been permanently moved to a new location; used when the requested URL has moved. The Location header of the response should contain the URL where the resource now resides.
302 | Found | The requested resource temporarily responds from a different URI.
303 | See Other | Tells the client that another URL should be used to obtain the resource. The new URL is given in the Location header of the response. Its main purpose is to allow the response to a POST request to direct the client to a resource.
304 | Not Modified | The server returns this status code if the client sent a conditional GET request, access is allowed, and the document's contents have not changed (since the last access or according to the condition in the request).
305 | Use Proxy | Indicates that the resource must be accessed through a proxy; the proxy's location is given in the Location header.
307 | Temporary Redirect | The requested resource temporarily responds from a different URI. Because the redirect is temporary, the client should keep sending subsequent requests to the original address. This response is cacheable only if indicated by Cache-Control or Expires.
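How a client follows redirects, and why a limit such as 5 hops matters, can be sketched as follows. The function resolve and the fetch callback are hypothetical: fetch stands in for a real HTTP request and returns a status code plus a header dictionary, here served from a small in-memory table instead of the network.

```python
MAX_REDIRECTS = 5
REDIRECT_CODES = {301, 302, 303, 307}

def resolve(url, fetch):
    """Follow Location headers until a non-redirect response arrives."""
    # One initial request plus at most MAX_REDIRECTS follow-up requests.
    for _ in range(MAX_REDIRECTS + 1):
        status, headers = fetch(url)
        if status not in REDIRECT_CODES:
            return url, status
        url = headers["Location"]
    raise RuntimeError("too many redirects")

# Made-up site: /old moved permanently, /new redirects temporarily.
site = {
    "/old": (301, {"Location": "/new"}),
    "/new": (302, {"Location": "/final"}),
    "/final": (200, {}),
}
print(resolve("/old", site.__getitem__))  # ('/final', 200)
```

Without the hop limit, a page that redirects to itself would keep the client looping forever.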
4. Request error (400-499)
These status codes indicate that the request probably contains an error that prevents the server from processing it.
Status Code | Reason Phrase | Meaning
400 | Bad Request | Tells the client that it sent a malformed request.
401 | Unauthorized | The request requires user authentication. The response must include a WWW-Authenticate header for the requested resource, asking for the user's credentials. The client may repeat the request with an appropriate Authorization header. If the request already included Authorization credentials, the 401 response indicates that the server rejected those credentials.
403 | Forbidden | The server understood the request but refuses to carry it out. If the server wants to explain why, it can describe the reason in the entity body. This status code is, however, usually used when the server does not want to explain the reason for the refusal.
404 | Not Found | The resource at the specified location could not be found.
405 | Method Not Allowed | The request method (GET, POST, HEAD, DELETE, PUT, TRACE, and so on) is not allowed for the specified resource.
406 | Not Acceptable | The MIME type of the requested resource is not compatible with the types listed in the client's Accept header.
407 | Proxy Authentication Required | Similar to 401 (Unauthorized), but indicates that the requester must authenticate with the proxy. If the server returns this response, it also indicates that the requester should use the proxy.
408 | Request Timeout | The server timed out waiting for the request.
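As one example from this group, the decision behind 406 can be sketched as a comparison between the client's Accept header and the MIME type the server has available. This is a simplified illustration: the function name negotiate is an assumption, and quality values such as q=0.8 are ignored rather than ranked.

```python
def negotiate(accept_header, available_type):
    """Return 200 if the resource's MIME type satisfies Accept, else 406."""
    for item in accept_header.split(","):
        media_range = item.split(";")[0].strip()  # drop parameters such as q=0.8
        if media_range in ("*/*", available_type):
            return 200
        main, _, sub = media_range.partition("/")
        if sub == "*" and available_type.startswith(main + "/"):
            return 200  # for example, image/* matches image/png
    return 406

print(negotiate("text/html, application/xml;q=0.9", "text/html"))  # 200
print(negotiate("application/json", "text/html"))                  # 406
```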
5. Server error (500-599)
These status codes indicate that the server encountered an internal error while processing the request. The error lies with the server itself, not with the request.
Status Code | Reason Phrase | Meaning
500 | Internal Server Error | Used when the server encounters an error that prevents it from serving the request. This status is often caused by a CGI program (hopefully not!) that is not working correctly, or by a servlet that returns incorrectly formatted header information.
501 | Not Implemented | The client used a request method that the server does not implement.
502 | Bad Gateway | The server, acting as a gateway or proxy, received an invalid response from the next server it contacted to fulfill the request.
503 | Service Unavailable | The server is currently unable to handle the request but may be able to in the future. The server can send a Retry-After header to tell the client when the resource is expected to be available.
504 | Gateway Timeout | Also used by servers acting as a proxy or gateway; indicates that the server did not receive a timely response from the upstream server.
505 | HTTP Version Not Supported | The HTTP version indicated in the request is not supported by the server.
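A client that receives 503 can use the Retry-After header to decide how long to wait. The sketch below assumes the two forms the header may take, a number of seconds or an HTTP date; the function name retry_delay is made up, and the caller passes in the current time as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay(retry_after, now):
    """Return the number of seconds to wait, given a Retry-After header value."""
    if retry_after.strip().isdigit():
        return int(retry_after)  # form 1: delay in seconds
    # Form 2: an HTTP date such as 'Thu, 01 Jan 1970 00:02:00 GMT'.
    when = parsedate_to_datetime(retry_after)
    return max(0, int((when - now).total_seconds()))

now = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(retry_delay("120", now))                            # 120
print(retry_delay("Thu, 01 Jan 1970 00:02:00 GMT", now))  # 120
```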