First, the HTTP protocol
I recently read several books on network communication and studied HTTP and TCP/IP. I came away with some new insights, which I summarize here.
(1) What is the HTTP protocol
HTTP (HyperText Transfer Protocol) is a communication protocol, that is, a set of rules that two computers in a network must follow in order to communicate. It allows Hypertext Markup Language (HTML) documents to be transferred from a web server to a client, and it is one of the most widely used network protocols on the Internet.
(2) A stateless protocol
HTTP does not save state; that is, HTTP is a stateless protocol. The protocol itself does not keep any communication state between a request and a response. In other words, at the HTTP level, requests and responses that have already been sent are not persisted.
With HTTP, every new request produces a corresponding new response. The protocol itself retains no information about earlier request or response messages, so the current request cannot be processed on the basis of earlier state.
Advantages of statelessness: ① A large number of transactions can be handled quickly, which keeps the protocol scalable. ② Because no state has to be saved, the server consumes less CPU and memory.
Although HTTP/1.1 is a stateless protocol, cookie technology was introduced to provide the state-keeping that applications need. By exchanging cookies over HTTP, state can be managed.
A cookie works as follows. The server includes a Set-Cookie header field in its response message, which tells the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request message. When the server receives the cookie, it checks which client sent the request, compares the value with the records kept on the server, and recovers the earlier state.
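The cookie round trip described above can be sketched with Python's standard `http.cookies` module. This is only a minimal illustration, not how any particular server does it: the cookie name `sid`, the in-memory `SESSIONS` store, and the three helper functions are all hypothetical names chosen for the example.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side store: session id -> saved state.
SESSIONS = {}

def server_issue_cookie(session_id, user):
    """Server side: remember the state and build a Set-Cookie header value."""
    SESSIONS[session_id] = {"user": user}
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()

def client_cookie_header(set_cookie_value):
    """Client side: store the cookie and build the Cookie request header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

def server_lookup(cookie_header):
    """Server side: map the cookie sent by the client back to the saved state."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    return SESSIONS.get(morsel.value) if morsel else None

set_cookie = server_issue_cookie("abc123", "alice")  # sent in the response
cookie_header = client_cookie_header(set_cookie)     # sent in the next request
state = server_lookup(cookie_header)                 # earlier state recovered
```

Note that HTTP itself stays stateless here: the state lives in the server's own records, and the cookie only carries the key used to find it.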
(3) HTTP methods
Methods supported by HTTP/1.0 and HTTP/1.1:
LINK and UNLINK were dropped in HTTP/1.1 and are no longer supported.
(4) HTTP protocol messages
The information exchanged over HTTP is called an HTTP message. The HTTP message sent by the requesting side (the client) is called a request message, and the HTTP message sent by the responding side (the server) is called a response message. An HTTP message is itself text made up of multiple lines of data.
The HTTP message consists of the following three parts:
1. Message header
The content and attributes of the request or response that the client or server needs to process. It includes: the request line (the request method, the request URI, and the HTTP version), the status line (the HTTP version, the status code indicating the result of the response, and the reason phrase), and the header fields (various headers describing the conditions and attributes of the request and response).
2. Blank line
CR+LF, that is, CR (carriage return) followed by LF (line feed).
3. Message body
The data that is to be sent.
Sample diagram of request message and response message:
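As a rough text-form illustration of the three parts listed above, the following sketch splits a raw message into its start line, header fields, and body. The sample messages and the function name `parse_http_message` are made up for this example, and the parser ignores many details of real HTTP (repeated headers, chunked bodies, and so on).

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (start line, header dict, body bytes)."""
    # The first blank line (CR+LF CR+LF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]          # request line or status line
    headers = {}
    for line in lines[1:]:         # header fields, "Name: value"
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

# Made-up request message: request line, header fields, blank line, body.
SAMPLE_REQUEST = (
    b"POST /form/entry HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 16\r\n"
    b"\r\n"
    b"name=ueno&age=37"
)

# Made-up response message: status line, header fields, blank line, body.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

request_line, request_headers, request_body = parse_http_message(SAMPLE_REQUEST)
status_line, response_headers, response_body = parse_http_message(SAMPLE_RESPONSE)
```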
(5) HTTP persistent connections
1. Persistent connections
In the initial version of HTTP, the TCP connection was closed after every single HTTP exchange, which increased communication overhead.
To address this problem, HTTP/1.1 introduced persistent connections (HTTP Persistent Connections, also known as HTTP keep-alive or HTTP connection reuse).
The defining feature of a persistent connection is that the TCP connection stays open as long as neither end explicitly closes it.
Advantage: it reduces the extra overhead of repeatedly establishing and tearing down TCP connections and lightens the load on the server. Because that overhead is saved, HTTP requests and responses finish sooner, and web pages are displayed faster as a result.
In HTTP/1.1, all connections are persistent by default, whereas persistent connections were not standardized in HTTP/1.0.
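Connection reuse can be observed with a small local experiment using only Python's standard library. In this sketch a throwaway server on 127.0.0.1 replies with the client's TCP port number; if two requests sent through the same `http.client.HTTPConnection` report the same port, they travelled over one TCP connection. The class name `PortEchoHandler` and the function `demo_keep_alive` are hypothetical names used only here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PortEchoHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 makes this test server keep connections open by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the TCP port number the client is connected from.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def demo_keep_alive():
    """Send two requests over one HTTPConnection and return the ports seen."""
    server = HTTPServer(("127.0.0.1", 0), PortEchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    ports = []
    try:
        for _ in range(2):
            conn.request("GET", "/")
            ports.append(conn.getresponse().read().decode("ascii"))
    finally:
        conn.close()          # the client explicitly closes the connection
        server.shutdown()
        server.server_close()
    return ports

ports_seen = demo_keep_alive()
```

Both entries in `ports_seen` should be identical, since no new TCP connection is opened for the second request.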
2. Pipelining
Persistent connections also make it possible to send requests in a pipelined manner. Previously, after sending a request, the client had to wait for the response before it could send the next request. With pipelining, the next request can be sent without waiting for the previous response.
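The following sketch illustrates pipelining under simplified assumptions: two requests are written to one TCP connection back to back, before any response is read, and the responses then arrive in the same order. It uses a throwaway local server from Python's standard library; `PathEchoHandler` and `demo_pipelining` are hypothetical names, and real clients and servers vary in how well they support pipelining.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PathEchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = self.path.encode("ascii")  # echo the requested path
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def demo_pipelining():
    """Send two requests without waiting in between; return all bytes received."""
    server = HTTPServer(("127.0.0.1", 0), PathEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    # Both requests are sent in a single write. The second one asks the
    # server to close the connection so that the read loop below terminates.
    pipelined = (
        f"GET /first HTTP/1.1\r\nHost: {host}\r\n\r\n"
        f"GET /second HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    chunks = []
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(pipelined)
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    finally:
        server.shutdown()
        server.server_close()
    return b"".join(chunks)

raw_responses = demo_pipelining()
```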
(6) HTTP status codes
The job of the HTTP status code is to describe the result of a request that the client has sent to the server. From the status code, the user can tell whether the server handled the request normally or an error occurred.
Every HTTP response message carries a status code made up of a three-digit number and a reason phrase, such as 200 OK. The first digit of the number indicates the class of the response (the status code class); the last two digits have no categorizing role.
The five classes of status codes:
As long as the definition of the status code classes is respected, it does not matter if the status codes defined in RFC 2616 are changed or if a server creates status codes of its own.
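Because the class is determined by the first digit alone, a client can interpret even a status code it has never seen. A minimal sketch, using the class names from the HTTP specification and Python's standard `http.HTTPStatus` for the reason phrases (the helper names `status_class` and `describe` are made up for this example):

```python
from http import HTTPStatus

# The five status code classes, keyed by the first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code: int) -> str:
    """Return the class of a status code based only on its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

def describe(code: int) -> str:
    """Combine the code, its standard reason phrase (if any), and its class."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # A server-defined code with no standard reason phrase.
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase}: {status_class(code)}"

summary = [describe(code) for code in (200, 304, 404, 503, 599)]
```

The code 599 is not a standard status code, yet it is still classified as a server error from its first digit.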
A few common status codes:
- 200 OK. Indicates that the request from the client was handled normally on the server side.
- 204 No Content. Indicates that the server has successfully processed the request, but the response message contains no entity body, and no entity body is allowed to be returned. It is typically used when the client only needs to send information to the server and the server has no new content to send back to the client.
- 301 Moved Permanently. Permanent redirection. Indicates that the requested resource has been assigned a new URI and that this new URI should be used from now on. That is, if the URI of the resource has been saved as a bookmark, the bookmark should be updated to the URI given in the Location header field.
- 302 Found. Temporary redirection. Indicates that the requested resource has been assigned a new URI and that the user should access it with the new URI for this request only.
- 304 Not Modified. Indicates that the client sent a conditional request and that the server allows access to the resource, but the condition was not met. A 304 response contains no response body. Although 304 belongs to the 3XX class, it has nothing to do with redirection.
- 400 Bad Request. Indicates that the request message contains a syntax error. When this error occurs, the content of the request must be corrected and the request sent again.
- 401 Unauthorized. Indicates that the request requires authentication information for HTTP authentication (Basic authentication or Digest authentication). If a request has already been made once before, it indicates that user authentication failed.
- 404 Not Found. Indicates that the requested resource could not be found on the server. It can also be used when the server refuses the request and does not want to give a reason.
- 500 Internal Server Error. Indicates that an error occurred on the server side while executing the request. It may also mean that the web application has a bug or has hit some temporary failure.
- 503 Service Unavailable. Indicates that the server is temporarily overloaded or down for maintenance and cannot process requests right now. If it is known in advance how long it will take to recover, it is best to put that in the Retry-After header field and return it to the client.
(7) Common questions
1. The difference between HTTP and TCP/IP
TCP/IP works at the transport layer (strictly speaking, TCP is the transport-layer protocol and IP sits below it at the network layer) and mainly solves how data is transmitted across the network, while HTTP is an application-layer protocol that mainly solves how the data is packaged.
In more detail: data can be transmitted using TCP/IP alone, but without an application layer the receiver cannot make sense of the content. To give the transferred data meaning, an application-layer protocol must be used. There are many application-layer protocols, such as HTTP, FTP, and Telnet, and you can also define your own. The Web uses HTTP as its application-layer protocol to package the HTTP text information, and then uses TCP/IP as the transport to send it over the network.
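The division of labour can be seen by speaking HTTP "by hand" over a plain TCP socket. In this sketch the socket only moves bytes; it is the HTTP formatting of those bytes that lets the other side understand them. The throwaway local server, `HelloHandler`, `fetch_raw`, and `demo_http_over_tcp` are all hypothetical names for this example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    # The default protocol_version is HTTP/1.0, so the server closes the
    # connection after each response.
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def fetch_raw(host, port):
    """TCP carries the bytes; HTTP is the format that gives them meaning."""
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)          # transport: just bytes on a TCP stream
        while True:
            data = sock.recv(4096)
            if not data:               # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

def demo_http_over_tcp():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_raw("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()

raw_response = demo_http_over_tcp()
status_line = raw_response.split(b"\r\n", 1)[0]  # application layer: parse it as HTTP
```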
2. The difference between URI, URL, and URN
URI: Uniform Resource Identifier.
URL: Uniform Resource Locator.
URN: Uniform Resource Name.
A URI identifies an Internet resource with a string, while a URL indicates where the resource is located on the Internet. As can be seen, URLs are a subset of URIs.
Diagram of the three:
Take a person as an example: we think of his name and his address. A URL is like the address: it tells you a way to find the target (in this case, finding the person by street address). Note that an address, by this definition, is also a URI. By contrast, a person's name is like a URN, since a URN is used to uniquely identify an entity. Because several people can share the same name, a person's name is strictly speaking not quite the right example. A better one is a book's ISBN or a product's serial number within a system: it does not tell you how or where to find the target, but it gives you enough information to retrieve it.
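Python's standard `urllib.parse.urlsplit` can pull the two kinds of identifier apart. The sketch below uses a made-up URL and an ISBN-style URN; the helper `has_network_location` is a hypothetical, deliberately crude heuristic (a URL names a host where the resource lives, a URN does not), not a formal classifier.

```python
from urllib.parse import urlsplit

def has_network_location(uri: str) -> bool:
    """Crude heuristic: a locator names a host, a pure name does not."""
    return bool(urlsplit(uri).netloc)

# Made-up examples: a URL says where to look, a URN only says what it is.
url_parts = urlsplit("https://example.com/books/1")
urn_parts = urlsplit("urn:isbn:0451450523")

url_summary = (url_parts.scheme, url_parts.netloc, url_parts.path)
urn_summary = (urn_parts.scheme, urn_parts.netloc, urn_parts.path)
```

Both strings are URIs; only the first one also tells you where on the network to fetch the resource.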
Second, the HTTPS protocol
(1) Why use HTTPS
The HTTP protocol was described above. Although HTTP is widely used, it also has some shortcomings, listed below:
- Communication is in plaintext (not encrypted), so the content may be eavesdropped on.
- The identity of the communicating party is not verified, so impersonation is possible.
- The integrity of the message cannot be verified, so it may have been tampered with.
These problems occur not only on HTTP, but also in other unencrypted protocols.
To solve these problems together, encryption and authentication mechanisms need to be added to HTTP. HTTP with encryption and authentication mechanisms added is called HTTPS (HTTP Secure).
Simply put, HTTPS = HTTP + encryption + authentication + integrity protection.
HTTPS communication is often used on web login pages and shopping checkout pages. With HTTPS, the address starts with https:// instead of http://. When a browser visits a website over a valid HTTPS connection, a lock icon appears in the browser's address bar.
(2) Special HTTP
HTTPS is not a new application-layer protocol. It is simply HTTP with its communication interface replaced by the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols.
Normally, HTTP communicates directly with TCP. When SSL is used, HTTP first communicates with SSL, and SSL then communicates with TCP. In short, HTTPS is really just HTTP wearing an SSL shell.
TLS/SSL is independent of HTTP. It is a security layer that sits between TCP and HTTP and affects neither the original TCP protocol nor the HTTP protocol, so adopting HTTPS requires very few changes to existing HTTP pages. SSL is not limited to HTTP: other application-layer protocols, such as SMTP and Telnet, can also be used together with it.
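The layering can be sketched with Python's standard `socket` and `ssl` modules: a TCP connection is opened first, a TLS layer is wrapped around it, and the very same HTTP text is then written through that layer. This is only an illustrative sketch; `make_tls_context` and `fetch_over_tls` are hypothetical names, the host passed in is up to the caller, and running `fetch_over_tls` requires network access to a real HTTPS server.

```python
import socket
import ssl

def make_tls_context() -> ssl.SSLContext:
    # The default context verifies the server certificate and the host name.
    return ssl.create_default_context()

def fetch_over_tls(host: str, path: str = "/", port: int = 443) -> bytes:
    """HTTP text is unchanged; it is simply sent through a TLS layer over TCP."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    context = make_tls_context()
    chunks = []
    # Layer 1: a plain TCP connection.
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # Layer 2: TLS wrapped around the TCP socket (handshake happens here).
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Layer 3: ordinary HTTP, written to the TLS socket.
            tls_sock.sendall(request)
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)

tls_context = make_tls_context()
```

A call such as `fetch_over_tls("example.com")` (a placeholder host) would return the raw HTTP response bytes after decryption.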
(3) Why not always use HTTPS
Since HTTPS is so secure and reliable, why don't all websites use HTTPS all the time?
There are several main reasons:
1. Encrypted communication consumes more CPU and memory resources than plaintext communication.
If every exchange is encrypted, a considerable amount of resources is consumed, and the number of requests a single machine can handle inevitably drops. Therefore, plain HTTP is used for non-sensitive information, and HTTPS encrypted communication is used only when sensitive data such as personal information is involved. In particular, high-traffic websites do not encrypt all of their content; to conserve resources, they encrypt only the information that needs to be hidden.
2. To save the cost of purchasing a certificate.
A certificate is essential for HTTPS communication, and the certificate must be purchased from a certification authority (CA). Prices vary somewhat between certification authorities, from hundreds to thousands per year. Services and personal websites for which buying a certificate is not cost-effective may simply choose to stay on HTTP.
3. Because HTTPS uses SSL, processing becomes slower.
SSL is slow in two ways: communication is slower, and processing is slower because a large amount of CPU and memory is consumed. Compared with HTTP, the added network load may make communication 2 to 100 times slower. On top of establishing the TCP connection and exchanging the HTTP request and response, the SSL communication must also take place, so the overall volume of traffic inevitably increases. SSL also requires encryption: both the server and the client have to perform encryption and decryption. As a result, more server and client hardware resources are consumed than with HTTP, which increases the load. This can, of course, be improved with hardware such as an SSL accelerator.
This article is mainly compiled from the book "Illustrated HTTP".