HTTP and HTTPS
I. HTTP protocol
I recently read some books on network communication and studied HTTP and TCP/IP. Here I summarize what I learned.
(1) What is HTTP?
HTTP (HyperText Transfer Protocol) is a communication protocol, that is, a set of rules that two computers on a network must follow in order to communicate. It allows Hypertext Markup Language (HTML) documents to be transferred from web servers to clients, and it is one of the most widely used network protocols on the Internet.
(2) A stateless protocol
HTTP does not save state; that is, HTTP is a stateless protocol. The protocol itself does not keep any communication state between requests and responses, so at the HTTP level, requests and responses that have been sent are not persisted.
With HTTP, every new request produces a new response. The protocol does not retain any information about previous request or response messages, so a request cannot be processed on the basis of earlier state.
Advantages of being stateless: ① Large numbers of transactions can be processed faster, which keeps the protocol scalable. ② Because no state has to be stored, the server consumes less CPU and memory.
Although HTTP/1.1 is a stateless protocol, Cookie technology was introduced so that state can still be managed over HTTP.
The server uses the Set-Cookie header field in its response message to tell the client to store a cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request message. When the server receives the cookie, it checks which client sent the request, compares the value against its own records, and recovers the previous state.
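The Set-Cookie / Cookie round trip described above can be sketched with Python's standard `http.cookies` module. This is a minimal in-process simulation, not a real server: `SESSIONS`, `server_handle`, `Client`, the cookie name `sid`, and the `visits` counter are all hypothetical names chosen for illustration. Each call to `server_handle` is stateless on its own; the only thing linking two requests is the cookie the client sends back.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store kept by the server, keyed by the cookie value.
SESSIONS: dict[str, dict] = {}


def server_handle(request_headers: dict[str, str]) -> tuple[dict[str, str], dict]:
    """Handle one request; HTTP itself carries no state, only the Cookie header does."""
    incoming = SimpleCookie(request_headers.get("Cookie", ""))
    sid = incoming["sid"].value if "sid" in incoming else None
    response_headers: dict[str, str] = {}
    if sid not in SESSIONS:
        # First visit (or unknown id): create state and ask the client to store an id.
        sid = secrets.token_hex(8)
        SESSIONS[sid] = {"visits": 0}
        outgoing = SimpleCookie()
        outgoing["sid"] = sid
        outgoing["sid"]["path"] = "/"
        response_headers["Set-Cookie"] = outgoing["sid"].OutputString()
    SESSIONS[sid]["visits"] += 1
    return response_headers, SESSIONS[sid]


class Client:
    """A toy client that saves cookies from Set-Cookie and sends them back automatically."""

    def __init__(self) -> None:
        self.jar = SimpleCookie()

    def request(self) -> int:
        headers: dict[str, str] = {}
        if self.jar:
            headers["Cookie"] = "; ".join(f"{k}={m.value}" for k, m in self.jar.items())
        response_headers, state = server_handle(headers)
        if "Set-Cookie" in response_headers:
            self.jar.load(response_headers["Set-Cookie"])
        return state["visits"]
```

Calling `request()` twice on the same `Client` returns 1 and then 2, while a fresh `Client` starts again at 1, because the server can only tell clients apart by the cookie they present.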
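(3) HTTP methods
Methods supported by HTTP/1.0 and HTTP/1.1:
- GET, POST, PUT, HEAD, DELETE: supported by both HTTP/1.0 and HTTP/1.1.
- OPTIONS, TRACE, CONNECT: supported only by HTTP/1.1.
- LINK, UNLINK: supported only by HTTP/1.0.
LINK and UNLINK were dropped in HTTP/1.1 and are no longer supported.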
(4) HTTP messages
The information exchanged in HTTP is called an HTTP message. A message sent by the requesting side (the client) is called a request message, and a message sent by the responding side (the server) is called a response message. An HTTP message is text made up of multiple lines of data.
An HTTP message consists of the following three parts:
1. Message header
The content and attributes of the request or response that the client or server processes. It contains either a request line (request method, request URI, HTTP version) or a status line (HTTP version, status code, reason phrase indicating the result of the response), plus header fields (headers describing various conditions and attributes of the request and response).
2. Blank line
A CR + LF pair (CR: Carriage Return; LF: Line Feed) that separates the message header from the message body.
3. Message body
The data being sent.
(Figure: example request message and response message.)
(5) HTTP persistent connections
1. Persistent connections
In the initial version of HTTP, the TCP connection was closed after every single HTTP request and response, and a new one had to be opened for the next exchange, which increased communication overhead.
To solve this problem, HTTP/1.1 introduced persistent connections (HTTP Persistent Connections, also known as HTTP keep-alive or HTTP connection reuse).
With a persistent connection, the TCP connection stays open as long as neither end explicitly asks to close it.
Advantage: this reduces the extra overhead of repeatedly establishing and closing TCP connections and lightens the load on the server. Because less time is spent on that overhead, HTTP requests and responses finish sooner, so web pages display faster.
In HTTP/1.1, all connections are persistent by default, whereas in HTTP/1.0 persistent connections were not standardized.
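One way to observe a persistent connection is to run a local server and check that several requests arrive over the same TCP connection. The sketch below uses Python's standard `http.server` and `http.client`; the handler name `EchoPortHandler` and the helper `fetch_ports` are hypothetical. The handler returns the client's TCP source port, so identical ports across requests mean the connection was reused rather than reopened.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # http.server keeps connections open only for HTTP/1.1

    def do_GET(self):
        body = str(self.client_address[1]).encode()  # the client's TCP source port
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_ports(n: int) -> list[int]:
    """Send n GET requests over one HTTPConnection and collect the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        ports = []
        for _ in range(n):
            conn.request("GET", "/")
            resp = conn.getresponse()
            ports.append(int(resp.read()))  # read the body fully before reusing the connection
        return ports
    finally:
        conn.close()  # closing first lets the single-threaded server finish the connection
        server.shutdown()
        server.server_close()
```

`fetch_ports(3)` should return the same port three times, because all three exchanges travel over one TCP connection.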
2. Pipelining
Persistent connections make it possible to pipeline most requests. Previously, the client had to send a request and wait for its response before sending the next one. With pipelining, the client can send the next request without waiting for the previous response; the server still returns the responses in the same order as the requests.
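Pipelining can be sketched by writing several requests to one TCP socket back to back and only then reading the responses. This is an illustrative Python example against a local `http.server` instance, not how browsers actually behave (most do not pipeline). `PathHandler`, `read_response`, and `pipelined` are hypothetical names, and `read_response` only understands `Content-Length` bodies.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = self.path.encode()  # echo the request path so the response order is visible
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def read_response(f) -> tuple[str, bytes]:
    """Read one response (status line, header fields, Content-Length body) from a file."""
    status = f.readline().decode("iso-8859-1").rstrip("\r\n")
    length = 0
    while True:
        line = f.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:
            break  # blank line ends the header
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value)
    return status, f.read(length)


def pipelined(paths: list[str]) -> list[str]:
    server = HTTPServer(("127.0.0.1", 0), PathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    sock = socket.create_connection(server.server_address)
    try:
        # Send every request at once, without waiting for any response.
        raw = b"".join(f"GET {p} HTTP/1.1\r\nHost: localhost\r\n\r\n".encode() for p in paths)
        sock.sendall(raw)
        with sock.makefile("rb") as f:
            # Responses come back in the same order the requests were sent.
            return [read_response(f)[1].decode() for _ in paths]
    finally:
        sock.close()
        server.shutdown()
        server.server_close()
```

`pipelined(["/a", "/b", "/c"])` returns the bodies in request order, showing that the server answered the queued requests one after another on the same connection.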
(6) HTTP status codes
An HTTP status code describes the result of a request that the client sent to the server. From the status code, the client can tell whether the server processed the request normally or whether an error occurred.
Each HTTP response message carries a three-digit status code together with a reason phrase, for example 200 OK. The first digit specifies the response class (the status code category); the last two digits have no classifying role.
There are five classes of status codes:
- 1XX Informational: the request was received and is being processed.
- 2XX Success: the request was processed normally.
- 3XX Redirection: further action is needed to complete the request.
- 4XX Client Error: the server cannot process the request.
- 5XX Server Error: the server failed while processing the request.
As long as these class definitions are respected, it is acceptable to change the status codes defined in RFC 2616 or for a server to define its own status codes.
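Because the class is carried entirely by the first digit, a client can still interpret a status code it has never seen. The Python sketch below (the names `STATUS_CLASSES` and `describe` are hypothetical) uses the standard `http.HTTPStatus` enum for reason phrases of registered codes and falls back to the class alone for unregistered ones.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    category = STATUS_CLASSES[code // 100]  # only the first digit decides the class
    try:
        reason = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know about
        reason = "(unregistered code)"
    return f"{code} {reason}: {category}"
```

`describe(404)` gives `"404 Not Found: Client Error"`, while a made-up `describe(299)` still gives `"299 (unregistered code): Success"`.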
Several common status codes:
- 200 OK. The request sent by the client was processed successfully by the server.
- 204 No Content. The server processed the request successfully, but the response message contains no entity body, and none is allowed. It is generally used when the client only needs to send information to the server and the server has nothing new to send back.
- 301 Moved Permanently. Permanent redirection. The requested resource has been assigned a new URI, and that URI should be used from now on. For example, if you saved the resource's URI as a bookmark, you should update the bookmark to the URI given in the Location header field.
- 302 Found. Temporary redirection. The requested resource has been assigned a new URI, and the client is expected to use the new URI for this request only.
- 304 Not Modified. The client sent a conditional request (for example with If-Modified-Since or If-None-Match); the server allows access to the resource, but the condition was not met, meaning the resource has not changed and the client's cached copy can be used. A 304 response contains no body. Although 304 belongs to the 3XX class, it has nothing to do with redirection.
- 400 Bad Request. The request message contains a syntax error. The client must fix the request and send it again.
- 401 Unauthorized. The request requires authentication information that passes HTTP authentication (BASIC or DIGEST authentication). If a request with credentials was already made, this code means user authentication failed.
- 404 Not Found. The requested resource cannot be found on the server. It can also be used when the server rejects a request and does not want to explain why.
- 500 Internal Server Error. An error occurred while the server was executing the request. The cause may be a bug in the web application or a temporary fault.
- 503 Service Unavailable. The server is temporarily overloaded or down for maintenance and cannot process the request. If the time until this condition ends is known in advance, the server should ideally put it in a Retry-After header field in the response.
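The 304 case is easiest to understand with a conditional request. The following Python sketch assumes an ETag-based check via the If-None-Match header (one of several conditional mechanisms); `make_etag` and `respond` are hypothetical helpers, and hashing the body with SHA-256 is just one possible way to derive an ETag.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Hypothetical ETag scheme: a quoted, truncated hash of the current representation.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict[str, str], body: bytes) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        # The resource has not changed since the client cached it: 304 with no body.
        return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

A first request gets `200` plus an `ETag`; repeating it with `If-None-Match` set to that ETag gets `304` and an empty body, and once the content changes the server answers `200` again.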
(7) Common Questions
1. Differences between HTTP and TCP/IP
TCP/IP (specifically TCP, at the transport layer) mainly addresses how data is transmitted over the network, while HTTP is an application layer protocol that mainly addresses how the data is packaged.
In more detail: TCP/IP alone is enough to transmit data, but without an application layer the receiver cannot interpret the content. To make the transmitted data meaningful, an application layer protocol is required. There are many application layer protocols, such as HTTP, FTP, and TELNET, and you can also define your own. The Web uses HTTP as its application layer protocol to package the HTTP text, and then uses TCP/IP as the transport layer protocol to send it over the network.
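To see that TCP only carries bytes and HTTP gives them meaning, the sketch below writes a hand-made HTTP request onto a plain TCP socket and has a tiny server answer with a hand-made HTTP response. It uses only Python's standard `socket` and `threading` modules; `serve_once` and `demo` are hypothetical names, and the "server" is deliberately minimal (one connection, no real routing).

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one TCP connection and answer it with a hand-written HTTP response."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # TCP is just a byte stream; HTTP marks where the header ends
            chunk = conn.recv(1024)
            if not chunk:
                return
            data += chunk
        request_line = data.split(b"\r\n", 1)[0]
        body = b"you sent: " + request_line
        conn.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Length: "
            + str(len(body)).encode()
            + b"\r\nConnection: close\r\n\r\n"
            + body
        )


def demo() -> bytes:
    listener = socket.create_server(("127.0.0.1", 0))  # transport layer: a TCP listener
    t = threading.Thread(target=serve_once, args=(listener,))
    t.start()
    with socket.create_connection(listener.getsockname()) as client:
        # Application layer: the HTTP message is just text written to the TCP stream.
        client.sendall(b"GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n")
        response = b""
        while chunk := client.recv(1024):  # read until the server closes the connection
            response += chunk
    t.join()
    listener.close()
    return response
```

`demo()` returns raw bytes that start with `HTTP/1.1 200 OK`; nothing in TCP itself knows that these bytes form a status line, header fields, and a body.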
2. Differences between URI, URL, and URN
URI: Uniform Resource Identifier.
URL: Uniform Resource Locator.
URN: Uniform Resource Name.
A URI identifies an Internet resource by a string, while a URL indicates where the resource is located on the Internet. URLs are a subset of URIs.
The relationship among the three:
Think of a person: we identify them by their name and by their address. A URL is like an address: it tells you one way to find the target (here, finding a person through a street address). Such an address is also a URI. By contrast, a person's name can be regarded as a URN, and a URN can be used to uniquely identify an object. Because different people may share the same name, a person's name is not a perfect analogy; better examples are a book's ISBN or a product's serial number in a system. These do not tell you how or where to find the target, but they give you enough information to retrieve it.
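Python's standard `urllib.parse.urlsplit` accepts any URI, which makes the difference easy to see: the URL below has a scheme and a host that say how and where to fetch the resource, while the ISBN-based URN only names it. The `kind` helper is hypothetical and deliberately simplified (it treats the `urn` scheme as a name and any other scheme as a locator); it is not a general URI classifier.

```python
from urllib.parse import urlsplit

URL_EXAMPLE = "http://www.example.com/index.html"  # how (http) and where (www.example.com)
URN_EXAMPLE = "urn:isbn:0451450523"                # a name only: no host, no access method


def kind(uri: str) -> str:
    # Simplification: the "urn" scheme names a resource; the other schemes used here locate one.
    return "URN" if urlsplit(uri).scheme.lower() == "urn" else "URL"


for uri in (URL_EXAMPLE, URN_EXAMPLE):
    parts = urlsplit(uri)
    print(f"{kind(uri)}: scheme={parts.scheme!r} netloc={parts.netloc!r} path={parts.path!r}")
```

Both strings are URIs and both parse; only the URL yields a network location (`netloc`) that tells a client where to go.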
II. HTTPS protocol
(1) Why use HTTPS
HTTP was introduced above. Although it is widely used, it also has some shortcomings:
- Communication uses plain text (it is not encrypted), so the content may be eavesdropped.
- The identity of the communicating party is not verified, so it may be impersonated.
- The integrity of a message cannot be verified, so it may have been tampered with.
These problems are not unique to HTTP; they also occur in other unencrypted protocols.
To solve them in a unified way, encryption and authentication mechanisms must be added to HTTP. HTTP with encryption and authentication is called HTTPS (HTTP Secure).
Simply put, HTTPS = HTTP + encryption + authentication + integrity protection.
HTTPS is often used on web login pages and shopping checkout pages. HTTPS URLs begin with https:// instead of http://. When a browser accesses a site over valid HTTPS, a lock icon appears in the browser's address bar.
(2) HTTP wrapped in SSL
HTTPS is not a new application layer protocol. It simply replaces part of HTTP's communication interface with the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols.
Normally, HTTP communicates directly with TCP. When SSL is used, HTTP first communicates with SSL, and SSL then communicates with TCP. In short, HTTPS is HTTP wrapped in an SSL shell.
TLS/SSL is a security layer between TCP and HTTP and does not change the existing TCP and HTTP protocols, so switching to HTTPS requires few changes to HTTP pages. Besides HTTP, other application layer protocols such as SMTP and Telnet can also be used with SSL.
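The layering described above (HTTP text, inside TLS, inside TCP) maps directly onto Python's standard `socket` and `ssl` modules. In this sketch, `make_client_context` and `https_get` are hypothetical helpers; `https_get` needs network access and a real HTTPS host, so it is shown for illustration rather than as a production client. Note that the HTTP request text is exactly what would be sent over plain TCP; only the socket underneath changes.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # The default context verifies the server's certificate chain and its hostname.
    return ssl.create_default_context()


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send an ordinary HTTP/1.1 request through a TLS-wrapped TCP socket."""
    context = make_client_context()
    with socket.create_connection((host, port)) as tcp:               # bottom layer: TCP
        with context.wrap_socket(tcp, server_hostname=host) as tls:   # middle layer: TLS
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))                       # top layer: unchanged HTTP text
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

With network access, `https_get("example.com")` should return a response beginning with `HTTP/1.1`, just as a plain-HTTP request would; the encryption and certificate checks happen entirely in the TLS layer.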
(3) Why not use HTTPS everywhere
Since HTTPS is so reliable, why is HTTPS not always used by all Web sites?
This is mainly because of the following reasons:
1. Encrypted communication consumes more CPU and memory than plain-text communication. If every exchange is encrypted, a considerable amount of resources is consumed, and the number of requests each server can handle inevitably drops.
Therefore, non-sensitive information is sent over HTTP, and HTTPS encryption is used only when personal or sensitive data is involved. In particular, a high-traffic website does not encrypt all of its content; it encrypts only the information that needs to be protected, to save resources.
2. Saving the cost of buying certificates.
Certificates are essential for HTTPS communication, and they must be purchased from a Certification Authority (CA). Prices vary between certification authorities, ranging from several hundred to several thousand a year. Services for which buying a certificate is not cost-effective, and some personal websites, may therefore choose plain HTTP.
3. HTTPS is slower because of SSL processing.
SSL is slow in two ways: communication itself is slower, and the extra CPU, memory, and other resource consumption slows processing. Compared with HTTP, network load can make communication 2 to 100 times slower. Besides the TCP connection, the HTTP request, and the response, SSL communication is also required, so the total traffic inevitably increases. In addition, SSL requires encryption and decryption on both the server and the client, so it consumes more hardware resources than HTTP and increases the load. This can be mitigated with dedicated SSL accelerator hardware.
This article is compiled from the book Graphic HTTP.