Introduction to the HTTP protocol
1. Analysis using the Chrome/Firefox browser
In a Web application, the server delivers a Web page to the browser by sending the page's HTML code, which the browser then renders. The transport protocol between the browser and the server is HTTP, so:
HTML is a text format used to define Web pages; with HTML, you can write Web pages;
HTTP is the protocol that transmits HTML over the network, and is used for communication between the browser and the server.
The Chrome browser provides a complete set of debugging tools that are ideal for web development.
Once you've installed Chrome, open it and choose "View", "Developer", "Developer Tools" in the menu to display the developer tools:
- Elements displays the structure of the Web page
- Network displays the communication between the browser and the server
Click Network and make sure the small red light (the first button) is on; Chrome will then log all communication between the browser and the server:
2. Analysis of the HTTP protocol
When we enter www.sina.com in the address bar, the browser displays Sina's homepage. What did the browser do in the process? The Network records tell us. In Network, find the www.sina.com record and click it; the Request Headers appear on the right. Click view source next to them to see the request the browser sent to the Sina server:
2.1 Browser Requests
The most important lines are the first two. The first line:
GET / HTTP/1.1
GET indicates a read request that fetches the Web page data from the server. The / is the path part of the URL; a path always starts with /, and / alone represents the home page. The final HTTP/1.1 indicates that the HTTP protocol version used is 1.1. The version used here is 1.1, but most servers also support version 1.0; the main difference is that version 1.1 allows multiple HTTP requests to reuse one TCP connection, which speeds up the transfer.
Starting with the second line, each line has the form Name: Value, for example:
Host: www.sina.com
This indicates that the requested domain name is www.sina.com. If one server hosts multiple Web sites, it uses the Host header to tell which site the browser is requesting.
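To make the structure of the request line and the Host header concrete, here is a minimal sketch that parses the text of a request like the one above. The function name parse_request_head is made up for this illustration, and the sample request is shortened to the two lines discussed in the text.

```python
def parse_request_head(raw: str):
    """Split a raw HTTP request head into its request line and headers."""
    lines = raw.split("\r\n")
    # First line: method, path and protocol version, separated by spaces.
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # an empty line marks the end of the headers
            break
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw_request = "GET / HTTP/1.1\r\nHost: www.sina.com\r\n\r\n"
method, path, version, headers = parse_request_head(raw_request)
print(method, path, version)  # GET / HTTP/1.1
print(headers["host"])        # www.sina.com
```

A server that hosts several sites could look at headers["host"] to decide which site should answer.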
2.2 Server Response
Next, find Response Headers and click view source to display the original response data returned by the server:
An HTTP response is divided into a header and a body (the body is optional). The most important lines shown in Network are the following:
HTTP/1.1 200 OK
200 indicates a successful response, and the OK that follows is the description.
If the status is not 200, the response usually indicates some other condition, such as:
- 404 Not Found: the page does not exist
- 500 Internal Server Error: an internal error occurred on the server
- ... and so on
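As a small illustration of how a status code pairs with its description, the sketch below looks up the standard description using Python's built-in http.HTTPStatus enumeration. The helper name describe is hypothetical.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the status code followed by its standard description."""
    return f"{code} {HTTPStatus(code).phrase}"


for code in (200, 404, 500):
    print(describe(code))
# 200 OK
# 404 Not Found
# 500 Internal Server Error
```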
Content-Type: text/html
Content-Type indicates the type of the response content; here text/html means an HTML page.
Note that the browser relies on Content-Type to determine whether the response content is a Web page, a picture, a video, or music. It does not rely on the URL, so even if the URL is http://www.baidu.com/meimei.jpg, the content is not necessarily a picture.
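The point that the header, not the URL, decides the content type can be sketched as follows. This is a simplified illustration and not how any real browser is implemented; the function name kind_of_content and the returned labels are invented for the example.

```python
def kind_of_content(content_type: str) -> str:
    """Classify a response by its Content-Type header value only."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "web page"
    main_type = media_type.split("/", 1)[0]
    if main_type in ("image", "video", "audio"):
        return main_type
    return "other"


# The URL ends in .jpg, but the header says the body is an HTML page.
url = "http://www.baidu.com/meimei.jpg"
print(kind_of_content("text/html; charset=utf-8"))  # web page
print(kind_of_content("image/jpeg"))                # image
```

Note that the function never looks at url at all; only the header value is used.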
The HTTP response body is the HTML source code. Choosing "View", "Developer", "View Source" in the menu bar shows the HTML source directly in the browser:
Browser parsing process
When the browser has read the HTML source of the Sina homepage, it parses the HTML and displays the page. Then, following the various links inside the HTML, it sends further HTTP requests to the Sina server to fetch the corresponding pictures, videos, Flash, JavaScript scripts, CSS, and other resources, and finally displays the complete page. This is why we see many additional HTTP requests under Network.
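The step of finding linked resources in the HTML can be sketched with Python's standard html.parser module. This is only a rough approximation of what a browser does: it collects the src of img and script tags and the href of link tags, and the sample HTML and class name are made up for illustration.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs of resources a page refers to."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])


def find_resources(html: str):
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


sample = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="/logo.png"></body></html>'
)
print(find_resources(sample))  # ['/style.css', '/app.js', '/logo.png']
```

Each URL in the returned list corresponds to one additional HTTP request the browser would send.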
3. Summary
3.1 HTTP requests
Using the Sina homepage as the example, let's summarize the HTTP request process:
3.1.1 Step 1: The browser first sends an HTTP request to the server. The request includes:
Method: GET or POST. GET only requests a resource; POST also carries user data;
Path: /full/url/path;
Domain name: specified by the Host header, e.g. Host: www.sina.com;
other relevant headers;
If the method is POST, the request also includes a body containing the user data.
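The parts listed in step 1 can be assembled into request text as in the sketch below. The function build_request, the /login path and the form data are hypothetical. The Content-Length header is not mentioned in the text above; it is added here on the assumption that a request with a body should tell the server how long that body is.

```python
def build_request(method: str, path: str, host: str, body: bytes = b"") -> bytes:
    """Assemble a minimal HTTP/1.1 request from method, path, host and body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Assumption: announce the body length so the server knows where it ends.
        lines.append(f"Content-Length: {len(body)}")
    # Lines end with \r\n; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


print(build_request("GET", "/", "www.sina.com"))
print(build_request("POST", "/login", "www.sina.com", b"user=alice"))
```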
3.1.2 Step 2: The server returns an HTTP response to the browser. The response includes:
Response code: 200 indicates success, 3xx indicates redirection, 4xx indicates an error in the request sent by the client, and 5xx indicates an error during server-side processing;
Response type: specified by Content-Type;
other relevant headers;
Usually the server's HTTP response carries content, that is, a body. The HTML source of the Web page is in the body.
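The grouping of response codes by their first digit can be written as a small function. This is a sketch; the name status_class and the returned labels are chosen for this example, and codes outside the ranges named above are simply reported as "other".

```python
def status_class(code: int) -> str:
    """Map a response code to the category described above."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```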
3.1.3 Step 3: If the browser needs additional resources from the server, it sends another HTTP request, repeating steps 1 and 2.
The HTTP protocol used by the Web follows a very simple request-response pattern, which greatly simplifies development. When we write a page, we only need to send the HTML in the HTTP response and do not need to consider how to include pictures, videos, and so on. If the browser needs pictures or videos, it sends another HTTP request for each, so one HTTP request handles only one resource. (Conceptually this resembles a short-lived TCP connection: each exchange fetches only one resource, so several resources require several exchanges.)
The HTTP protocol is also very extensible. Although the browser requests the http://www.sina.com homepage, Sina's HTML can link to resources on other servers, so that the request load is spread across several servers. A site can also link to other sites; countless sites linked to one another form the World Wide Web, abbreviated WWW.
3.2 HTTP format
Every HTTP request and response follows the same format: an HTTP message consists of a header and a body, and the body is optional.
The HTTP protocol is a text protocol, so its format is very simple.
3.2.1 The format of the HTTP GET request:
GET /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3
Each header occupies one line, and the line terminator is \r\n.
3.2.2 The format of the HTTP POST request:
POST /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3

body data goes here...
When two consecutive \r\n are encountered, the header part ends, and everything after them is the body.
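The rule that two consecutive \r\n end the header can be sketched with bytes.partition, which splits at the first occurrence of the separator. The function name split_message is made up for this example, and the sample message reuses the placeholder headers from the format above.

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into its header lines and its body."""
    # Everything before the first \r\n\r\n is the header part;
    # everything after it is the body (empty if there is none).
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    return header_lines, body


raw = (
    b"POST /path HTTP/1.1\r\n"
    b"Header1: Value1\r\n"
    b"Header2: Value2\r\n"
    b"\r\n"
    b"body data goes here..."
)
header_lines, body = split_message(raw)
print(header_lines)  # ['POST /path HTTP/1.1', 'Header1: Value1', 'Header2: Value2']
print(body)          # b'body data goes here...'
```

The body is kept as bytes because, depending on the content type, it may be text or binary data.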
3.2.3 The format of the HTTP response:
HTTP/1.1 200 OK
Header1: Value1
Header2: Value2
Header3: Value3

body data goes here...
If the HTTP response contains a body, the header and the body are separated by \r\n\r\n.
Note again that the type of the body data is determined by the Content-Type header: for a Web page, the body is text; for a picture, the body is the binary data of the picture.
When a Content-Encoding header is present, the body data is compressed. The most common compression method is gzip, so when you see Content-Encoding: gzip, you need to decompress the body first to get the real data. The purpose of compression is to reduce the body size and speed up network transmission.
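Decompressing a gzip-encoded body can be sketched with Python's standard gzip module. The function decode_body is hypothetical, and it assumes the headers have already been parsed into a dictionary whose keys are spelled exactly as shown; only gzip is handled, and any other body is returned unchanged.

```python
import gzip


def decode_body(headers: dict, body: bytes) -> bytes:
    """Return the real body data, decompressing it if it is gzip-encoded."""
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


original = b"<html><body>Hello</body></html>"
compressed = gzip.compress(original)  # what a server might send over the network

print(decode_body({"Content-Encoding": "gzip"}, compressed) == original)  # True
print(decode_body({}, original) == original)                              # True
```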