A complete browser request process
When we type www.linux178.com into the browser's address bar and press Enter, what happens between that moment and the page appearing?
The entire process is as follows:
- Domain name resolution
- A TCP three-way handshake is initiated
- An HTTP request is sent once the TCP connection is established
- The server responds to the HTTP request and the browser receives the HTML code
- The browser parses the HTML code and requests the resources it references (such as JS, CSS, and images)
- The browser renders the page and presents it to the user
Each step is analyzed below, using the Chrome browser as an example:
Domain name resolution
First, Chrome resolves the domain name www.linux178.com to an IP address. How does it find the corresponding IP address?
Chrome first searches the browser's own DNS cache (entries are kept only briefly, about 1 minute, and the cache holds at most 1000 entries).
If the name is not in the browser's own cache, the operating system's DNS cache is checked. If an entry is found there and has not expired, the search stops and resolution ends here.
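The lookup order just described can be sketched as follows. This is a minimal illustration and not Chrome's actual implementation: the `DnsCache` class, the `lookup` helper, the oldest-first eviction policy, and the address 192.0.2.10 (taken from a documentation-only range) are all assumptions. Only the one-minute lifetime and the 1000-entry limit come from the text.

```python
from collections import OrderedDict


class DnsCache:
    """A small time-limited, size-limited cache (hypothetical helper)."""

    def __init__(self, ttl_seconds=60, max_entries=1000):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._entries = OrderedDict()  # name -> (ip, expires_at)

    def put(self, name, ip, now):
        self._entries[name] = (ip, now + self.ttl_seconds)
        self._entries.move_to_end(name)
        # Assumed policy: drop the oldest entries once the limit is exceeded.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # expired entries are treated as missing
            return None
        return ip


def lookup(name, browser_cache, os_cache, now):
    """Check the browser cache first, then the OS cache, then give up."""
    for label, cache in (("browser", browser_cache), ("os", os_cache)):
        ip = cache.get(name, now)
        if ip is not None:
            return label, ip
    return "resolver", None  # the caller would now ask the configured DNS server
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic, which makes the expiry behavior easy to check by hand.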
If the local machine's DNS cache has no entry either, the browser makes a DNS system call, which sends a domain name resolution request to the locally configured preferred DNS server (over UDP to DNS port 53). This is a recursive request: the carrier's DNS server must hand us back the IP address of the domain name. The carrier's DNS server first checks its own cache, and if it finds an entry that has not expired, resolution succeeds.

If no entry is found, the carrier's DNS server performs iterative DNS queries on the browser's behalf. It starts with the root name servers, whose IP addresses (13 of them) are built into the DNS server, and asks one of them: what is the IP address of www.linux178.com? The root server sees that the name belongs to the top-level domain com, so it tells the carrier's DNS server: I do not know the IP address of this name, but I know the IP address of the com servers, so go ask them. The carrier's DNS server then sends the same question to the com server. The com server replies: I do not know the IP address of www.linux178.com, but I know the DNS address for the linux178.com domain, so go ask it. The carrier's DNS server then queries the DNS server for linux178.com (usually provided by the domain name registrar, such as Wanwang or Xinnet): what is the IP address of www.linux178.com?

This time the linux178.com DNS server looks the name up, finds that the record really is its own, and sends the result to the carrier's DNS server. The carrier's DNS server now has the IP address for www.linux178.com and returns it to the Windows system kernel, the kernel returns it to the browser, and the browser finally has the IP address for www.linux178.com and can move on to the next step.
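The chain of referrals in the iterative lookup can be simulated with an in-memory table. This is only a sketch under stated assumptions: the server names in `SERVERS`, the `query` and `resolve_iteratively` helpers, and the address 192.0.2.10 (a documentation-only range) are hypothetical, and no real DNS traffic is sent.

```python
# Hypothetical delegation data standing in for the root, com, and
# linux178.com name servers described in the text.
SERVERS = {
    "root-server": {"delegations": {"com": "com-server"}, "records": {}},
    "com-server": {"delegations": {"linux178.com": "linux178-server"}, "records": {}},
    "linux178-server": {
        "delegations": {},
        "records": {"www.linux178.com": "192.0.2.10"},
    },
}


def query(server, name):
    """Answer like an iterative DNS server: a final answer or a referral."""
    data = SERVERS[server]
    if name in data["records"]:
        return "answer", data["records"][name]
    for zone, next_server in data["delegations"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "unknown", None


def resolve_iteratively(name, cache):
    """Play the role of the carrier's DNS server: cache first, then walk down."""
    if name in cache:
        return cache[name], ["cache"]
    server, path = "root-server", []
    while True:
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            cache[name] = value  # remember the result for later requests
            return value, path
        if kind == "referral":
            server = value  # "I do not know, go ask this server instead"
            continue
        raise LookupError(f"cannot resolve {name}")
```

Calling `resolve_iteratively("www.linux178.com", {})` visits the three servers in order, and a second call with the same cache dictionary is answered from the cache without contacting any of them.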
Initiating the TCP three-way handshake
After obtaining the IP address for the domain name, the user agent (typically the browser) opens a TCP connection from a random local port (1024 < port < 65535) to port 80 of the server's web program (commonly httpd, nginx, and so on). The connection request (like the HTTP request that follows, it is wrapped layer by layer on its way down the four-layer TCP/IP model) travels to the server, passing through various routing devices along the way unless both ends are on the same LAN. It enters the server's network card and then the kernel's TCP/IP protocol stack, which identifies the connection request and unwraps the packet one layer at a time. It may also have to pass the Netfilter firewall, which is a kernel module. It finally reaches the web program (nginx in this example), and the TCP/IP connection is established.
Why is HTTP implemented on top of TCP?
TCP is an end-to-end, reliable, connection-oriented protocol. Because HTTP builds on TCP at the transport layer, it does not have to deal with the problems of data transmission itself, such as lost or reordered packets.
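To see a connection being set up from a random local port, the sketch below connects to a throwaway listener on the loopback interface instead of a real web server. The function name `loopback_connect` is an assumption for illustration. The operating system performs the three-way handshake inside the `create_connection` call.

```python
import socket


def loopback_connect():
    """Open a TCP connection to a local listener and report both endpoints."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        server_addr = listener.getsockname()

        # The kernel completes the SYN, SYN-ACK, ACK exchange during this call.
        with socket.create_connection(server_addr, timeout=5) as client:
            conn, peer_addr = listener.accept()
            with conn:
                # client.getsockname() is the random local port chosen by the OS.
                return client.getsockname(), peer_addr, server_addr
```

The address the server sees for its peer is the same as the client's own local address, which is how each side identifies the connection.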
Initiating an HTTP request after a TCP connection is established
After the TCP three-way handshake, the browser sends an HTTP request. It uses the GET method, the requested URL is /, and the protocol is HTTP/1.0.
The following shows the detailed contents of packet 12:
3.png
The message above is an HTTP request message.
So what are the formats of the HTTP request message and the response message?
An HTTP request message consists of four parts: the request line, the request headers, a blank line, and the request data. The general format of the request message is described below.
Request line
The request line consists of three fields: the request method, the URL, and the HTTP protocol version, separated by spaces. For example, GET /index.html HTTP/1.1.
HTTP defines the request methods GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT. The two most common, GET and POST, are described here.
- GET: Use the GET method when the client wants to read a document from the server. GET asks the server to place the resource identified by the URL in the data portion of the response message and send it back to the client. With GET, the request parameters and their values are appended to the URL, with a question mark ("?") marking the end of the URL and the start of the parameters. The amount of data that can be passed this way is limited by the maximum URL length. For example, /index.jsp?id=100&op=bind.
- POST: Use the POST method when the client needs to send more information to the server. POST places the request parameters in the HTTP request data as name/value pairs, so it can transmit large amounts of data.
Request headers
The request headers consist of keyword/value pairs, one pair per line, with the keyword and the value separated by a colon (":"). They give the server information about the client's request. Typical request headers are:
User-Agent: The type of browser that generated the request.
Accept: The list of content types the client can handle.
Host: The hostname being requested, which allows multiple domain names to share one IP address as virtual hosts.
Blank line
The last request header is followed by a blank line (a carriage return and a line feed), which tells the server that no more request headers follow.
Request data
Request data is not used with the GET method but is used with the POST method. POST suits situations where the user needs to fill out a form. The request headers most often associated with request data are Content-Type and Content-Length.
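The four parts of a request message (request line, headers, blank line, data) can be assembled by hand to show where GET and POST put their parameters. This is a rough sketch: the `build_request` helper and the `example-client/0.1` User-Agent value are made up for illustration, and a real browser sends many more headers.

```python
from urllib.parse import urlencode


def build_request(method, path, host, params=None):
    """Assemble a raw HTTP/1.1 request as bytes (illustrative helper)."""
    params = params or {}
    headers = {
        "Host": host,
        "User-Agent": "example-client/0.1",  # placeholder value
        "Accept": "*/*",
    }
    body = b""
    if method == "GET" and params:
        # GET: parameters are appended to the URL after a question mark.
        path = f"{path}?{urlencode(params)}"
    elif method == "POST":
        # POST: parameters travel in the request data as name/value pairs.
        body = urlencode(params).encode("ascii")
        headers["Content-Type"] = "application/x-www-form-urlencoded"
        headers["Content-Length"] = str(len(body))

    request_line = f"{method} {path} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line closes the header section.
    head = "\r\n".join([request_line, *header_lines]) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For the parameters `{"id": 100, "op": "bind"}`, the GET form produces the request line `GET /index.jsp?id=100&op=bind HTTP/1.1` with nothing after the blank line, while the POST form keeps the path unchanged and carries `id=100&op=bind` after the blank line.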
So what are the request methods in the start line?
GET: Request a complete resource (common)
HEAD: Request the response headers only
POST: Submit a form (common)
PUT: Upload a resource
DELETE: Delete a resource
What are URI, URL, and URN?
URI: Uniform Resource Identifier
URL: Uniform Resource Locator. The format is as follows:
scheme://[username:password@]host:port/path/to/source
For example: http://www.magedu.com/downloads/nginx-1.5.tar.gz
URN: Uniform Resource Name
URLs and URNs are both kinds of URI. For convenience, URL and URI are treated here as referring to the same thing.
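The URL format can be explored with Python's standard `urllib.parse.urlsplit`. The `describe_url` wrapper is only an illustrative helper, and the second URL used in the note after the code is a made-up variant that adds credentials and a port.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into the parts named in the format above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,  # None when the URL carries no credentials
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # None when the scheme's default port is implied
        "path": parts.path,
    }
```

For http://www.magedu.com/downloads/nginx-1.5.tar.gz the username, password, and port all come back as `None`, because those parts are optional. A hypothetical URL such as `http://user:secret@www.magedu.com:8080/downloads/` fills in every field.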
What versions of the protocol are there?
There are the following:
HTTP/0.9: stateless
HTTP/1.0: MIME, keep-alive (keeping the connection open), caching
HTTP/1.1: more request methods, finer cache control, persistent connections; the most commonly used version
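The persistent connection of HTTP/1.1 can be observed with a throwaway server on the loopback interface, using only Python's standard `http.server` and `http.client` modules. This is a sketch: the `Handler` class and the `fetch_over_one_connection` function are assumptions for illustration, and the server is not meant to resemble nginx or httpd.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = f"resource {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # needed for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_one_connection(paths):
    """Request several paths over a single TCP connection."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read().decode()  # finish reading before the next request
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results
```

If the client's local port is the same for every request, no new TCP connection (and no new handshake) was needed between them.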
Here is the header of the HTTP request message from Chrome
4.png
In this request:
Accept: Tells the server which MIME types the client accepts.
Accept-Encoding: The compression methods the client accepts.
Accept-Language: Tells the server which language to send.
Connection: Tells the server that the client supports the keep-alive feature.
Cookie: Each request carries the cookie so that the server can tell whether it comes from the same client.
Host: Identifies which virtual host on the server is being requested. For example, nginx can define many virtual hosts, and this header selects the one to access.
User-Agent: The user agent, generally a browser, though there are other types such as wget, curl, and search engine spiders.
Conditional request header:
If-Modified-Since: The browser asks the server to send a resource again only if it has changed since the given time. This ensures that when the file is updated on the server, the browser requests it again instead of using the copy in its cache.
Security request header:
Authorization: Authentication information that the client provides to the server.
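How a server might use the Host header to choose a virtual host can be sketched in a few lines. Everything here is hypothetical: the `parse_headers` and `pick_document_root` helpers, the `VIRTUAL_HOSTS` table, and the directory paths are invented for illustration and do not reflect how nginx is implemented.

```python
# Hypothetical mapping from host name to the directory that serves it.
VIRTUAL_HOSTS = {
    "www.linux178.com": "/srv/www/linux178",
    "www.magedu.com": "/srv/www/magedu",
}


def parse_headers(raw):
    """Turn 'Name: value' lines into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            break  # the blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def pick_document_root(headers, default="/srv/www/default"):
    """Select a virtual host from the Host header, ignoring any port suffix."""
    host = headers.get("host", "").split(":")[0].lower()
    return VIRTUAL_HOSTS.get(host, default)
```

Header names are compared in lowercase because HTTP header names are case-insensitive.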
The server responds to the HTTP request and the browser gets the HTML code
After the server's web program receives the HTTP request, it processes the request and then returns the HTML file to the browser.
5.png
Packet 32 is the HTTP response that the server returns to the client (200 OK, with a response MIME type of text/html), which shows that the client's HTTP request succeeded. 200 is the status code for a successful response. Other status codes include:
1XX: Informational status codes, such as 100 and 101
2XX: Success status codes, such as 200 OK
3XX: Redirection status codes
301: Moved Permanently. A permanent redirect: the Location response header holds the new URL, and the client should use that URL for future requests.
302: Found. A temporary redirect: the Location response header holds the new URL, but the client should keep using the original URL for future requests.
304: Not Modified. The resource has not changed. For example, when the locally cached copy of a file is compared with the one on the server and found to be unmodified, the server returns 304 to tell the browser that it does not need to download the resource again and can use its local copy.
4XX: Client error status codes
404: Not Found. The requested URL does not exist.
5XX: Server error status codes
500: Internal Server Error. An error occurred inside the server.
502: Bad Gateway. Appears when a front-end proxy server cannot reach the back-end server.
504: Gateway Timeout. The proxy can reach the back-end server, but the back-end server did not respond to the proxy within the allotted time.
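The first digit of a status code gives its class, and Python's standard `http.HTTPStatus` enum supplies the reason phrases. The `describe_status` helper and the class labels in `STATUS_CLASSES` are illustrative names, not part of any library.

```python
from http import HTTPStatus

# Labels for the five classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe_status(code):
    """Return the code, its standard reason phrase, and its class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({STATUS_CLASSES[code // 100]})"
```

For example, `describe_status(504)` returns `504 Gateway Timeout (Server error)`.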
The browser parses the HTML code and requests the resources in the HTML code
Once the browser has the index.html file, it begins parsing the HTML code. Whenever it encounters a static resource such as JS, CSS, or an image, it requests that resource from the server (resources are downloaded in parallel, and the number of parallel downloads differs between browsers). These requests use the keep-alive feature: once an HTTP connection has been established, multiple resources can be requested over it.
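Discovering which resources a page references can be sketched with the standard `html.parser` module. This is a simplification under stated assumptions: the `ResourceCollector` class and `find_resources` function are invented names, only `script`, `img`, and stylesheet `link` tags are considered, and a real browser handles many more cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect the URLs of scripts, stylesheets, and images in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("script", "img"):
            url = attrs.get("src")
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                url = attrs.get("href")
        if url:
            # Relative references are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, url))


def find_resources(html, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

Ordinary links (`a` tags) are deliberately ignored, since the browser does not fetch them until the user clicks.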
When the browser requests a static resource that it already holds in its local cache, it sends the server an HTTP request asking whether the resource has been modified since the time of the cached copy. If the server returns a 304 status code (telling the browser that the resource has not been modified on the server), the browser reads the local cached file for that resource directly.
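The server's side of this exchange can be sketched as a function that compares the resource's modification time with the If-Modified-Since header. The `respond` function and the placeholder body are assumptions for illustration. Date parsing and formatting use the standard `email.utils` helpers, which handle the date format used in HTTP headers.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(resource_mtime, request_headers):
    """Return (status, headers, body) for a resource last changed at resource_mtime.

    resource_mtime must be a timezone-aware datetime in UTC.
    """
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            cached_at = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            cached_at = None  # an unparsable date is ignored
        if cached_at is not None:
            if cached_at.tzinfo is None:
                cached_at = cached_at.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if resource_mtime.replace(microsecond=0) <= cached_at:
                return 304, {}, b""  # not modified: no body is sent

    headers = {"Last-Modified": format_datetime(resource_mtime, usegmt=True)}
    return 200, headers, b"<html>placeholder body</html>"
```

On the first request the client has no If-Modified-Since header and receives the full body with a Last-Modified value. Sending that value back on the next request yields 304 for as long as the file is unchanged.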
10.png
The browser renders the page and presents it to the user
Finally, the browser uses its internal rendering mechanism to render the HTML code together with the static resources it requested, and presents the result to the user.