[Original] A classic question: what exactly happens between entering a URL and the page being displayed

Source: Internet
Author: User

I hesitated before starting this article, because a search for "what happens between entering a URL and the page being displayed" already turns up plenty of material. The question is also almost guaranteed to come up in interviews. In my interviews in February I knew roughly what happens in this process, but when the interviewer walked through it step by step, many of the details turned out to be unclear to me.

I have recently been reading about the HTTP protocol, so I wanted to write an in-depth summary of this topic. The goal of this article is to summarize what happens after a URL is entered and to extend that with related background knowledge, so the article may ramble a bit.

The overall process is probably as follows:

1. Enter the address

As soon as we start typing a URL, the browser begins matching it against likely URLs. It searches the history, bookmarks, and other places for URLs that correspond to the string typed so far and offers suggestions so that you can complete the address. Google Chrome may even display the page straight from its cache, which means the page appears before you have pressed Enter.

2. The browser looks up the IP address of the domain name

(1) Once the request is initiated, the first thing the browser does is resolve the domain name. Generally, the hosts file on the local disk is checked first to see whether it contains a rule for this domain name; if it does, the IP address in the hosts file is used directly.

(2) If no matching IP address is found in the local hosts file, the browser sends a DNS request to the local DNS server. The local DNS server is usually provided by your Internet service provider, such as China Telecom or China Mobile.

(3) When the DNS request for the URL you entered reaches the local DNS server, that server first checks its cache. If the record is cached, it returns the result directly. From the client's point of view this is a recursive query: the local DNS server is responsible for coming back with the final answer. If the record is not cached, the local DNS server queries a root DNS server.

(4) The root DNS server does not store the mapping between specific domain names and IP addresses. Instead, it tells the local DNS server which domain server to query next and gives that server's address. This is the iterative process.

(5) The local DNS server then sends the request to the domain server; in this example that is the .com domain server. When the .com domain server receives the request, it does not return the mapping between the domain name and the IP address either; it tells the local DNS server the address of the name server responsible for resolving your domain name.

(6) Finally, the local DNS server sends the request to that name server and receives the mapping between the domain name and the IP address. The local DNS server returns the IP address to the user's computer and also saves the mapping in its cache, so that the next time a user makes the same query it can return the result directly and speed up network access.
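The lookup order described in the steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the hosts content, the server names (`tld-com`, `ns-example`), the domain `www.example.com`, and the address `192.0.2.10` are all made up, and no real DNS traffic is sent.

```python
# Toy model of the lookup order: hosts file, then the local DNS server's
# cache, then root -> .com domain server -> the domain's own name server.
# Every name and address below is hypothetical.

HOSTS_FILE = """\
# sample hosts content
127.0.0.1   localhost
10.0.0.5    intranet.example
"""

ROOT = {"com": "tld-com"}                                   # root knows domain servers
TLD = {"tld-com": {"example.com": "ns-example"}}            # .com knows name servers
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}


def parse_hosts(text):
    """Return {hostname: ip} from hosts-file text, ignoring comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table


class LocalDnsServer:
    """Answers from its cache; on a miss, walks the hierarchy and caches the result."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]
        labels = name.split(".")
        tld_server = ROOT[labels[-1]]                        # step (4): ask the root
        auth_server = TLD[tld_server][".".join(labels[-2:])]  # step (5): ask .com
        ip = AUTHORITATIVE[auth_server][name]                # step (6): ask the name server
        self.upstream_queries += 3
        self.cache[name] = ip
        return ip


def lookup(name, hosts, local_dns):
    """Steps (1) and (2): hosts file first, then the local DNS server."""
    if name in hosts:
        return hosts[name]
    return local_dns.resolve(name)
```

Calling `lookup("www.example.com", ...)` twice only walks the hierarchy once; the second call is served from the local DNS server's cache.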

The following diagram is a perfect explanation of this process:

Knowledge expansion:

1) What is DNS?

DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other. It lets users access the Internet more conveniently, without having to remember the numeric IP strings that machines read directly. The process of obtaining the IP address for a host name is called domain name resolution (or host name resolution).

In plain terms, we are used to remembering the name of a website, such as www.baidu.com, rather than its IP address, whereas computers are better at handling a website's IP address than a name like www.baidu.com. So DNS works like a phone book: if you want to find the domain name www.baidu.com, I flip through my phone book and can tell you its "telephone number" (its IP address).

2) Two ways of DNS query: Recursive query and iterative query

(1) Recursive resolution

When the local DNS server cannot answer the client's DNS query itself, it needs to query other DNS servers. There are two ways to do this. In recursive mode, the local DNS server itself takes charge of querying other DNS servers, usually starting with the root domain server for the domain name, and the query is then passed down level by level. The final result is returned to the local DNS server, which returns it to the client.

(2) Iterative resolution

When the local DNS server cannot answer the client's DNS query itself, the query can also be resolved iteratively. In this mode the DNS server does not query other DNS servers itself. Instead, it returns to the client's DNS program the IP addresses of other DNS servers that can resolve the domain name, and the client's DNS program goes on to query those servers until it obtains the result. In other words, iterative resolution only points you to the relevant server and does not do the lookup for you. It is like being told: "The IP address of the baidu.com server is over here; check it yourself, I am busy and can only help you this far."
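To make the contrast concrete, here is a minimal sketch of both modes over the same toy set of servers. The server names, zones, and the address `192.0.2.10` are assumptions invented for the example; real DNS messages and timeouts are not modeled.

```python
# Toy comparison of recursive and iterative resolution.
# Each server either knows the answer or knows whom to ask next.

SERVERS = {
    "root": {"referrals": {"com": "tld-com"}, "records": {}},
    "tld-com": {"referrals": {"example.com": "ns-example"}, "records": {}},
    "ns-example": {"referrals": {}, "records": {"www.example.com": "192.0.2.10"}},
}


def ask(server, name):
    """One query to one server: ("answer", ip) or ("referral", next_server)."""
    info = SERVERS[server]
    if name in info["records"]:
        return ("answer", info["records"][name])
    for zone, next_server in info["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return ("referral", next_server)
    raise LookupError(f"{server} cannot help with {name}")


def resolve_iterative(name, start="root"):
    """The caller follows every referral itself and records whom it asked."""
    asked, server = [], start
    while True:
        asked.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, asked
        server = value  # "go ask that server yourself"


def resolve_recursive(name, server="root"):
    """The queried server chases the referral on the caller's behalf."""
    kind, value = ask(server, name)
    if kind == "answer":
        return value
    return resolve_recursive(name, value)
```

In the iterative version the caller ends up talking to `root`, `tld-com`, and `ns-example` in turn; in the recursive version the caller makes a single call and only sees the final answer.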

3) How DNS domain name spaces are organized

We have mentioned root DNS servers and domain DNS servers above; they reflect how the DNS namespace is organized. The five categories used to describe DNS domain names by their function in the namespace are listed in the following table, with an example of each name type.

(Image borrowed from another source)

4) DNS Load Balancing

When a website has enough users, sending every request to the same machine means that machine may go down at any time. The remedy is DNS load balancing. The principle is to configure multiple IP addresses for the same host name on the DNS server; when answering DNS queries, the DNS server returns the IP addresses recorded for that host in the DNS file in rotating order, directing clients to different machines. Different clients thus reach different servers, which balances the load. The choice can also take into account, for example, each machine's load or the distance between the user and the machine.
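The rotation described above can be sketched in a few lines. This is a simplified illustration, not how any particular DNS server implements it; the class name `RoundRobinRecord` and the addresses are made up.

```python
from collections import deque


class RoundRobinRecord:
    """Holds several IP addresses for one host name and rotates them per query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current ordering, then rotate so the next query starts elsewhere."""
        result = list(self._addresses)
        self._addresses.rotate(-1)
        return result


record = RoundRobinRecord(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
# Clients typically connect to the first address in the answer.
first_choices = [record.answer()[0] for _ in range(4)]
```

Four consecutive queries are steered to the first, second, third, and then the first machine again.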

3. The browser sends an HTTP request to the Web server

Once it has the IP address for the domain name, the browser opens a TCP connection from a random port (1024 < port < 65535) to port 80 of the server's web program (commonly httpd, nginx, and so on). The connection request reaches the server (passing through various routing devices on the way, unless both ends are on the same LAN), enters the network card, and then enters the kernel's TCP/IP protocol stack, which identifies the connection request and unwraps the packet layer by layer. It may also pass through the Netfilter firewall (a kernel module) before it finally reaches the web program, and the TCP/IP connection is established.

TCP connections:

After the TCP connection has been established, the HTTP request is sent. A typical HTTP request has to state the request method, such as GET or POST; PUT, DELETE, HEAD, OPTIONS, and TRACE are less common. Ordinary page navigation and HTML forms in a browser only issue GET or POST requests.

When the client sends an HTTP request to the server, the request message contains three parts:

(1) Request line: request method, URI, protocol/version

(2) Request headers

(3) Request body

Here is an example of a complete HTTP request:

GET /sample.jsp HTTP/1.1
Accept: image/gif, image/jpeg, */*
Accept-Language: zh-cn
Connection: keep-alive
Host: localhost
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Accept-Encoding: gzip, deflate

Note: the last request header is followed by an empty line, that is, a carriage return and a line feed, which tells the server that no more request headers follow.

(1) The first line of the request is "method URI protocol/version": GET /sample.jsp HTTP/1.1
(2) Request headers
The request headers carry a lot of useful information about the client environment and the request body. For example, they can declare the language used by the browser, the length of the request body, and so on.

Accept: image/gif, image/jpeg, */*
Accept-Language: zh-cn
Connection: keep-alive
Host: localhost
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Accept-Encoding: gzip, deflate

(3) Request body
There is an empty line between the request headers and the request body. This line is very important: it marks the end of the request headers, and whatever follows is the request body. The request body can contain the form data (in query-string format) submitted by the client.
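The three-part layout of a request can be checked with a small sketch that builds a request string and splits it back apart. The helper names `build_request` and `parse_request` are hypothetical, and the parser is deliberately minimal (no repeated headers, no line folding).

```python
def build_request(method, path, headers, body=""):
    """Request line, then headers, then an empty line, then the optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; the extra CRLF is the empty line that ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request(raw):
    """Split a raw request into (method, uri, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, version, headers, body


raw = build_request("GET", "/sample.jsp", {"Host": "localhost", "Connection": "keep-alive"})
method, uri, version, headers, body = parse_request(raw)
```

A GET request usually has an empty body, so `raw` ends with the empty line that closes the headers.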

Knowledge expansion:

1) The TCP three-way handshake

First handshake: client A sets the SYN flag to 1, randomly generates a sequence number seq=J (for example, J=1234567), and sends the packet to the server. Client A enters the SYN_SENT state and waits for server B to confirm.

Second handshake: server B receives the packet and knows from SYN=1 that client A is asking to establish a connection. Server B sets both the SYN and ACK flags to 1, sets ack=J+1, randomly generates a sequence number seq=K, and sends the packet to client A to confirm the connection request. Server B enters the SYN_RCVD state.

Third handshake: client A receives the confirmation and checks that ack equals J+1 and that the ACK flag is 1. If so, it sets the ACK flag to 1, sets ack=K+1, and sends the packet to server B. Server B checks that ack equals K+1 and that the ACK flag is 1; if so, the connection is established. Client A and server B both enter the ESTABLISHED state, the three-way handshake is complete, and data transfer between A and B can begin.
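The sequence-number bookkeeping in the three steps above can be simulated in memory. This is only a sketch: segments are plain dictionaries, nothing is sent over a network, and the function name `three_way_handshake` is made up for the example.

```python
import random

MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(rng=random):
    """Simulate SYN, SYN+ACK, ACK and return the segments plus both final states."""
    # First handshake: client picks a random J and sends SYN.
    j = rng.randrange(MOD)
    syn = {"SYN": 1, "ACK": 0, "seq": j}
    client_state = "SYN_SENT"

    # Second handshake: server sees SYN=1, picks a random K, replies with ack=J+1.
    if syn["SYN"] != 1:
        raise ConnectionError("not a connection request")
    k = rng.randrange(MOD)
    syn_ack = {"SYN": 1, "ACK": 1, "seq": k, "ack": (syn["seq"] + 1) % MOD}
    server_state = "SYN_RCVD"

    # Third handshake: client checks ack == J+1 and the ACK flag, then replies with ack=K+1.
    if syn_ack["ACK"] != 1 or syn_ack["ack"] != (j + 1) % MOD:
        raise ConnectionError("unexpected SYN+ACK")
    ack = {"SYN": 0, "ACK": 1, "seq": (j + 1) % MOD, "ack": (syn_ack["seq"] + 1) % MOD}
    client_state = "ESTABLISHED"

    # Server checks ack == K+1 and the ACK flag before considering the connection open.
    if ack["ACK"] != 1 or ack["ack"] != (k + 1) % MOD:
        raise ConnectionError("unexpected ACK")
    server_state = "ESTABLISHED"

    return [syn, syn_ack, ack], client_state, server_state
```

Passing a seeded `random.Random` instance as `rng` makes the run repeatable, which is handy when stepping through the numbers by hand.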

2) Why is a three-way handshake needed?

According to the fourth edition of "Computer Networks", the purpose of the three-way handshake is "to prevent an expired connection request segment from suddenly reaching the server and causing an error."

The book's example is as follows. An "expired connection request segment" arises when the client's first connection request segment is not lost but lingers at some network node for a long time, so that it only reaches the server some time after the connection has been released. It is a segment that expired long ago. When the server receives it, however, it mistakes it for a new connection request from the client and sends the client a confirmation segment agreeing to establish a connection.

If there were no three-way handshake, the new connection would be established as soon as the server sent its confirmation. Since the client has not actually requested a connection, it ignores the server's confirmation and sends no data. The server, however, believes the new transport connection is established and keeps waiting for data from the client, so a lot of server resources are wasted. The three-way handshake prevents this. In the situation above, the client does not confirm the server's confirmation, and because the server receives no confirmation, it knows the client did not ask for a connection. The main purpose is to keep the server from waiting in vain and wasting resources.

3) The TCP four-way wave (closing a connection)

First wave: the client sends a FIN to close the client-to-server data transfer, and the client enters the FIN_WAIT_1 state.
Second wave: after receiving the FIN, the server sends an ACK to the client with an acknowledgment number equal to the received sequence number plus 1 (like a SYN, a FIN consumes one sequence number), and the server enters the CLOSE_WAIT state.
Third wave: the server sends a FIN to close the server-to-client data transfer, and the server enters the LAST_ACK state.
Fourth wave: after receiving the FIN, the client enters the TIME_WAIT state and sends an ACK to the server with an acknowledgment number equal to the received sequence number plus 1. The server enters the CLOSED state, and the four-way wave is complete.
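The four waves can be traced with a small state walk-through. It is a sketch only: the starting sequence numbers 100 and 300 are arbitrary, and the `FIN_WAIT_2` state (which the client enters after its FIN is acknowledged) comes from standard TCP rather than from the description above.

```python
def four_way_close(client_seq=100, server_seq=300):
    """Trace the four segments of a client-initiated close and the state changes."""
    client = server = "ESTABLISHED"
    log = []  # entries: (sender, segment, fields, client_state, server_state)

    # First wave: client sends FIN.
    client = "FIN_WAIT_1"
    log.append(("client", "FIN", {"seq": client_seq}, client, server))

    # Second wave: server acknowledges; the FIN consumed one sequence number.
    server = "CLOSE_WAIT"
    log.append(("server", "ACK", {"ack": client_seq + 1}, client, server))
    client = "FIN_WAIT_2"  # client state once that ACK arrives (standard TCP)

    # Third wave: server has finished sending and sends its own FIN.
    server = "LAST_ACK"
    log.append(("server", "FIN", {"seq": server_seq}, client, server))

    # Fourth wave: client acknowledges the server's FIN.
    client = "TIME_WAIT"
    log.append(("client", "ACK", {"ack": server_seq + 1}, client, server))
    server = "CLOSED"  # server closes when the final ACK arrives

    return log, client, server
```

Printing `log` line by line shows who sends what and which state each side is in at that moment.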

4) Why does opening a connection take a three-way handshake while closing it takes a four-way wave?

This is because a server in the LISTEN state, after receiving the SYN segment of a connection request, can put its ACK and SYN into one segment and send them to the client together. When a connection is being closed, however, receiving the other side's FIN only means that the other side will send no more data; it can still receive data. You may not have sent all of your own data yet, so you may close right away, or you may first send the remaining data and only then send a FIN to agree to close the connection. For this reason your own ACK and FIN are generally sent separately.

4. The server responds with a permanent redirect

The server sends the browser a 301 permanent redirect response so that the browser visits "http://www.google.com/" rather than "http://google.com/".

Why does the server insist on redirecting instead of simply sending the page the user wants to see? One reason has to do with search engine rankings. If a page has two addresses, such as http://www.yy.com/ and http://yy.com/, a search engine will treat them as two sites, so the inbound links are split between them and the ranking drops. Search engines understand what a 301 permanent redirect means, so they attribute visits to the address with www and the address without www to the same site's ranking. Another reason is that multiple addresses make caching less effective: when a page has several names, it may appear in the cache several times.

Knowledge expansion:

1) The difference between 301 and 302

Both the 301 and 302 status codes indicate a redirect: when the browser receives the status code from the server, it automatically jumps to a new URL, which is taken from the Location header of the response (what the user sees is that the address A they typed turns into another address B). This is what the two have in common.

The difference is as follows. 301 means that the resource at the old address A has been permanently moved (it can no longer be accessed there); the search engine crawls the new content and also replaces the old URL with the redirect target.

302 means that the resource at the old address A is still there (still accessible) and that this redirect only temporarily jumps from the old address A to address B; the search engine crawls the new content but keeps the old URL. For SEO in this temporary case, 302 is the better choice than 301.

2) Reasons for redirecting

(1) The site is restructured (for example, the directory structure changes); (2) a page is moved to a new address; (3) a page's extension changes (for example, the application needs to change .php to .html or .shtml). In these cases, without a redirect, the old addresses in users' bookmarks or in search engine databases would only lead visitors to a 404 error page, and that traffic would be lost for nothing. In addition, sites that register several domain names need to redirect users who visit those domains to the main site.

3) When to use a 301 or a 302 redirect?

Use a 302 redirect when a site or page is temporarily moved to a new location for 24 to 48 hours. Use a 301 redirect when the old site has to be taken down for some reason and visitors should go to the new address permanently.

More concretely, a 301 redirect fits roughly these scenarios:

1. The domain name is expiring and you do not want to renew it (or you have found a more suitable one), so you want to change domain names.
2. The domain without www appears in search engine results, but the domain with www has not been indexed; a 301 redirect tells the search engine which domain is the target.
3. The hosting server is unstable and you are switching hosts.

5. The browser follows the redirect address

Now the browser knows that "http://www.google.com/" is the correct address to visit, so it sends another HTTP request. There is not much more to say here.
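What "following the redirect" amounts to can be sketched without touching the network. In this illustration the `FAKE_SERVER` table stands in for real responses and `follow_redirects` is a hypothetical helper; a real browser also handles caching of 301 responses, method changes, and more.

```python
from urllib.parse import urljoin

# Hypothetical canned responses keyed by URL: (status code, headers).
FAKE_SERVER = {
    "http://google.com/": (301, {"Location": "http://www.google.com/"}),
    "http://www.google.com/": (200, {}),
}

REDIRECT_CODES = (301, 302, 303, 307)


def follow_redirects(url, fetch, max_hops=5):
    """Request url; while the response is a redirect, request the Location target."""
    chain = [url]
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status not in REDIRECT_CODES or "Location" not in headers:
            return status, chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["Location"])
        chain.append(url)
    raise RuntimeError("too many redirects")


status, chain = follow_redirects("http://google.com/", FAKE_SERVER.__getitem__)
```

The `max_hops` limit guards against redirect loops, where two addresses keep pointing at each other.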

6. The server processes the request

After all the previous steps, our HTTP request has finally reached the server (in fact, the earlier redirect request had already reached it). So how does the server handle our request?

The backend starts by receiving the TCP segments on a fixed port, handles the TCP connection, parses the HTTP protocol, and wraps the message into an HTTP request object, following the message format, for the upper layers to use.

Larger sites send your request to a reverse proxy server, because once a site's traffic becomes very large and the site slows down, one server is no longer enough. The same application is then deployed on several servers, and the large number of user requests is distributed across those machines. The client no longer accesses a particular application server directly over HTTP. Instead, it sends the request to Nginx, Nginx forwards it to an application server, and the result is returned to the client; Nginx here acts as the reverse proxy server. This brings another benefit: if one server goes down, users are not affected as long as the other servers keep running normally.
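The idea of spreading requests over several application servers, and skipping one that has gone down, can be sketched as follows. This is not Nginx's actual algorithm or configuration; `ReverseProxy`, the backend names, and the handler functions are invented for the illustration.

```python
from itertools import cycle


class ReverseProxy:
    """Forwards each request to the next healthy backend in round-robin order."""

    def __init__(self, backends):
        self.backends = backends            # {name: handler function}
        self.healthy = set(backends)
        self._order = cycle(list(backends))

    def mark_down(self, name):
        self.healthy.discard(name)

    def handle(self, request):
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            name = next(self._order)
            if name in self.healthy:
                return name, self.backends[name](request)
        raise RuntimeError("no healthy backend available")


def app(request):
    """Stand-in for the application server's request handling."""
    return f"response to {request}"


proxy = ReverseProxy({"app1": app, "app2": app})
served_by = [proxy.handle("/index")[0] for _ in range(2)]
proxy.mark_down("app1")                      # simulate one server going down
served_after_failure = [proxy.handle("/index")[0] for _ in range(2)]
```

While both backends are healthy the requests alternate between them; after `app1` is marked down, every request is served by `app2` and the client notices nothing.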

Through the Nginx reverse proxy we reach the web server, where server-side scripts process our request, access the database, and fetch the content to be returned. This of course involves many complex operations in the backend scripts. I am not familiar with this part, so I can only cover this much.

Extended reading:

1) What is a reverse proxy?

Normally a client could access a website's application server directly over HTTP. The webmaster can put an Nginx server in between: the client sends its request to Nginx, Nginx forwards the request to the application server, and the result is then returned to the client. In this setup Nginx is the reverse proxy server.

7. The server returns an HTTP response

After the previous six steps the server has received and processed our request. In this step it returns the result, that is, it returns an HTTP response.

An HTTP response is similar to an HTTP request and also consists of three parts:

(1) Status line

(2) Response headers

(3) Response body

HTTP/1.1 200 OK
Date: Sat, Dec 2005 23:59:59 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 122

<!-- body goes here -->

Status line:

The status line consists of the protocol version, a numeric status code, and the corresponding status description, with the elements separated by spaces.

Format: HTTP-Version Status-Code Reason-Phrase CRLF

Example: HTTP/1.1 200 OK\r\n

-- Protocol version: HTTP/1.0 or another version

-- Status description: a short textual description of the status code. For example, when the status code is 200, the description is OK

-- Status code: the status code consists of three digits. The first digit defines the category of the response and has five possible values, as follows

1xx: Informational status codes, indicating that the server has received the client's request and the client can continue sending the request.

101 Switching Protocols

2xx: Success status codes, indicating that the server has successfully received and processed the request.

200 OK: the client's request succeeded

204 No Content: the request succeeded, but the response contains no entity body

206 Partial Content: a range request was carried out successfully

3xx: Redirection status codes, indicating that the server requires the client to redirect.

301 Moved Permanently: permanent redirect; the Location header of the response should contain the new URL of the resource

302 Found: temporary redirect; the Location header of the response gives the URL temporarily used to locate the resource

303 See Other: another URI exists for the requested resource, and the client should use the GET method to retrieve it

304 Not Modified: the content on the server has not changed, so the browser cache can be used directly

307 Temporary Redirect: temporary redirect with the same meaning as 302 Found. The original standard did not allow a POST to be changed to GET on a 302, but many clients do so anyway; with 307 the method is not supposed to change, and browsers are more likely to follow the standard, although this still depends on the browser implementation

4xx: Client error status codes, indicating that the client's request contains something invalid.

400 Bad Request: the client's request has a syntax error and cannot be understood by the server

401 Unauthorized: the request is not authorized; this status code must be used together with the WWW-Authenticate header field

403 Forbidden: the server received the request but refuses to serve it, and usually gives the reason in the response body

404 Not Found: the requested resource does not exist, for example because a wrong URL was entered

5xx: Server error status codes, indicating that an unexpected error occurred and the server could not process the client's request properly.

500 Internal Server Error: the server hit an unexpected error that prevented it from completing the client's request

503 Service Unavailable: the server currently cannot process the client's request; it may recover after a while
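The status-line format and the "first digit defines the category" rule can be tried out with a short sketch. The helper names `parse_status_line` and `describe` are made up; the reason phrases come from Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus

# Category is decided by the first digit of the status code.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP-Version Status-Code Reason-Phrase CRLF' into its three elements."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


def describe(code):
    """Return 'code phrase (category)', using 'Unknown' for unregistered codes."""
    category = CATEGORIES[code // 100]
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"
```

Splitting with a maximum of two splits matters because the reason phrase itself may contain spaces, as in "Not Found".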

Response headers:

Response headers consist of keyword/value pairs, one pair per line, with the keyword and value separated by an ASCII colon ":". Typical response headers are:

Response body:

The response body contains the actual information we need, such as cookies, HTML, images, and data returned by the backend. Note that there is an empty line between the response headers and the response body, which marks the end of the response headers. In a request captured with Fiddler, the response body is the part in the red box:

8. The browser displays the HTML

The browser starts displaying the page before it has received the complete HTML document. How does the browser render the page on the screen? Different browsers may parse pages somewhat differently; here we only cover WebKit's rendering process, which consists of:

Parsing the HTML to build the DOM tree, building the render tree, laying out the render tree, and painting the render tree.

When the browser parses an HTML file, it loads it from top to bottom and parses and renders as it loads. When external resources such as images, icon fonts, and CSS are encountered during parsing, they are requested asynchronously, which does not block loading of the HTML document.

During parsing, the browser first parses the HTML file to build the DOM tree and then parses the CSS to build the render tree. Once the render tree is built, the browser lays it out and paints it to the screen. This process is fairly complex and involves two concepts: reflow and repaint.

Each element in the DOM exists as a box model, and the browser has to calculate its position and size; this is called reflow. Once the position, size, and other properties of the box model, such as color and font, have been determined, the browser paints the content; this is called repaint.

A page inevitably goes through reflow and repaint when it is first loaded. Both are expensive, especially on mobile devices; they hurt the user experience and can make pages stutter. So we should keep reflow and repaint to a minimum.
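The first step of the pipeline described above, turning HTML text into a DOM tree, can be illustrated with Python's standard `html.parser`. This is a greatly simplified sketch and not how WebKit works internally: the `Node` and `DomBuilder` classes are made up, attributes are ignored, and malformed HTML is not repaired the way a browser would repair it.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"br", "img", "link", "meta", "input", "hr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a tree of Node objects while the HTML is parsed top-down."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node          # descend into the new element

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent   # climb back to the parent

    def handle_data(self, data):
        self.current.text += data.strip()


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines += outline(child, depth + 1)
    return lines


builder = DomBuilder()
builder.feed("<html><body><h1>Hi</h1><p>text<br>more</p></body></html>")
builder.close()
tree_lines = outline(builder.root)
```

Because `feed` can be called repeatedly with chunks of the document, the tree grows as data arrives, which mirrors the point that the browser does not wait for the whole document before it starts working.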


When a JS file is encountered while the document is loading, the thread rendering the HTML document is suspended (loading, parsing, and rendering happen synchronously). It has to wait not only for the JS file to finish loading but also for it to be parsed and executed before rendering of the HTML document can resume. The reason is that JS may modify the DOM, the classic example being document.write, which means that once the JS has run, downloading the resources that follow may no longer be necessary. This is the root cause of JS blocking the download of subsequent resources. That is why, in my everyday code, JS is placed at the end of the HTML document.

JS is parsed by the JS engine in the browser, such as Google's V8. JS runs on a single thread, which means it can only do one thing at a time: all tasks have to queue, and the next task can only start when the previous one has finished. Some tasks are time-consuming, however, such as I/O reads and writes, so a mechanism is needed that lets later tasks run first. This is the distinction between synchronous tasks and asynchronous tasks.

The JS execution mechanism can be seen as a main thread plus a task queue. Synchronous tasks are tasks executed on the main thread; asynchronous tasks are tasks placed in the task queue. All synchronous tasks run on the main thread and form an execution stack. When an asynchronous task has a result, it places an event in the task queue. The script first runs the execution stack to completion, then takes events from the task queue and runs the corresponding tasks. This process repeats over and over, which is why it is called the event loop. I describe the process in detail in another article.
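The "main thread plus task queue" model can be mimicked in a few lines. This is a toy model written in Python rather than real JS semantics: `run` and `set_timeout` are invented names, timers have no delay, and microtasks are not modeled.

```python
from collections import deque


def run(script):
    """Run the synchronous script to completion, then drain the task queue in order."""
    log = []
    task_queue = deque()

    def set_timeout(callback):
        # The asynchronous task only places its callback in the queue.
        task_queue.append(callback)

    script(log, set_timeout)             # the execution stack runs first
    while task_queue:                     # the event loop: take one task at a time
        task = task_queue.popleft()
        task(log, set_timeout)
    return log


def example(log, set_timeout):
    log.append("sync 1")
    set_timeout(lambda log, set_timeout: log.append("async A"))
    log.append("sync 2")


order = run(example)
```

Even though the asynchronous callback is registered between the two synchronous statements, it only runs after the synchronous code has finished.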

9. The browser sends requests for the resources embedded in the HTML (images, audio, video, CSS, JS, and so on)

This step actually runs in parallel with step 8. While displaying the HTML, the browser notices tags that refer to content at other addresses, and it sends requests to fetch those files. For example, it may need to fetch external images, CSS files, and JS files at addresses like the following:

Image: Http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif

CSS style sheet: http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css

JavaScript Files: http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js

These addresses go through a process similar to fetching the HTML: the browser looks up the domain names in DNS, sends requests, follows redirects, and so on.

Unlike dynamic pages, static files can be cached by the browser. Some files can be read directly from the cache without contacting the server, and static files can also be served from a CDN.

-------------------------------------------------Split Line-----------------------------------------------------

With that, the process from entering a URL to the page being displayed is finally covered. It took me almost a week to put this article together. Many articles on the Internet may present the steps in a different order from this one, which is also fine.

I have now left YY after a year there and joined another company. There is still a lot to learn ahead; there is some pressure, but also a lot of excitement, haha. I hope you all find a satisfying job during the "golden March, silver April" hiring season. Good luck!

My writing has its limits, so corrections are welcome. This article draws on many other articles, but I no longer remember most of the links, so only the following three reference links are listed.

Reference documents:



