[Original] A well-worn topic: what happens from entering a URL to displaying the page

Source: Internet
Author: User


I hesitated at first over writing this article, because a great deal of material is available online about what happens between entering a URL and displaying the page. It is also an almost mandatory interview question. During my interviews in April, I knew roughly what happens in this process, but when the interviewer asked about it step by step, many details turned out to be unclear to me.

Recently I have been studying HTTP, so I want to write an in-depth summary of this topic. The purpose of this article is to summarize and extend the knowledge involved in what happens after you enter a URL, so the article may get fairly involved.

The general process is as follows:

1. Enter the address

As you start typing a URL into the browser, the browser is already matching it intelligently: it searches your history, bookmarks, and other records for URLs that might correspond to the string you have typed, and offers suggestions so that you can complete the URL. Google Chrome can even display the page straight from its cache, that is, before you press Enter.

2. Look up the IP address of the domain name

1. Once a request is initiated, the browser first needs to resolve the domain name. Generally, it first checks its own DNS cache and the operating system's cache, and then the hosts file on the local disk, to see whether there is a rule for this domain name; if there is, it uses the IP address from the hosts file.

2. If no matching IP address is found in the local hosts file, a DNS request is sent to the local DNS server. The local DNS server is usually provided by your Internet service provider (ISP), such as China Telecom or China Mobile.

3. When the DNS request for the URL you entered reaches the local DNS server, the local DNS server first checks its cache. If the record is cached, it returns the result directly. The query from your computer to the local DNS server is recursive: the local DNS server either returns the final answer or finds it on your behalf. If the record is not cached, the local DNS server queries a DNS root server.

4. The root DNS server does not store the mapping between specific domain names and IP addresses. Instead, it tells the local DNS server to continue with the top-level domain (TLD) server and gives it that server's address. The queries the local DNS server makes from here on are iterative.

5. The local DNS server then sends a request to the TLD server; in this example, the .com TLD server. The .com server does not return the mapping between the domain name and the IP address either; it tells the local DNS server the address of the authoritative name server for your domain.

6. Finally, the local DNS server sends a request to the domain's authoritative name server and receives the mapping between the domain name and the IP address. The local DNS server returns the IP address to your computer and also stores the mapping in its cache, so that the next query for the same name, whether from you or another user, can be answered directly and network access is faster.
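The six steps above can be modelled in a few lines. The sketch below is a toy, in-memory model, not a real resolver: the hosts-file entry, the server names (tld-com, ns-example), and the IP addresses are all hypothetical, and each "server" is just a dictionary.

```python
from typing import Optional

# Hypothetical data standing in for the hosts file and the DNS hierarchy.
HOSTS_FILE = {"dev.local": "127.0.0.1"}
ROOT = {"com": "tld-com"}                                  # root: TLD -> TLD server
SERVERS = {
    "tld-com": {"example.com": "ns-example"},              # TLD: zone -> authoritative server
    "ns-example": {"www.example.com": "203.0.113.10"},     # authoritative: name -> IP
}

class LocalResolver:
    """Plays the local DNS server: cache first, then root -> TLD -> authoritative."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.steps: list[str] = []

    def resolve(self, name: str) -> Optional[str]:
        if name in self.cache:                             # step 3: answer from cache
            self.steps.append("cache hit")
            return self.cache[name]
        tld_server = ROOT.get(name.rsplit(".", 1)[-1])     # step 4: root refers to TLD server
        self.steps.append(f"root -> {tld_server}")
        if tld_server is None:
            return None
        zone = ".".join(name.split(".")[-2:])
        auth_server = SERVERS[tld_server].get(zone)        # step 5: TLD refers to authoritative
        self.steps.append(f"{tld_server} -> {auth_server}")
        if auth_server is None:
            return None
        ip = SERVERS[auth_server].get(name)                # step 6: authoritative answers
        self.steps.append(f"{auth_server} -> {ip}")
        if ip is not None:
            self.cache[name] = ip                          # remember it for the next query
        return ip

def browser_lookup(name: str, resolver: LocalResolver) -> Optional[str]:
    if name in HOSTS_FILE:                                 # step 1: hosts file wins
        return HOSTS_FILE[name]
    return resolver.resolve(name)                          # step 2: ask the local DNS server
```

A second lookup of the same name is answered from the resolver's cache without touching the hierarchy again.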

 

The following figure illustrates the process:


Knowledge extension:

1) What is DNS?

DNS (Domain Name System) is a distributed database on the Internet that maps domain names to IP addresses, so that users can reach sites by name instead of having to remember the numeric IP addresses that machines work with. Obtaining the IP address that corresponds to a host name is called domain name resolution (or host name resolution).

In general, people find it easier to remember a website's name, such as www.baidu.com, than its IP address, such as 167.23.10.2, while computers work with IP addresses rather than names like www.baidu.com. DNS is like a phone book: if you ask for the domain name www.baidu.com, I flip through my phone book and find that its phone number (IP address) is 167.23.10.2.

 

2) Two DNS query methods: recursive query and iterative query

1. Recursive resolution

When the local DNS server cannot answer a client's DNS query by itself, it has to query other DNS servers, and it can do so in one of two ways. In recursive mode, the local DNS server itself is responsible for querying other DNS servers. Usually it first queries the root domain server, which then queries downward level by level; the final result is returned to the local DNS server and from there to the client.


2. Iterative resolution

When the local DNS server cannot answer a client's DNS query by itself, the name can also be resolved iteratively. In this mode, the local DNS server does not query other DNS servers itself; it returns to the client's DNS program the addresses of other DNS servers that can resolve the domain name, and the client keeps querying those servers until it obtains the result. In other words, iterative resolution only points you to the relevant server instead of doing the lookup for you. It is as if the server said: "The server for baidu.com is at 192.168.4.5; go and ask it yourself. I'm very busy, and this is all I can do for you."
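To make the difference concrete, here is a minimal sketch with hypothetical server names and a made-up address. Both functions reach the same answer; what differs is who sends each query, which the recorded (asker, server) pairs show.

```python
# Hypothetical referral data: each server either holds the answer or knows whom to ask next.
ZONES = {
    "root": {"refer": "tld-com"},
    "tld-com": {"refer": "ns-example"},
    "ns-example": {"answer": "198.51.100.7"},
}

def recursive_query(asker: str, server: str, trace: list[tuple[str, str]]) -> str:
    trace.append((asker, server))
    entry = ZONES[server]
    if "answer" in entry:
        return entry["answer"]
    # Recursive: the server being asked queries the next server itself and relays the answer.
    return recursive_query(server, entry["refer"], trace)

def iterative_query(asker: str, server: str, trace: list[tuple[str, str]]) -> str:
    while True:
        trace.append((asker, server))
        entry = ZONES[server]
        if "answer" in entry:
            return entry["answer"]
        # Iterative: the server only returns a referral; the original asker follows it.
        server = entry["refer"]
```

In the recursive trace every hop is made by a different server; in the iterative trace every query comes from the same asker.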


 

3) How DNS domain name space is organized

The root DNS servers and TLD servers mentioned above all belong to the DNS domain namespace. The following table lists the five categories used to describe DNS domain names by their function in the namespace, with an example of each name type.

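The hierarchy can be illustrated by splitting a name into its labels. This sketch assumes the five usual categories (root domain, top-level domain, second-level domain, subdomain, host or resource name) and treats the leftmost label as the host name, which is a simplification: real DNS cannot tell a host from a subdomain just by looking at the name.

```python
def classify_labels(fqdn: str) -> list[tuple[str, str]]:
    labels = fqdn.rstrip(".").split(".")   # a trailing dot denotes the root explicitly
    if len(labels) < 3:
        raise ValueError("this sketch expects at least host.second-level.tld")
    result = [
        (".", "root domain"),
        (labels[-1], "top-level domain"),
        (labels[-2], "second-level domain"),
    ]
    # Everything between the second-level domain and the leftmost label is a subdomain,
    # listed from the top of the tree downward.
    for label in reversed(labels[1:-2]):
        result.append((label, "subdomain"))
    result.append((labels[0], "host or resource name"))   # assumption: leftmost label is a host
    return result
```

For example, `classify_labels("host-a.sales.example.com")` walks the tree from the root: `.`, `com`, `example`, `sales`, and finally the host `host-a`.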

 

4) DNS load balancing

When a website has many users and all requested resources sit on a single machine, that machine may go down at any time. One solution is DNS load balancing. The DNS server is configured with multiple IP addresses for the same host name; for each query it returns a different resolution result from the addresses recorded for that host, directing different clients to different machines and thereby spreading the load. The choice of address can also take into account factors such as each machine's current load or its distance from the user.
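A minimal sketch of the simplest policy, round-robin DNS, assuming a hypothetical record set: every query returns all addresses but rotates their order, and since clients usually try the first address, successive clients land on different machines. Load- or distance-aware selection would replace the rotation with a scoring step.

```python
from collections import deque

class RoundRobinDNS:
    def __init__(self, records: dict[str, list[str]]) -> None:
        # One rotating list of A records per host name.
        self.records = {name: deque(ips) for name, ips in records.items()}

    def query(self, name: str) -> list[str]:
        ips = self.records[name]
        answer = list(ips)   # return every address, in the current order
        ips.rotate(-1)       # the next query starts with the next machine
        return answer

dns = RoundRobinDNS({"www.example.com": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]})
first_choices = [dns.query("www.example.com")[0] for _ in range(4)]
```

Here `first_choices` cycles through the three machines and then starts over.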

 

3. The browser sends an HTTP request to the web server

After obtaining the IP address for the domain name, the browser opens a TCP connection from a random local port (1024 < port < 65535) to port 80 of the server's web program (commonly httpd, nginx, etc.). When the connection request reaches the server, it enters the network interface card (NIC) and then the kernel's TCP/IP protocol stack, which identifies the connection request and unpacks the packet layer by layer. It may also pass through the Netfilter firewall (a kernel module) before finally reaching the web program, and the TCP/IP connection is established.
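The connection step can be observed with Python's standard socket module. The sketch assumes a throwaway server on 127.0.0.1 with a port chosen by the OS (binding port 80 usually requires administrator rights) and a canned response; it shows that the client side never picks its own port, the kernel assigns one when connecting.

```python
import socket
import threading

def serve_once(server: socket.socket) -> None:
    # Stand-in "web program": accept one connection, read the request, send a canned reply.
    conn, _ = server.accept()
    with conn:
        conn.recv(1024)
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")

def fetch_once() -> tuple[int, int, bytes]:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
    server.listen(1)
    server_port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # create_connection() performs the TCP handshake; the kernel picks the local ephemeral port.
    with socket.create_connection(("127.0.0.1", server_port)) as client:
        client_port = client.getsockname()[1]
        client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:                   # server closed the connection: response complete
                break
            chunks.append(data)

    worker.join()
    server.close()
    return client_port, server_port, b"".join(chunks)
```

Calling `fetch_once()` returns the client's randomly assigned port, the server's port, and the raw response bytes.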

TCP connection:


Once the TCP connection is established, the browser sends the HTTP request. A typical HTTP request starts with a request method such as GET or POST. PUT, DELETE, HEAD, OPTIONS, and TRACE are less common; when navigating to a page or submitting an ordinary HTML form, a browser can only issue GET or POST requests.

An HTTP request from the client to the server consists of three parts:

• Request line: method, URI, and protocol/version

• Request headers

• Request body

The following is a complete HTTP request example:

POST /sample.jsp HTTP/1.1
Accept: image/gif, image/jpeg, */*
Accept-Language: zh-cn
Connection: Keep-Alive
Host: localhost
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

username=jinqiao&password=1234

Note: the last request header is followed by an empty line (a bare carriage return and line feed), which tells the server that no more request headers follow.

(1) The first line of the request is the request line, "Method URI Protocol/Version": POST /sample.jsp HTTP/1.1
(2) Request headers
The request headers carry useful information about the client environment and the request body. For example, they can declare the language the browser uses and the length of the request body:

Accept: image/gif, image/jpeg, */*
Accept-Language: zh-cn
Connection: Keep-Alive
Host: localhost
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

(3) Request body
Between the request headers and the request body is an empty line. This line is very important: it marks the end of the headers, and the request body follows it. The request body can contain the query-string data submitted by the client:

username=jinqiao&password=1234
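The request above is just text with CRLF line endings. A minimal sketch of assembling it, assuming the form data is sent with POST (as an HTML form submission normally is) and computing Content-Length from the body; `build_request` is a hypothetical helper, not a library function:

```python
from urllib.parse import urlencode

def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    lines = [f"{method} {path} HTTP/1.1"]                 # request line
    if body:
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"                # the empty line ends the headers
    return head.encode("ascii") + body

form = urlencode({"username": "jinqiao", "password": "1234"}).encode("ascii")
raw = build_request(
    "POST",
    "/sample.jsp",
    {"Host": "localhost", "Content-Type": "application/x-www-form-urlencoded"},
    form,
)
```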
Knowledge extension: 1) TCP three-way handshake

First handshake: client A sets the SYN flag to 1, picks a random initial sequence number seq = J (say, J = 1234567), and sends the segment to server B. Client A enters the SYN_SENT state and waits for server B to confirm.

Second handshake: when server B receives the segment, the SYN = 1 flag tells it that client A is requesting a connection. Server B sets both the SYN and ACK flags to 1, sets ack = J + 1, picks its own random sequence number seq = K, and sends the segment to client A to acknowledge the connection request. Server B enters the SYN_RCVD state.

Third handshake: after receiving this segment, client A checks that ack is J + 1 and that the ACK flag is 1. If so, it sets the ACK flag to 1 and ack = K + 1, and sends the segment to server B. Server B checks that ack is K + 1 and that the ACK flag is 1. If so, the connection is established: client A and server B both enter the ESTABLISHED state, and after the three handshakes data can be transmitted between them.
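The sequence and acknowledgment bookkeeping can be traced in a small simulation. This is a sketch, not a TCP implementation: there are no timers, retransmissions, or lost packets, and the `Segment` class is a hypothetical stand-in for a real TCP header.

```python
import random
from dataclasses import dataclass

MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around

@dataclass
class Segment:
    syn: bool
    ack_flag: bool   # the ACK control bit
    seq: int
    ack: int = 0     # acknowledgment number, meaningful only when ack_flag is set

def three_way_handshake(rng: random.Random) -> tuple[list[str], list[str]]:
    client, server = ["CLOSED"], ["LISTEN"]

    # 1st handshake: client sends SYN with a random initial sequence number J.
    j = rng.randrange(MOD)
    syn = Segment(syn=True, ack_flag=False, seq=j)
    client.append("SYN_SENT")

    # 2nd handshake: server sees SYN=1 and replies SYN+ACK with ack = J + 1 and its own K.
    if not syn.syn:
        raise ConnectionError("expected SYN")
    k = rng.randrange(MOD)
    syn_ack = Segment(syn=True, ack_flag=True, seq=k, ack=(syn.seq + 1) % MOD)
    server.append("SYN_RCVD")

    # 3rd handshake: client checks ACK=1 and ack == J + 1, then answers with ack = K + 1.
    if not (syn_ack.ack_flag and syn_ack.ack == (j + 1) % MOD):
        raise ConnectionError("unexpected SYN+ACK")
    final = Segment(syn=False, ack_flag=True, seq=(j + 1) % MOD, ack=(syn_ack.seq + 1) % MOD)
    client.append("ESTABLISHED")

    # Server checks ACK=1 and ack == K + 1 before considering the connection open.
    if not (final.ack_flag and final.ack == (k + 1) % MOD):
        raise ConnectionError("unexpected ACK")
    server.append("ESTABLISHED")
    return client, server
```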



 

2) Why is a three-way handshake needed?

The textbook Computer Networks (4th edition) gives the purpose of the three-way handshake as "to prevent an obsolete connection request segment from suddenly reaching the server and causing an error."

The book's example: an "obsolete connection request segment" arises when the first connection request segment sent by the client is not lost but is held up at some network node for a long time, so that it reaches the server only after the connection has already been released. The segment is long out of date, but the server mistakes it for a new connection request from the client, so it sends the client a confirmation segment and agrees to establish a connection.

Without the three-way handshake, a new connection would be established as soon as the server sent its confirmation. Since the client never requested this connection, it ignores the server's confirmation and sends no data. The server, however, believes a new transport connection has been established and keeps waiting for the client to send data, wasting server resources. The three-way handshake prevents this: in the case above, the client does not acknowledge the server's confirmation, and because the server receives no acknowledgment, it knows the client did not request a connection. The main purpose is to keep the server from waiting and wasting resources.

3) TCP four-way wave (connection teardown)

First wave: the Client sends a FIN to close the data flow from the Client to the Server, and enters the FIN_WAIT_1 state.
Second wave: after receiving the FIN, the Server sends an ACK to the Client with the acknowledgment number set to the received sequence number + 1 (like a SYN, a FIN consumes one sequence number). The Server enters the CLOSE_WAIT state, and the Client moves to FIN_WAIT_2 when it receives this ACK.
Third wave: the Server sends a FIN to close the data flow from the Server to the Client, and enters the LAST_ACK state.
Fourth wave: after receiving the FIN, the Client enters the TIME_WAIT state and sends an ACK to the Server with the acknowledgment number set to the received sequence number + 1. On receiving it, the Server enters the CLOSED state, completing the four waves.
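A matching trace for the four waves, under the same kind of assumptions (the client closes first, no packet loss, and the TIME_WAIT timer is not actually run). `client_seq` and `server_seq` stand for the sequence numbers carried by each side's FIN:

```python
def four_way_wave(client_seq: int, server_seq: int) -> tuple[list[str], list[str], tuple[int, int]]:
    """Trace the state changes of an active close started by the client."""
    client, server = ["ESTABLISHED"], ["ESTABLISHED"]

    # 1st wave: client sends FIN (seq = client_seq) and will send no more data.
    client.append("FIN_WAIT_1")

    # 2nd wave: server ACKs it; a FIN consumes one sequence number, so ack = client_seq + 1.
    ack_of_client_fin = client_seq + 1
    server.append("CLOSE_WAIT")   # the server may still send its remaining data here
    client.append("FIN_WAIT_2")   # the client's FIN has been acknowledged

    # 3rd wave: server sends its own FIN (seq = server_seq).
    server.append("LAST_ACK")

    # 4th wave: client ACKs with ack = server_seq + 1 and waits in TIME_WAIT.
    ack_of_server_fin = server_seq + 1
    client.append("TIME_WAIT")
    server.append("CLOSED")       # the server closes as soon as the last ACK arrives
    client.append("CLOSED")       # the client closes after TIME_WAIT (2*MSL) expires
    return client, server, (ack_of_client_fin, ack_of_server_fin)
```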


4) Why does connecting take a three-way handshake while closing takes four waves?

When the server, in the LISTEN state, receives the SYN of a connection request, it can put its ACK and its own SYN in a single segment and send them to the client. When closing, receiving a FIN from the other side only means that the other side will send no more data; it can still receive data, and this side may not yet have sent all of its own data. So this side does not have to close immediately: it can first send any remaining data and only then send its own FIN to agree to close the connection. That is why the ACK and the FIN are usually sent separately.

 

4. The server responds with a permanent redirect

Suppose you entered http://google.com/. The server returns a 301 permanent redirect response, so that the browser will access "http://www.google.com/" instead of "http://google.com/".

Why does the server redirect instead of directly sending the page the user wants to see? One reason is search engine ranking. If a page is reachable at two addresses, a search engine may treat them as two different sites, each with fewer incoming links and therefore a lower rank. Search engines understand what a 301 permanent redirect means, so they rank the addresses with and without www under the same website. Another reason is cache friendliness: when a page has several names, it may end up in the cache several times.

Knowledge extension: 1) Differences between 301 and 302

Status codes 301 and 302 both indicate a redirect: after receiving the status code from the server, the browser automatically jumps to a new URL, taken from the Location response header (the user sees the address A they entered instantly change to another address B). This is what they have in common.

The difference: 301 means the resource at the old address A has been moved permanently (it is no longer available there); when a search engine fetches the new content, it also replaces the old URL with the redirect target.

302 means the resource at the old address A still exists (it is still accessible) and the request is only temporarily redirected from A to B; the search engine fetches the new content but keeps the old URL. For SEO, 302 is therefore preferable to 301 when the old URL should keep its place in the index.
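One practical consequence of "permanent" versus "temporary": a browser may remember a 301 and skip the old URL next time, while a 302 must be requested again. A sketch of that bookkeeping, assuming neither response carries caching headers (the `RedirectMemory` class is hypothetical):

```python
class RedirectMemory:
    """Remembers permanent (301) redirects; temporary (302) ones are never stored."""

    def __init__(self) -> None:
        self.permanent: dict[str, str] = {}

    def on_response(self, url: str, status: int, location: str) -> None:
        if status == 301:
            self.permanent[url] = location   # safe to reuse: the move is permanent

    def next_url(self, url: str) -> str:
        # For a 302 (or no redirect) the old URL must be requested again.
        return self.permanent.get(url, url)
```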

 

2) Reasons for redirection

… In this case, without a redirect, the old address saved in users' bookmarks or in a search engine's database would only show visitors a 404 error page, and the traffic would be lost. In addition, websites that have registered several domain names need to redirect visitors of those domains to the primary site automatically.

3) When to use a 301 or a 302 redirect

A 302 redirect is appropriate when a website or page is moved to a new location only temporarily, for example for 24-48 hours. A 301 redirect is used when the old site has to be retired for some reason and visitors should reach the new address permanently. Put simply, typical scenarios for a 301 redirect are:

1. You do not want to renew a domain name when it expires (or have found a domain name that suits your site better) and need to switch domains.
2. Search engines should index only one of the www and non-www domains, and a 301 redirect tells them which domain is the target.
3. The hosting server is unstable and the site has to move to another host.

5. The browser follows the redirect

Now the browser knows that "http://www.google.com/" is the correct address, so it sends another HTTP request to it. There is not much more to say about this step.
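Following redirects is a small loop: send a request, check for a 3xx status with a Location header, and repeat. In this sketch the `SITE` table is a hypothetical stand-in for real servers, and `max_hops` is an assumed limit that stops redirect loops.

```python
from urllib.parse import urljoin

# Hypothetical responses: URL -> (status, headers, body)
SITE = {
    "http://google.com/": (301, {"Location": "http://www.google.com/"}, ""),
    "http://www.google.com/": (200, {"Content-Type": "text/html"}, "<html>...</html>"),
}

def fetch(url: str) -> tuple[int, dict[str, str], str]:
    return SITE.get(url, (404, {}, "not found"))

def get_following_redirects(url: str, max_hops: int = 5):
    history = []
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        history.append((status, url))
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            url = urljoin(url, headers["Location"])   # Location may also be relative
            continue
        return url, status, body, history
    raise RuntimeError("too many redirects")
```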

 

6. The server processes the request

After the previous steps, our HTTP request finally reaches the server. (Strictly speaking, the earlier request that triggered the redirect had already reached the server.) So how does the server process the request?

The backend listens on a fixed port and receives TCP packets there. It handles the TCP connection, parses the HTTP protocol, and, following the message format, wraps the data into an HTTP Request object for the upper layers to use.
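A simplified sketch of the "parse and wrap into a Request object" step, assuming the whole request has already been read into memory (real servers read the body incrementally, guided by Content-Length). The `Request` class is hypothetical, not any framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    method: str
    target: str
    version: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""

def parse_request(raw: bytes) -> Request:
    head, _, body = raw.partition(b"\r\n\r\n")          # the empty line separates head and body
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)    # request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    length = int(headers.get("content-length", "0"))
    return Request(method, target, version, headers, body[:length])
```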

Some larger websites send your request to a reverse proxy server first. When traffic is very heavy, a single server becomes slower and slower and is no longer enough, so the same application is deployed on multiple servers and the large volume of user requests is distributed across them. In this setup the client does not access an application server directly over HTTP: it first sends the request to Nginx, Nginx forwards it to an application server, and the result is returned to the client. Nginx acts as the reverse proxy server. This also brings another benefit: if one of the servers fails, users are unaffected as long as other servers keep running normally.



Through the Nginx reverse proxy, the request reaches the web server, where the server-side script processes it, queries the database, and obtains the required content. This involves many complex operations in backend scripts; since I am not very familiar with this part, I can only describe it at this level.

 

Extended reading: 1) What is a reverse proxy?

Normally a client accesses a website's application server directly over HTTP. The site administrator can add an Nginx server in between: the client sends its request to Nginx, Nginx forwards it to the application server, and the result is returned to the client. Here Nginx is the reverse proxy server.
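A sketch of the two benefits described above, load distribution and failover, assuming backends are plain Python functions and a failed server raises `ConnectionError`. Real proxies such as Nginx add health checks, timeouts, and retry rules with far more care.

```python
from typing import Callable

Backend = Callable[[str], str]

class ReverseProxy:
    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self.next_index = 0

    def handle(self, path: str) -> str:
        # Round robin, trying each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = self.backends[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.backends)
            try:
                return backend(path)
            except ConnectionError:
                continue             # skip a failed server; the client never notices
        return "502 Bad Gateway"     # every backend failed

def make_app(name: str) -> Backend:
    return lambda path: f"{name} served {path}"

def down(path: str) -> str:
    raise ConnectionError("server is down")

proxy = ReverseProxy([make_app("app1"), down, make_app("app3")])
```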


 

7. The server returns an HTTP Response

After the previous six steps, the server has received and processed our request. In this step, it returns the result of that processing: an HTTP response.

The HTTP response is similar to the HTTP request. The HTTP response consists of three parts:

• Status line

• Response headers

• Response body

HTTP/1.1 200 OK
Date: Sat, 31 Dec 2005 23:59:59 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 122

<html>
<head>
<title>http</title>
</head>
<body>
<!-- body goes here -->
</body>
</html>

Status line:

The status line consists of the protocol version, a numeric status code, and the corresponding status description (reason phrase), separated by spaces.

Format: HTTP-Version Status-Code Reason-Phrase CRLF

Example: HTTP/1.1 200 OK\r\n

--Protocol version: HTTP/1.0 or another version, such as HTTP/1.1

--Status description: a short text description of the status code. For example, for status code 200 the description is OK.

--Status code: a three-digit number whose first digit defines the category of the response. There are five possible values:

 

1xx: Informational status codes, indicating that the server has received the request and the client can continue sending it.

100 Continue

101 Switching Protocols

2xx: Success status codes, indicating that the server has successfully received and processed the request.

200 OK: the client request succeeded.

204 No Content: the request succeeded, but no response body is returned.

206 Partial Content: a Range request was executed successfully.

3xx: Redirection status codes, indicating that the client needs to follow a redirect.

301 Moved Permanently: permanent redirect. The Location header of the response should contain the resource's new URL.

302 Found: temporary redirect. The URL in the Location header of the response is used to locate the resource temporarily.

303 See Other: the requested resource is available at another URI, and the client should retrieve it with the GET method.

304 Not Modified: the resource on the server has not changed, so the client can use its cached copy.

307 Temporary Redirect: temporary redirect, similar to 302 Found. The original HTTP specification did not allow a 302 to change POST to GET, but browsers generally do so anyway; 307 explicitly requires the original method to be kept, although actual behavior still depends on the browser implementation.

4xx: Client error status codes, indicating that the client request is invalid.

400 Bad Request: the request has a syntax error and cannot be understood by the server.

401 Unauthorized: the request lacks valid authorization. This status code must be sent together with a WWW-Authenticate header field.

403 Forbidden: the server received the request but refuses to serve it. The reason is usually given in the response body.

404 Not Found: the requested resource does not exist, for example because an incorrect URL was entered.

5xx: Server error status codes, indicating that the server failed to process the request because of an unexpected error.

500 Internal Server Error: an unexpected error occurred on the server, so it could not complete the request.

503 Service Unavailable: the server is currently unable to handle the request; it may recover after some time.
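A minimal sketch of reading a status line and mapping the code to its category by its first digit. It assumes a well-formed status line; the category labels are just the names used in the list above.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line: str) -> tuple[str, int, str]:
    # "HTTP/1.1 404 Not Found" -> version, code, reason phrase (which may contain spaces)
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

def category(code: int) -> str:
    return CATEGORIES.get(code // 100, "unknown")   # the first digit decides the class
```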

 

Response Header:

Response headers consist of key/value pairs, one pair per line, with the key and value separated by a colon (:). Typical response headers include Date, Content-Type, and Content-Length, as in the example above.
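Parsing header lines comes down to splitting each line at its first colon. The sketch assumes CRLF-separated lines and lowercases names, since header names are case-insensitive. Combining repeated headers with ", " is the general HTTP rule; Set-Cookie is the known exception, which this sketch does not special-case.

```python
def parse_headers(block: str) -> dict[str, str]:
    headers: dict[str, str] = {}
    for line in block.split("\r\n"):
        if not line:
            break                                   # empty line: end of the header section
        name, sep, value = line.partition(":")      # split at the first colon only
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        key, value = name.strip().lower(), value.strip()
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return headers
```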


 

Response body

The response body contains the actual content we asked for, such as HTML, images, or data returned by the backend. Note that an empty line separates the response headers from the response body and marks the end of the header section. The figure below shows a response body captured with Fiddler, in the red box:

 

8. The browser renders the HTML

The browser begins displaying the page before it has received the entire HTML document. How does it draw the page on the screen? The process differs between browsers; here we describe only WebKit's rendering process, which consists of:

Parse HTML to build the DOM tree -> build the render tree -> lay out the render tree -> paint the render tree


When parsing an HTML file, the browser loads it from top to bottom and parses and renders it as it loads. External resources encountered during parsing are requested asynchronously and do not block the loading of the HTML document (scripts are the exception, as explained below).

During parsing, the browser first parses the HTML to build the DOM tree, then parses the CSS and combines it with the DOM to build the render tree. Once the render tree is built, the browser lays it out and paints it to the screen. This process is complex and involves two concepts: reflow and repaint.
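The first stage of the pipeline, HTML to DOM tree, can be sketched with Python's standard `html.parser`, which also shows that parsing works incrementally on chunks, top to bottom. This is a toy: `VOID_TAGS` is an abbreviated list, and real browsers follow the far more forgiving HTML5 tree-construction algorithm.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}  # elements that never have children

class Node:
    def __init__(self, tag: str, parent: "Node | None" = None, text: str = "") -> None:
        self.tag = tag
        self.parent = parent
        self.text = text
        self.children: list["Node"] = []

class DomBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.document = Node("#document")
        self.current = self.document        # the open element new nodes attach to

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node             # later tokens become this element's children

    def handle_endtag(self, tag):
        # Walk up to the matching open element; stray end tags are ignored.
        node = self.current
        while node is not self.document and node.tag != tag:
            node = node.parent
        if node is not self.document:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))

def outline(node: Node):
    """Nested (tag, children) tuples, handy for printing or comparing trees."""
    return (node.tag, [outline(child) for child in node.children])

builder = DomBuilder()
builder.feed("<html><head><title>http</title></head>")   # chunks arrive over the network
builder.feed("<body><p>hi</p></body></html>")
builder.close()
```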

Every element in the DOM is represented as a box, and the browser must calculate each box's position and size; this process is called reflow. Once the position, size, and other geometric properties are determined, the browser paints the content using properties such as color and font; this process is called repaint.

When a page loads for the first time, reflow and repaint inevitably happen. Both are expensive, especially on mobile devices: they degrade the user experience and sometimes make the page freeze. We should therefore minimize reflows and repaints.
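Why batching matters can be shown with a toy model (not real browser code): writes only mark the layout as stale, but reading geometry (like `offsetWidth` in a real browser) forces an immediate reflow. Interleaving writes and reads triggers one reflow per element, while batching them triggers a single one.

```python
class ToyLayoutEngine:
    """Counts how many times layout (reflow) has to run."""

    def __init__(self) -> None:
        self.dirty = False
        self.layouts = 0
        self.widths: dict[str, int] = {}

    def set_width(self, element: str, px: int) -> None:
        self.widths[element] = px
        self.dirty = True            # a write only marks the layout as stale

    def read_width(self, element: str) -> int:
        if self.dirty:               # reading geometry forces a synchronous reflow
            self.layouts += 1
            self.dirty = False
        return self.widths.get(element, 0)

    def frame(self) -> None:
        if self.dirty:               # otherwise layout runs once, before painting
            self.layouts += 1
            self.dirty = False

def interleaved(n: int) -> int:
    engine = ToyLayoutEngine()
    for i in range(n):
        engine.set_width(f"box{i}", 100 + i)
        engine.read_width(f"box{i}")
    engine.frame()
    return engine.layouts

def batched(n: int) -> int:
    engine = ToyLayoutEngine()
    for i in range(n):
        engine.set_width(f"box{i}", 100 + i)
    for i in range(n):
        engine.read_width(f"box{i}")
    engine.frame()
    return engine.layouts
```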


When the parser encounters a JS file while loading the document, it suspends rendering of the HTML document (loading, parsing, and rendering happen synchronously here). It waits not only for the JS file to finish downloading but also for it to finish executing before resuming. Because JavaScript may modify the DOM (document.write being the classic example), resources that come after the script might turn out to be unnecessary until the JS has finished executing; this is the root cause of JS blocking the download of subsequent resources. That is why, in my daily code, I put JS at the end of the HTML document.

JS is parsed and executed by the browser's JS engine, such as Google's V8. JS runs on a single thread, meaning only one task can run at a time: tasks are queued, and each task starts only after the previous one ends. Some tasks, such as I/O reads and writes, are time-consuming, so a mechanism is needed that lets later tasks proceed in the meantime. Tasks are therefore divided into synchronous tasks and asynchronous tasks.

JS execution can be viewed as a main thread plus a task queue. Synchronous tasks are executed on the main thread, while asynchronous tasks go through the task queue. All synchronous tasks run on the main thread, forming an execution stack. When an asynchronous task produces a result, an event is placed in the task queue. When the script runs, the execution stack is processed first; then events are taken from the task queue and their tasks are run. This process repeats and is called the event loop. For the detailed process, see my separate article on the event loop.
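A minimal model of this loop in Python, assuming a hypothetical `set_timeout` that queues its callback immediately; real engines also have microtask queues (Promises) and timers, which are left out here.

```python
from collections import deque
from typing import Callable

class EventLoop:
    def __init__(self) -> None:
        self.task_queue: deque = deque()   # callbacks of finished asynchronous operations
        self.log: list[str] = []

    def set_timeout(self, callback: Callable[["EventLoop"], None]) -> None:
        # Toy model: the timer "fires" at once, so its callback goes straight into the queue.
        self.task_queue.append(callback)

    def run(self, script: Callable[["EventLoop"], None]) -> None:
        script(self)                       # synchronous code runs to completion first
        while self.task_queue:             # then the loop takes queued tasks one at a time
            task = self.task_queue.popleft()
            task(self)

def main(loop: EventLoop) -> None:
    loop.log.append("sync 1")
    loop.set_timeout(lambda l: l.log.append("timeout callback"))
    loop.log.append("sync 2")

loop = EventLoop()
loop.run(main)
```

Even though the callback was scheduled between the two synchronous steps, it only runs once the execution stack is empty.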

 

9. The browser requests resources embedded in the HTML (such as images, audio, video, CSS, and JS)

In fact, this step happens alongside step 8. While displaying the HTML, the browser notices tags that reference content at other addresses and sends further requests to fetch those files. For example, fetching external images, CSS, and JS files involves links like these:

Image: http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif

CSS stylesheet: http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css

JavaScript file: http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js

Each of these addresses goes through a process similar to fetching the HTML: the browser looks up the domain names in DNS, sends requests, follows redirects, and so on.
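Discovering these sub-resources amounts to scanning tags for `src` and `href` attributes. A sketch using `html.parser`, assuming only the tags in `SRC_TAGS` plus stylesheet links matter (the base URL and the `/logo.gif` path below are hypothetical); each distinct host then needs its own DNS lookup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SRC_TAGS = {"img", "script", "audio", "video", "source"}

class ResourceFinder(HTMLParser):
    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in SRC_TAGS and a.get("src"):
            self.resources.append(urljoin(self.base_url, a["src"]))    # resolve relative URLs
        elif tag == "link" and "stylesheet" in (a.get("rel") or "").split() and a.get("href"):
            self.resources.append(urljoin(self.base_url, a["href"]))

def hosts_to_resolve(resources: list[str]) -> set[str]:
    """Each distinct host name needs its own DNS lookup (unless already cached)."""
    return {urlsplit(url).hostname for url in resources}
```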

Unlike dynamic pages, static files can be cached by the browser. Some of them can be read directly from the cache without contacting the server at all, or served from a CDN.
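A sketch of the freshness check that lets a cached static file skip the network, assuming each response comes with a lifetime in seconds (in HTTP this usually comes from a `Cache-Control: max-age` header). The `StaticCache` class is hypothetical, and its clock is injectable so the behavior can be tested.

```python
import time
from typing import Callable

Fetch = Callable[[str], tuple[bytes, float]]   # returns (body, max_age_seconds)

class StaticCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.entries: dict[str, tuple[float, bytes]] = {}   # url -> (expires_at, body)

    def get(self, url: str, fetch: Fetch) -> tuple[bytes, str]:
        entry = self.entries.get(url)
        if entry is not None and entry[0] > self.clock():
            return entry[1], "cache"        # still fresh: no request to the server at all
        body, max_age = fetch(url)          # missing or stale: go to the server (or CDN)
        self.entries[url] = (self.clock() + max_age, body)
        return body, "network"
```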

 

------------------------------------------------------------------------------------------------------

 

That completes the process from entering a URL to displaying the page. It took me nearly a week to put this article together. Of course, many articles online present these steps in a somewhat different order.

I left the big company a year ago and have now joined another one. There is a lot waiting for me to learn; I feel a little pressure, but I'm also excited. Haha. I hope you all find a satisfying job during the March-April hiring season. Good luck!

This article surely has limitations and mistakes, so please point them out. It draws on many articles, but I no longer remember most of their links, so only the following three references are listed.

 

 

References:

https://segmentfault.com/a/1190000006879700

http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/

http://zrj.me/archives/589

 

