What happens between entering a URL and displaying the page?

Source: Internet
Author: User

From the moment the user enters a website address to the moment the page is displayed, the process can be roughly summarized as follows:

1) The user enters the URL in the client browser.

2) The browser queries DNS (the Domain Name System) to obtain the IP address of the web server for that domain name.

3) The client browser establishes a TCP (Transmission Control Protocol) connection to the web server.

4) The client browser sends an HTTP or HTTPS request to the web server at that IP address.

5) The web server responds to the request: it returns the data for the requested URL or an error message, or, if a redirect is configured, redirects the browser to a new URL.

6) The client browser downloads the data, parses the HTML source, works out the page layout as it parses, and displays the page in the browser.

7) The browser parses the links to resources embedded in the page (images, CSS, JS, and so on) and repeats the process above for each one until nothing remains to be requested, completing the page display.

1. Enter the address

As soon as we start typing a URL, the browser is already trying to match it: it searches history, bookmarks, and other sources for URLs that may correspond to the string typed so far and offers suggestions so that you can complete the address. Google Chrome can even show the page directly from its cache, which means the page may already appear before you press Enter.
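The matching step can be sketched as a prefix search over previously seen URLs. This is a minimal sketch, assuming history and bookmarks are plain in-memory lists and that ranking by visit count is good enough; real browsers use their own, more elaborate scoring. The names `suggest` and `bare` are hypothetical.

```python
from collections import Counter

def bare(url):
    """Lowercase and drop the scheme and a leading 'www.' so 'goo' matches 'https://www.google.com/'."""
    url = url.lower()
    for prefix in ("https://", "http://", "www."):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url

def suggest(typed, history, bookmarks, limit=3):
    """Return up to `limit` known URLs whose bare form starts with the typed text."""
    scores = Counter(history)   # one point per visit
    for url in bookmarks:
        scores[url] += 1        # a bookmark counts as one extra visit
    typed = bare(typed)
    matches = [url for url in scores if bare(url).startswith(typed)]
    # Most-visited first; ties broken alphabetically so the order is stable.
    matches.sort(key=lambda url: (-scores[url], url))
    return matches[:limit]

history = ["https://www.google.com/", "https://www.github.com/", "https://www.google.com/"]
bookmarks = ["https://www.google.com/maps"]
print(suggest("goo", history, bookmarks))
# ['https://www.google.com/', 'https://www.google.com/maps']
```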

2. The browser looks up the IP address of the domain name

1) Once the request is initiated, the first thing the browser does is resolve the domain name. In general, the browser first checks its own DNS cache and the operating system's cache, and the operating system consults the local hosts file to see whether it contains a rule for this domain name; if it does, the IP address in the hosts file is used directly.

2) If no IP address can be found locally, a DNS request is sent to the local DNS server. The local DNS server is usually provided by your Internet service provider, such as China Telecom or China Mobile.

3) When the DNS request for the URL you entered reaches the local DNS server, the server first checks its cache. If the record is cached, it returns the result directly. The query from your computer to the local DNS server is recursive: the local server is expected to return a final answer. If the record is not cached, the local DNS server queries a DNS root server.

4) The root DNS server does not store mappings between specific domain names and IP addresses. Instead, it tells the local DNS server to continue the query at a top-level domain (TLD) server and gives that server's address. This step is iterative.

5) The local DNS server then sends the request to the TLD server, in this case the .com server. The .com server does not return the mapping either; it tells the local DNS server the address of the authoritative name server for your domain name.

6) Finally, the local DNS server sends the request to the domain's authoritative name server and receives the mapping between the domain name and its IP address. The local DNS server returns the IP address to the user's computer and also caches the mapping for as long as the record's time-to-live (TTL) allows, so that the next query, even from another user, can be answered directly, which speeds up access.
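The lookup order above (hosts file, then cache, then root, TLD, and authoritative servers) can be simulated in a few lines. This is a toy sketch under stated assumptions: the server names (`root`, `tld-com`, `ns-example`), the zone data, and the IP addresses (from the 203.0.113.0/24 documentation range) are all hypothetical, and the browser, operating system, and local DNS server are collapsed into one `Resolver` object.

```python
# Hypothetical data: a hosts file plus a tiny DNS hierarchy in which each
# server either answers with an IP or refers the resolver one level down.
HOSTS_TEXT = """
127.0.0.1   localhost
10.0.0.5    dev.example.com   # local override
"""
SERVERS = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"example.com.": ("referral", "ns-example")},
    "ns-example": {"www.example.com.": ("answer", "203.0.113.10")},
}

def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP ('#' starts a comment)."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            table.setdefault(name.lower(), fields[0])  # first entry wins
    return table

def in_zone(name, suffix):
    """True if `name` equals `suffix` or lies below it at a label boundary."""
    return name == suffix or name.endswith("." + suffix)

def ask(server, name):
    """One query to one server: ('answer', ip) or ('referral', next_server)."""
    for suffix, reply in SERVERS[server].items():
        if in_zone(name, suffix):
            return reply
    raise LookupError(f"{server} cannot resolve {name}")

class Resolver:
    def __init__(self, hosts_text):
        self.hosts = parse_hosts(hosts_text)
        self.cache = {}    # real caches also expire entries after the TTL
        self.queried = []  # servers contacted, to show the walk down the tree

    def resolve(self, hostname):
        hostname = hostname.lower()
        if hostname in self.hosts:             # step 1: hosts file
            return self.hosts[hostname]
        name = hostname.rstrip(".") + "."      # fully qualified form
        if name in self.cache:                 # step 3: cached answer
            return self.cache[name]
        server = "root"                        # steps 4-6: root -> TLD -> authoritative
        while True:
            self.queried.append(server)
            kind, value = ask(server, name)
            if kind == "answer":
                self.cache[name] = value       # step 6: remember for next time
                return value
            server = value                     # follow the referral

r = Resolver(HOSTS_TEXT)
print(r.resolve("www.example.com"), r.queried)  # 203.0.113.10 ['root', 'tld-com', 'ns-example']
r.resolve("www.example.com")
print(r.queried)  # unchanged: the second lookup was served from the cache
```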

Knowledge Expansion:

1) What is DNS?

DNS (Domain Name System) is a distributed database that maps domain names and IP addresses to each other on the Internet. It lets people access the Internet without having to remember the numeric IP addresses that machines read directly. Obtaining the IP address that corresponds to a hostname is called domain name resolution (or hostname resolution).

In plain terms, we find it easier to remember a website's name, such as www.baidu.com, than its IP address, such as 167.23.10.2, whereas computers work with IP addresses. DNS therefore acts like a phone book: to find www.baidu.com, I look it up and see that its "phone number" (IP address) is 167.23.10.2.

2) Two ways to query DNS: recursive queries and iterative queries

1. Recursive resolution

When the local DNS server cannot answer a client's DNS query by itself, it has to query other DNS servers. There are two ways to do this; the recursive approach is shown in the figure. The local DNS server takes on the job of querying other DNS servers, usually starting with the root server, and the query is passed down level by level from there. The result is returned to the local DNS server, which then returns it to the client.

2. Iterative resolution

When the local DNS server cannot answer a client's DNS query by itself, the name can also be resolved iteratively, as shown in the figure. Instead of querying other DNS servers itself, the local DNS server returns to the client's DNS program the IP addresses of other DNS servers that can resolve the domain name, and the client's DNS program keeps querying those servers until it gets the result. In other words, iterative resolution only points you to the relevant server; it does not do the lookup for you. It is as if the server said: "The server for baidu.com is at 192.168.4.5. Ask it yourself; I'm rather busy and can only help you this far."
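A client tells a DNS server whether it wants recursion through the RD (Recursion Desired) bit in the query header defined by RFC 1035; with RD cleared, a server may reply with a referral instead of a final answer. The sketch below assumes an A-record lookup and a hypothetical fixed query ID, and only builds the query bytes; sending them to UDP port 53 of a resolver is left out so the example runs offline.

```python
import struct

def build_query(name, query_id=0x1234, recursion_desired=True):
    """Build a DNS query for the A record of `name` (RFC 1035 wire format)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0 (big-endian).
    # RD is bit 0x0100 of the flags field.
    flags = 0x0100 if recursion_desired else 0x0000
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by the zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!2H", 1, 1)  # QTYPE=1 (A), QCLASS=1 (IN)
    return header + question

recursive = build_query("www.baidu.com")
iterative = build_query("www.baidu.com", recursion_desired=False)
print(len(recursive), recursive[2:4].hex(), iterative[2:4].hex())  # 31 0100 0000
```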

3. The browser sends an HTTP request to the Web server

After obtaining the IP address of the domain name, the browser uses a random local port (1024 < port < 65535) to send a TCP connection request to port 80 (443 for HTTPS) of the server's web program (commonly httpd, nginx, and so on). The connection request passes through various routing devices (unless client and server are on the same LAN), arrives at the server's network card, and enters the kernel's TCP/IP protocol stack, which identifies the connection request and unpacks the packet layer by layer. It may also pass through the Netfilter firewall (a kernel module) before it finally reaches the web program, and the TCP connection is established.
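The port asymmetry described above is easy to observe on one machine. This sketch assumes a loopback listener as a stand-in for the web server: it binds port 0 so the OS picks a free port (a real web server would listen on 80, or 443 for HTTPS), while the kernel assigns the client's ephemeral port and performs the TCP handshake when the client connects.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
server.listen(1)
server_port = server.getsockname()[1]

# Connecting triggers the three-way handshake; the kernel picks the client's
# ephemeral source port automatically.
client = socket.create_connection(("127.0.0.1", server_port))
conn, peer = server.accept()
client_port = client.getsockname()[1]

print(f"client port {client_port} -> server port {server_port}")
print("server sees peer", peer)  # the same (IP, port) the client was assigned

for s in (conn, client, server):
    s.close()
```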

Once the TCP connection is established, the browser sends the HTTP request. A typical HTTP request specifies a method, such as GET or POST; less commonly used methods include PUT, DELETE, HEAD, OPTIONS, and TRACE. For ordinary page navigation and form submission, a browser only issues GET or POST requests.

When the client sends an HTTP request to the server, the request message consists of three parts:

- Request line: method, URI, and protocol/version

- Request headers

- Request body

Here is an example of a complete HTTP request (a form submission, so the method is POST):

    POST /sample.jsp HTTP/1.1
    Accept: image/gif, image/jpeg, */*
    Accept-Language: zh-cn
    Connection: keep-alive
    Host: localhost
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
    Accept-Encoding: gzip, deflate

    username=jinqiao&password=1234

Note: The last request header is followed by a blank line, that is, an extra carriage return and line feed (CRLF), which tells the server that no more request headers follow.

(1) The first line of the request is the request line, "method URI protocol/version": POST /sample.jsp HTTP/1.1

(2) Request headers

The request headers carry useful information about the client environment and the request body. For example, they can declare the language the browser uses, the length of the request body, and so on.

    Accept: image/gif, image/jpeg, */*
    Accept-Language: zh-cn
    Connection: keep-alive
    Host: localhost
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
    Accept-Encoding: gzip, deflate

(3) Request body

A blank line separates the request headers from the request body. This line is essential: it signals that the headers have ended and the body follows. The body can contain the query-string data submitted by the client:

username=jinqiao&password=1234
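The three parts of a request can be serialized by hand. A minimal sketch, assuming a form submission sent with POST (the usual method for a body like the one above); the function name `build_request` is hypothetical. It also adds `Content-Length`, which a server needs in order to know where the body ends, and passes a `Content-Type` header that the example above omits.

```python
from urllib.parse import urlencode

def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Each line ends with CRLF; the extra CRLF is the blank line ending the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

body = urlencode({"username": "jinqiao", "password": "1234"}).encode("ascii")
raw = build_request(
    "POST", "/sample.jsp", "localhost",
    headers={
        "Accept-Language": "zh-cn",
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded",
    },
    body=body,
)
print(raw.decode("ascii"))
```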

Knowledge Expansion:

1) TCP three-way handshake

First handshake: Client A sets the SYN flag to 1, generates a random initial sequence number seq = J, and sends the packet to server B. Client A enters the SYN_SENT state and waits for B to confirm.

Second handshake: When server B receives the packet, the SYN = 1 flag tells it that client A wants to establish a connection. B sets both the SYN and ACK flags to 1, sets ack = J + 1, generates a random sequence number seq = K, and sends the packet to A to confirm the connection request. Server B enters the SYN_RCVD state.

Third handshake: When client A receives the acknowledgment, it checks that ack is J + 1 and that the ACK flag is 1. If so, it sets the ACK flag to 1, sets ack = K + 1, and sends the packet to server B. B checks that ack is K + 1 and that the ACK flag is 1; if so, the connection is established. Client A and server B both enter the ESTABLISHED state, the three-way handshake is complete, and data transfer between them can begin.
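The sequence and acknowledgment numbers of the handshake can be traced with a small simulation. This sketch assumes the initial sequence numbers J and K are supplied by the caller (real stacks choose them unpredictably) and models segments as dictionaries without touching the network; the function names are hypothetical. Sequence numbers are 32-bit, so arithmetic wraps modulo 2^32.

```python
import random

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def server_reply(syn, k):
    """B: answer a SYN with SYN+ACK, acknowledging the client's seq + 1."""
    if "SYN" not in syn["flags"]:
        raise ValueError("expected SYN")
    return {"flags": {"SYN", "ACK"}, "seq": k, "ack": (syn["seq"] + 1) % MOD}

def client_confirm(syn_ack, j):
    """A: verify ack == J + 1, then acknowledge the server's K + 1."""
    if syn_ack["flags"] != {"SYN", "ACK"} or syn_ack["ack"] != (j + 1) % MOD:
        raise ValueError("bad SYN+ACK")
    return {"flags": {"ACK"}, "seq": (j + 1) % MOD, "ack": (syn_ack["seq"] + 1) % MOD}

def server_accept(ack, k):
    """B: the connection is established if the ACK acknowledges K + 1."""
    return "ACK" in ack["flags"] and ack["ack"] == (k + 1) % MOD

j, k = random.randrange(MOD), random.randrange(MOD)
syn = {"flags": {"SYN"}, "seq": j}   # A sends SYN, enters SYN_SENT
syn_ack = server_reply(syn, k)       # B sends SYN+ACK, enters SYN_RCVD
ack = client_confirm(syn_ack, j)     # A sends ACK, enters ESTABLISHED
print(server_accept(ack, k))         # True: B enters ESTABLISHED
```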

2) TCP four-way wave (connection termination)

First wave: The client sends a FIN to close the client-to-server direction of data transfer and enters the FIN_WAIT_1 state.

Second wave: On receiving the FIN, the server sends an ACK to the client whose acknowledgment number is the received sequence number + 1 (like a SYN, a FIN occupies one sequence number), and enters the CLOSE_WAIT state. When the client receives this ACK, it enters FIN_WAIT_2.

Third wave: The server sends a FIN to close the server-to-client direction of data transfer and enters the LAST_ACK state.

Fourth wave: On receiving the FIN, the client enters the TIME_WAIT state and sends an ACK to the server whose acknowledgment number is the received sequence number + 1. On receiving it, the server enters the CLOSED state, and the four-way wave is complete. The client waits in TIME_WAIT for twice the maximum segment lifetime (2MSL) before it also closes.
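The state changes of the four-way wave can be traced in the same way. A minimal sketch, assuming the client closes first and that `u` and `v` are the next sequence numbers of the client and server; each segment is simplified to a single flag, whereas real FIN segments also carry the ACK flag. The function name is hypothetical.

```python
def four_way_close(u, v):
    """Trace a client-initiated close; u, v = next seq numbers of client, server.

    Returns the four segments and the (client_state, server_state) after each.
    A FIN consumes one sequence number, so each FIN is acknowledged with seq + 1.
    """
    segments, trace = [], []

    fin1 = {"from": "client", "flag": "FIN", "seq": u}
    segments.append(fin1)
    trace.append(("FIN_WAIT_1", "ESTABLISHED"))

    ack1 = {"from": "server", "flag": "ACK", "ack": fin1["seq"] + 1}
    segments.append(ack1)
    trace.append(("FIN_WAIT_2", "CLOSE_WAIT"))  # server may still send data here

    fin2 = {"from": "server", "flag": "FIN", "seq": v}
    segments.append(fin2)
    trace.append(("FIN_WAIT_2", "LAST_ACK"))

    ack2 = {"from": "client", "flag": "ACK", "ack": fin2["seq"] + 1}
    segments.append(ack2)
    trace.append(("TIME_WAIT", "CLOSED"))

    trace.append(("CLOSED", "CLOSED"))  # client closes after waiting 2 * MSL
    return segments, trace

segments, trace = four_way_close(u=1000, v=5000)
for seg, (client, server) in zip(segments, trace):
    print(f"{seg['from']:>6} sends {seg['flag']:<3}  client={client:<11} server={server}")
```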

3) Why does establishing a connection take a three-way handshake, while closing it takes a four-way wave?

When the server is in the LISTEN state and receives a SYN requesting a connection, it can put its ACK and its own SYN into a single segment and send it to the client. When closing, however, receiving the other side's FIN only means that the other side will send no more data; it can still receive data. Your side may not have sent all of its data yet, so it need not close immediately: it can first send the remaining data and then send its own FIN to agree to close the connection. That is why the ACK and the FIN are usually sent separately, making four segments instead of three.

4. The server responds with a permanent redirect

The server responds with a 301 permanent redirect, telling the browser to access "http://www.google.com/" instead of "http://google.com/".

Why does the server redirect instead of directly sending the page the user wants to see? One reason concerns search engine rankings. If a page has two addresses, such as http://www.yy.com/ and http://yy.com/, a search engine may treat them as two different sites, each with fewer inbound links and therefore a lower ranking. Search engines understand what a 301 permanent redirect means, so they attribute the addresses with and without www to the same site's ranking. Another reason is caching: when a page has several addresses, it may appear in caches several times, which makes caching less effective.
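A server-side rule for this redirect can be sketched as follows, using the hypothetical hosts `yy.com` and `www.yy.com` from the example. In practice this is usually configured in the web server itself (for instance with Nginx's `return 301` directive) rather than written in application code; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "yy.com"           # hypothetical hosts from the example above
CANONICAL_HOST = "www.yy.com"

def canonical_redirect(url):
    """Return (status, headers): 301 to the www host, else 200 to serve the page."""
    parts = urlsplit(url)
    if parts.hostname == BARE_HOST:
        # Keep scheme, path and query; only the host changes (any port is dropped).
        target = urlunsplit(parts._replace(netloc=CANONICAL_HOST))
        return 301, {"Location": target}
    return 200, {}

print(canonical_redirect("http://yy.com/news?id=7"))
# (301, {'Location': 'http://www.yy.com/news?id=7'})
```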

Knowledge Expansion:

1) The difference between 301 and 302.

Both 301 and 302 status codes indicate a redirect: when the browser receives one from the server, it automatically jumps to a new URL, which it takes from the Location header of the response. (The user sees the address A that they entered change to another address B.) This is what the two have in common.

301 means that the resource at old address A has been permanently moved (it is no longer accessible there). Search engines crawl the new content and also replace the old URL with the redirected URL.

302 means that the resource at old address A still exists (it is still accessible), and the redirect is only a temporary jump from A to address B. Search engines crawl the new content but keep the old URL. For SEO, the practical difference is therefore which URL stays in the index: a 302 keeps the old one, while a 301 transfers it to the new one.

2) Reasons for redirects:

(1) The website is restructured (for example, its directory structure changes);

(2) A page is moved to a new address;

(3) A page's file extension changes (for example, an application changes .php to .html or .shtml).

In these cases, without a redirect, old addresses stored in users' bookmarks or in search engine databases would only give visitors a 404 error page, and that traffic would be lost. Likewise, sites that register several domain names need to redirect visitors on those domains to the primary site.

3) When to use a 301 or a 302 redirect?

Use a 302 when a website or page moves to a new location temporarily, for example for 24 to 48 hours. Use a 301 when the old site has to be retired for some reason and visitors should go to the new address permanently.

Specifically, typical scenarios for a 301 redirect are:

(1) You do not want to renew the domain name (or have found a more suitable one) and want to change domains.

(2) Search results show the domain without www, while the www domain is not indexed; a 301 redirect tells the search engine which domain is the intended one.

(3) Your hosting server is unstable and you are moving to a new host.

5. The browser follows the redirect

Now the browser knows that "http://www.google.com/" is the correct address, so it sends another HTTP request. There is nothing new in this step.
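Following redirects is a loop: request, check for a 3xx status, read `Location`, repeat. This sketch assumes a hypothetical in-memory table of responses in place of real network requests, resolves relative `Location` values with `urljoin`, and caps the number of hops so that a redirect loop fails instead of running forever (the cap of 5 is an arbitrary choice).

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}

# Hypothetical server behaviour: URL -> (status, Location header or None)
RESPONSES = {
    "http://google.com/": (301, "http://www.google.com/"),
    "http://www.google.com/": (200, None),
}

def follow(url, responses, max_redirects=5):
    """Return (final_url, status, visited) after following Location headers."""
    visited = [url]
    for _ in range(max_redirects + 1):
        status, location = responses[url]
        if status not in REDIRECT_CODES:
            return url, status, visited
        url = urljoin(url, location)  # Location may be relative
        visited.append(url)
    raise RuntimeError(f"more than {max_redirects} redirects")

print(follow("http://google.com/", RESPONSES))
# ('http://www.google.com/', 200, ['http://google.com/', 'http://www.google.com/'])
```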

6. The server processes the request

After all the previous steps, our HTTP request finally reaches the server. (In fact, the earlier request that triggered the redirect already reached the server.) So how does the server handle our request?

The back end receives TCP segments on a fixed port, handles the TCP connection, parses the HTTP protocol, and wraps the message into an HTTP request object for the upper layers to use.

Some larger sites send your request to a reverse proxy server. When a site gets so much traffic that it slows down, one server is no longer enough, so the same application is deployed on multiple servers and the large volume of user requests is distributed across them. In this case the client does not access the website's application server directly over HTTP: it sends the request to Nginx, Nginx requests the application server, and the result is returned to the client. Here Nginx acts as a reverse proxy server. Another benefit is that if one server goes down, users are unaffected as long as other servers are still running.

Through Nginx's reverse proxy, the request reaches the web server, where server-side scripts process it, query the database, and fetch the requested content. This involves many complex back-end operations; because I am not familiar with this area, I can only introduce it briefly here.

Extended reading:

1) What is a reverse proxy?

A client could access a website's application server directly over HTTP, but the site operator can put Nginx in between: the client requests Nginx, Nginx requests the application server, and then Nginx returns the result to the client. In this arrangement, Nginx is a reverse proxy server.
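The request distribution described above can be sketched with round-robin selection, which is Nginx's default balancing method for an `upstream` group. The backend names and the `RoundRobinProxy` class are hypothetical, and the `down` set is a simplified stand-in for real health checks.

```python
import itertools

class RoundRobinProxy:
    """Hand requests to backends in turn, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()  # simplified stand-in for real health checks
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        # Try each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backend available")

proxy = RoundRobinProxy(["app1:8080", "app2:8080", "app3:8080"])
print([proxy.pick() for _ in range(4)])  # picks app1, app2, app3, app1 in turn
proxy.down.add("app2:8080")              # app2 goes down
print([proxy.pick() for _ in range(3)])  # picks app3, app1, app3: app2 is skipped
```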

7. The server returns an HTTP response

After the previous six steps, the server has received and processed our request. In this step it returns the result of that processing, that is, an HTTP response.

Like an HTTP request, an HTTP response consists of three parts:

- Status line

- Response headers

- Response body

    HTTP/1.1 200 OK
    Date: Sat, 31 Dec 2005 23:59:59 GMT
    Content-Type: text/html;charset=ISO-8859-1
    Content-Length: 122

    <html>
    <head>
    <title>http</title>
    </head>
    <body>
    <!-- body goes here -->
    </body>
    </html>

The status line consists of the protocol version, the numeric status code, and the corresponding reason phrase, separated by spaces.

Format: HTTP-Version Status-Code Reason-Phrase CRLF

Example: HTTP/1.1 200 OK\r\n

--Protocol version: HTTP/1.0 or another version

--Status description (reason phrase): a short textual description of the status code. For example, when the status code is 200, the description is OK.

--Status code: three digits. The first digit defines the category of the response, and there are five possible values, as follows:

1xx: Informational. The server has received the client's request, and the client can continue sending it.

100 Continue

101 Switching Protocols

2xx: Success. The server has successfully received and processed the request.

200 OK: the client's request succeeded.

204 No Content: the request succeeded, but the response has no body.

206 Partial Content: a range request was successfully fulfilled.

3xx: Redirection. The server requires the client to redirect.

301 Moved Permanently: permanent redirect; the Location header of the response should contain the resource's new URL.

302 Found: temporary redirect; the Location header of the response gives the URL where the resource is temporarily located.

303 See Other: the requested resource is available at another URI, and the client should retrieve it with the GET method.

304 Not Modified: the content on the server has not changed, so the browser can use its cached copy directly.

307 Temporary Redirect: temporary redirect with the same meaning as 302 Found. The original standard forbids changing POST to GET when following a 302, but browsers do not always comply in practice; 307 explicitly requires the method to be kept, and more browsers honor it, though behavior still depends on the browser implementation.

4xx: Client error. The client's request contains an error.

400 Bad Request: the request has a syntax error and cannot be understood by the server.

401 Unauthorized: the request is not authorized; this status code must be used together with the WWW-Authenticate header field.

403 Forbidden: the server received the request but refuses to fulfill it, and usually gives the reason in the response body.

404 Not Found: the requested resource does not exist, for example because a wrong URL was entered.

5xx: Server error. The server encountered an unexpected error and could not fulfill the client's request.

500 Internal Server Error: the server hit an unexpected condition that prevented it from completing the request.

503 Service Unavailable: the server cannot handle the client's request at the moment; it may return to normal after some time.
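The status line format and the first-digit classification can be combined in a short sketch. It uses Python's standard `http.HTTPStatus` enum for registered reason phrases; the helper names `parse_status_line`, `classify`, and `standard_phrase` are hypothetical, and the validation is deliberately minimal.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}

def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into parts."""
    if not line.endswith("\r\n"):
        raise ValueError("status line must end with CRLF")
    version, code, reason = line[:-2].split(" ", 2)
    if not version.startswith("HTTP/") or not (code.isdigit() and len(code) == 3):
        raise ValueError(f"malformed status line: {line!r}")
    return version, int(code), reason

def classify(code):
    """Return the response category given by the first digit of the code."""
    category = CLASSES.get(code // 100)
    if category is None:
        raise ValueError(f"{code} is outside the 1xx-5xx range")
    return category

def standard_phrase(code):
    """Registered reason phrase for the code, or None if it is not registered."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return None

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason, classify(code))  # HTTP/1.1 404 Not Found Client error
print(standard_phrase(503))                   # Service Unavailable
```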

8. The browser displays the HTML

The browser starts displaying the page before it has received the entire HTML document. How does it render the page on the screen? Different browsers may handle parsing somewhat differently; here we only introduce WebKit's rendering process, which the following figure shows. The process includes:

parsing HTML to build the DOM tree, building the render tree, laying out the render tree, and painting the render tree.
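The first stage, building a DOM tree from HTML, can be illustrated with Python's standard `html.parser`. This is a toy sketch: it does not follow the HTML5 tree-construction algorithm that WebKit and other engines implement, it ignores attributes, and it keeps only non-blank text; the `Node`, `DOMBuilder`, and `dump` names are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have children, so the builder does not descend into them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []

class DOMBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text:" + data.strip(), self.current))

def dump(node, depth=0):
    """Indented outline of the tree, one node per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines

builder = DOMBuilder()
builder.feed("<html><head><title>http</title></head>"
             "<body><p>Hello<img src='a.gif'></p></body></html>")
builder.close()
print("\n".join(dump(builder.root)))
```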

The browser parses the HTML file from top to bottom and renders while it loads. When parsing encounters external resources such as images, external CSS, or icon fonts, they are requested asynchronously, which does not block loading of the HTML document.

During parsing, the browser parses the HTML to build the DOM tree and parses the CSS; the DOM and the style information are then combined into the render tree. Once the render tree is built, the browser lays it out and paints it on the screen. This process is fairly complex and involves two concepts: reflow and repaint.

Each element in the DOM is represented as a box (the box model), and the browser must calculate its position and size; this process is called reflow. Once the box's position, size, and other properties such as color and font are determined, the browser paints the content; this process is called repaint.

Every page necessarily goes through reflow and repaint when it first loads. Both are expensive, especially on mobile devices; they can hurt the user experience and sometimes make pages stutter. We should therefore keep reflows and repaints to a minimum.

When the browser encounters a JS file while loading the document, it suspends the HTML rendering thread (loading, parsing, and rendering happen synchronously here). It must wait not only for the JS file to finish downloading but also for it to finish executing before the rendering thread can resume. Because JS may modify the DOM (the classic example is document.write), resources referenced later in the document might no longer be needed once the script has run; this is the root cause of JS blocking the download of subsequent resources. That is why, in my own code, I put JS at the end of the HTML document.

JS is executed by the browser's JS engine, such as Google's V8. JS runs on a single thread, meaning it can do only one thing at a time: tasks queue up, and each task can start only after the previous one has finished. Some tasks, such as I/O, are time-consuming, so JS needs a mechanism that lets later tasks proceed without waiting for them. That mechanism is the division into synchronous tasks and asynchronous tasks.

The JS execution model can be seen as a main thread plus a task queue. Synchronous tasks run on the main thread and form an execution stack; asynchronous tasks are handled through the task queue. When an asynchronous task produces a result, an event is placed in the task queue. Once the synchronous tasks on the execution stack are finished, the engine takes events from the task queue and runs the corresponding tasks, and this repeats. This cycle is called the event loop.
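The main-thread-plus-task-queue model can be sketched as a loop that first runs the synchronous script to completion and then drains the queue. This is a simplified model, assuming a single task queue with no timers, delays, or microtasks; `EventLoop` and `set_timeout` are hypothetical names, the latter mimicking `setTimeout(callback, 0)`.

```python
from collections import deque

class EventLoop:
    """Toy model of JS execution: one main thread plus one task queue."""

    def __init__(self):
        self.queue = deque()
        self.log = []

    def set_timeout(self, callback):
        # Like setTimeout(callback, 0): the callback waits in the task queue
        # until the current synchronous code has finished.
        self.queue.append(callback)

    def run(self, script):
        script(self)              # synchronous code runs to completion first
        while self.queue:         # then queued tasks run one at a time, FIFO
            task = self.queue.popleft()
            task(self)            # a task may enqueue further tasks

def main(loop):
    loop.log.append("sync 1")
    loop.set_timeout(lambda lp: lp.log.append("async callback"))
    loop.log.append("sync 2")

loop = EventLoop()
loop.run(main)
print(loop.log)  # ['sync 1', 'sync 2', 'async callback']
```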

9. The browser requests resources embedded in the HTML (such as images, audio, video, CSS, and JS)

In fact, this step can run in parallel with step 8: while displaying the HTML, the browser notices tags that require content from other addresses and sends requests to fetch those files, for example external images, CSS, and JS files, with links like the following:

Image: Http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif

CSS style sheet: http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css

JavaScript Files: http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js

These addresses go through a process similar to the one for the HTML: the browser looks up their domain names in DNS, sends requests, follows redirects, and so on.

Unlike dynamic pages, static files can be cached by the browser. Some files can be read directly from the cache without contacting the server at all, or they can be served from a CDN.
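Whether a cached static file can be reused without contacting the server depends on its caching headers. The sketch below is a simplification that looks only at `Cache-Control: max-age`, `no-cache`, and `no-store`, and compares a single ETag; it shows the browser-side freshness check and the server-side comparison behind a `304 Not Modified` reply. The function names are hypothetical.

```python
def is_fresh(cache_control, age_seconds):
    """Browser side: can a cached copy be reused without asking the server?

    Real caches also handle Expires, heuristic freshness, s-maxage, and more.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value
    if "no-store" in directives or "no-cache" in directives:
        return False  # must not reuse (no-store) or must revalidate first (no-cache)
    max_age = directives.get("max-age", "")
    return max_age.isdigit() and age_seconds < int(max_age)

def revalidate(if_none_match, current_etag):
    """Server side of a conditional request: 304 if the client's copy is current."""
    return 304 if if_none_match == current_etag else 200

print(is_fresh("public, max-age=31536000", age_seconds=3600))  # True
print(revalidate('"v1"', '"v1"'))                              # 304
```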

