Full process of HTTP transaction

Source: Internet
Author: User



What exactly happens between the moment we type www.linux178.com into the browser's address bar and press Enter, and the moment we see the page?


The process below reflects only my personal understanding:


Domain name resolution --> TCP three-way handshake --> HTTP request sent once the TCP connection is established --> server responds to the HTTP request and the browser receives the HTML code --> browser parses the HTML code and requests the resources it references (JS, CSS, images, and so on) --> browser renders the page for the user


For the HTTP protocol, refer to the following:

A ramble on HTTP protocol http://kb.cnblogs.com/page/140611/

HTTP protocol Overview http://www.cnblogs.com/vamei/archive/2013/05/11/3069788.html

Understand all aspects of HTTP headers http://kb.cnblogs.com/page/55442/


The process above is analyzed step by step below, taking the Chrome browser as an example:


1. Domain Name resolution


First, Chrome has to resolve the IP address of the domain name www.linux178.com (strictly speaking, this is a host name). How is it resolved to the corresponding IP address?


① Chrome first searches its own DNS cache (entries are kept only briefly, about 1 minute, and the cache holds at most 1000 entries) for an entry for www.linux178.com. If an entry exists and has not expired, resolution ends here.

Note: How do we view Chrome's own cache? Use chrome://net-internals/#dns.


② If the browser finds no matching entry in its own cache, Chrome searches the operating system's DNS cache. If an unexpired entry is found there, resolution stops.

Note: How do we view the operating system's DNS cache? On Windows, for example, run ipconfig /displaydns at the command line.


③ If nothing is found in the Windows DNS cache, the system tries the Hosts file (located in C:\Windows\System32\drivers\etc) to see whether it lists an IP address for the domain name. If it does, resolution succeeds.


④ If no matching entry is found in the Hosts file, the browser makes a DNS system call, which sends a domain resolution request to the locally configured preferred DNS server (usually provided by the telecom operator; a public DNS server such as Google's can also be used). The request is sent over UDP to port 53 of the DNS server. It is a recursive request: the operator's DNS server must give us the IP address of the domain. The operator's DNS server first checks its own cache. If it finds an unexpired entry, resolution succeeds.

If no entry is found, the operator's DNS server performs an iterative resolution on behalf of our browser. It first looks up the IP address of a root DNS server (the IP addresses of the 13 root servers are built into the DNS server) and asks it: what is the IP address of www.linux178.com? The root server sees that the name belongs to the top-level domain com and tells the operator's DNS: I do not know the IP address of this name, but I know the IP address of the com servers, go and ask them.

The operator's DNS now has the IP address of the com servers and sends them the same question. The com server answers: I do not know the IP address of www.linux178.com, but I know the DNS address of the linux178.com domain, go and ask it. So the operator's DNS sends the question to the DNS server of linux178.com (usually provided by the domain registrar, such as Wanwang (HiChina) or Xinnet).

This time the DNS server of linux178.com looks the name up, finds that it really is there, and sends the result to the operator's DNS server. The operator's DNS server now has the IP address of www.linux178.com and returns it to the Windows kernel, the kernel returns it to the browser, and the browser finally has the IP address of www.linux178.com. This step is complete.
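The referral chain described in step ④ can be sketched as a toy simulation. This is only an illustration, not a real resolver: the server names, the zone table, and the address 192.0.2.10 (taken from a documentation range) are all hypothetical, and no network traffic is involved.

```python
# Toy model of iterative resolution. Every server name and IP address below
# is hypothetical; nothing here talks to a real DNS server.
ROOT = "root-server"

# Each "server" maps a domain suffix to either a referral or a final answer.
SERVERS = {
    "root-server": {"com": ("referral", "com-server")},
    "com-server": {"linux178.com": ("referral", "linux178-ns")},
    "linux178-ns": {"www.linux178.com": ("answer", "192.0.2.10")},
}


def iterative_resolve(name):
    """Follow referrals from the root until some server returns an answer."""
    server = ROOT
    path = [server]
    while True:
        zone_data = SERVERS[server]
        # Find the entry whose key is the name itself or a parent domain of it.
        match = next(
            (value for suffix, value in zone_data.items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if match is None:
            return None, path      # this server knows nothing about the name
        kind, target = match
        if kind == "answer":
            return target, path
        server = target            # referral: ask the next server down
        path.append(server)


ip, path = iterative_resolve("www.linux178.com")
print(ip, path)
```

The `path` list records which servers were asked, mirroring the root, com, and linux178.com hops in the text.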


Note: The following steps are usually not performed.


If none of the 4 steps above resolves the name, the following steps are tried (on the Windows operating system):


⑤ The operating system looks in the NetBIOS name cache (kept on the client computer). What is in this cache? The computer names and IP addresses of the machines this computer has successfully communicated with recently. When can this step succeed? When the name in question communicated successfully with this computer just a few minutes ago.


⑥ If step ⑤ does not succeed, the system queries the WINS server (a server that maps NetBIOS names to IP addresses).


⑦ If step ⑥ does not succeed, the client looks the name up by broadcast.


⑧ If step ⑦ does not succeed, the client reads the LMHOSTS file (in the same directory as the Hosts file, with the same syntax).


If the eighth step also fails, resolution is declared unsuccessful and the target computer cannot be reached. As long as any one of these eight steps succeeds, communication with the target computer can proceed.
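The "try each source in order, stop at the first hit" logic of the steps above can be sketched as follows. Only the first three sources are modeled, and the cache contents, expiry times, and the address 192.0.2.10 are hypothetical values chosen for illustration.

```python
import time


def make_cache_lookup(cache, now=time.time):
    """Build a lookup over a cache of name -> (ip, expires_at) entries."""
    def lookup(name):
        entry = cache.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        # An expired entry counts as a miss, as in steps 1 and 2.
        return ip if now() < expires_at else None
    return lookup


def resolve(name, steps):
    """Try each (label, lookup) pair in order and stop at the first success."""
    for label, lookup in steps:
        ip = lookup(name)
        if ip is not None:
            return ip, label
    return None, None


# Hypothetical state: the browser cache entry expired at t=50, the OS cache
# is empty, and the hosts file has an entry. The clock is fixed at t=100.
clock = lambda: 100.0
browser_cache = {"www.linux178.com": ("192.0.2.10", 50.0)}
os_cache = {}
hosts_file = {"www.linux178.com": "192.0.2.10"}

steps = [
    ("browser cache", make_cache_lookup(browser_cache, clock)),
    ("OS cache", make_cache_lookup(os_cache, clock)),
    ("hosts file", hosts_file.get),
]
result = resolve("www.linux178.com", steps)
print(result)
```

Further sources (the DNS server, the NetBIOS cache, and so on) would simply be appended to `steps` as more lookup functions.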


Take a look at the packet capture screenshot:

The test was run on a Linux virtual machine with the command wget www.linux178.com. Requesting the page directly from Chrome produced too many unrelated requests, so wget was used instead. Note that wget fetches only index.html; it does not request the static resources (JS, CSS, and other files) referenced by index.html.


Packet capture analysis:


① Packet 1: the virtual machine broadcasts an ARP request to obtain the MAC address of 192.168.100.254 (the gateway), because communication on a LAN relies on MAC addresses. It needs to talk to the gateway because our DNS server's IP address is outside the LAN, and traffic leaving the LAN must go through the gateway.

② Packet 2: after receiving the broadcast, the gateway replies to the virtual machine with its own MAC address, so the client has found its route out.


③ Packet 3: wget sends a domain name resolution request to the DNS server configured on the system (more precisely, wget makes a DNS resolution system call). The requested name is www.linux178.com, and the request asks for an IPv6 address (an AAAA record holds an IPv6 address).

④ Packet 4: the DNS server's response to the system. IPv6 is clearly still rarely used, so the AAAA record of the ...

⑤ Packet 5: another request for an IPv6 address, but the host name www.linux178.com.leo.com does not exist, so the result is "No such name".


⑥ Packet 6: a request for the IPv4 address of the domain name (an A record).

⑦ Packet 7: whether from its cache or through iterative queries, the DNS server finally obtains the IP address of the domain name and responds to the system. The system hands the result to wget, which now has the IP address of www.linux178.com. Here you can see that queries between the client and the local DNS server are recursive (the server must give the client a result). The next step, the TCP three-way handshake, can now begin.
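To make the difference between the AAAA and A requests in packets 3 and 6 concrete, here is a minimal sketch that builds the bytes of a DNS query by hand. It follows the standard DNS message layout (12-byte header, then the question), but the query ID 0x1234 is an arbitrary value and the function name is hypothetical; the packet is only built, not sent.

```python
import struct

QTYPE_A = 1       # IPv4 address record
QTYPE_AAAA = 28   # IPv6 address record
QCLASS_IN = 1     # Internet class


def build_dns_query(name, qtype, query_id=0x1234):
    """Build a DNS query message with a single question."""
    # Header: ID, flags (0x0100 = "recursion desired"), one question,
    # and zero answer, authority, and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


aaaa_query = build_dns_query("www.linux178.com", QTYPE_AAAA)
a_query = build_dns_query("www.linux178.com", QTYPE_A)
print(aaaa_query.hex())
print(a_query.hex())
```

The two messages differ only in the two-byte query type near the end. The "recursion desired" flag is what makes the request to the local DNS server a recursive one.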


2. Initiate the TCP three-way handshake


After obtaining the IP address of the domain name, the user agent (usually the browser) initiates a TCP connection request from a random port (1024 < port < 65535) to the server's web program (commonly httpd, nginx, and so on). The connection request (encapsulated layer by layer on its way down the four-layer TCP/IP model) travels through various routing devices (except within a LAN) to the server, enters the network card, and then the TCP/IP protocol stack in the kernel, which recognizes the connection request and unwraps the packet layer by layer. It may also be filtered by the Netfilter firewall (a kernel module) before finally reaching the web program (Nginx in this article), and a TCP/IP connection is established.

The following figure:

1 The client first sends a connection request segment. ACK = 0 indicates that the acknowledgment number is invalid. SYN = 1 indicates that this is a connection request or connection acceptance segment, and that the segment cannot carry data. seq = x is the client's own initial sequence number (seq = 0 in the capture means this is packet number 0). The client then enters the SYN_SENT state, meaning it is waiting for the server's reply.

2 When the server receives the connection request and agrees to establish the connection, it sends an acknowledgment to the client. Both SYN and ACK in the TCP header are set to 1. ack = x + 1 means the server expects the first byte of the next segment to be numbered x + 1, indicating that everything up to x has been received correctly (ack = 1 in the capture is 0 + 1, that is, the server expects packet number 1 from the client). seq = y is the server's own initial sequence number (seq = 0 in the capture means this is packet number 0 sent by the server). The server now enters the SYN_RCVD state, meaning it has received the client's connection request and is waiting for the client's confirmation.

3 After receiving the confirmation, the client must send an acknowledgment in turn, and this segment may carry data for the server. ACK = 1 indicates that the acknowledgment number ack = y + 1 is valid (the client expects packet number 1 from the server), and the client's own sequence number is seq = x + 1 (this is the client's packet number 1, relative to packet number 0). Once the client has sent this acknowledgment, the TCP connection enters the ESTABLISHED state and the HTTP request can be sent.
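The sequence and acknowledgment arithmetic of the three steps can be written out as a small sketch. The `Segment` class and the function name are hypothetical helpers for illustration; real TCP segments carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool
    ack_flag: bool
    seq: int
    ack: int = 0   # acknowledgment number; only meaningful when ack_flag is True


def three_way_handshake(x, y):
    """Return the three handshake segments for client ISN x and server ISN y."""
    # Step 1: client -> server, SYN = 1, ACK = 0, seq = x
    syn = Segment(syn=True, ack_flag=False, seq=x)
    # Step 2: server -> client, SYN = 1, ACK = 1, seq = y, ack = x + 1
    syn_ack = Segment(syn=True, ack_flag=True, seq=y, ack=syn.seq + 1)
    # Step 3: client -> server, ACK = 1, seq = x + 1, ack = y + 1
    ack = Segment(syn=False, ack_flag=True, seq=x + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


# x = 0 and y = 0 reproduce the relative numbers quoted from the capture.
for segment in three_way_handshake(x=0, y=0):
    print(segment)
```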

Look at the packet capture screenshot:


⑨ Packet 9: corresponds to step 1 above.

⑩ Packet 10: corresponds to step 2 above.

⑪ Packet 11: corresponds to step 3 above.


Why does TCP need a three-way handshake?


An analogy:


Suppose a foreigner gets lost in the Palace Museum and runs into Xiaoming. The following dialogue takes place:


Foreigner: Excuse me, can you speak English?

Xiaoming: Yes.

Foreigner: OK, I want ...


Before asking for directions, the foreigner first asks whether Xiaoming can speak English. Xiaoming answers yes, and only then does the foreigner start asking the way.


Communication between two computers relies on a protocol (TCP/IP is the prevailing one today). If the two computers do not use the same protocol, they cannot communicate. The three-way handshake is therefore like probing whether the other side follows the TCP/IP protocol; once this negotiation is done, communication can begin. Of course, this analogy is not entirely accurate.


Why is HTTP implemented on top of TCP?


At present, transmission on the Internet is done over TCP/IP, and HTTP, as an application-layer protocol in the TCP/IP model, is no exception. TCP is an end-to-end, reliable, connection-oriented protocol, so by building on TCP at the transport layer, HTTP does not have to worry about the problems of data transmission.


3. Initiate an HTTP request after establishing a TCP connection


After the TCP three-way handshake, the browser sends the HTTP request (see packet 12). It uses the GET method, the requested URL is /, and the protocol is HTTP/1.0.


The details of packet 12 are as follows:


The above message is the HTTP request message.


So what is the format of HTTP request messages and response messages?


Start line: for example GET / HTTP/1.0 (request method, requested URL, protocol used)

Headers: for example User-Agent, Host, and their values

Body


Both the request message and the response message follow the above format.
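The three-part layout (start line, headers, body) can be illustrated with a minimal parser. The sample request below is made up to resemble a wget request; its header values are assumptions, not the contents of packet 12.

```python
def parse_http_message(raw):
    """Split a raw HTTP message into start line, headers, and body."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request, for illustration only.
raw_request = (
    b"GET / HTTP/1.0\r\n"
    b"User-Agent: Wget\r\n"
    b"Host: www.linux178.com\r\n"
    b"\r\n"
)
start_line, headers, body = parse_http_message(raw_request)
method, url, version = start_line.split(" ")
print(method, url, version, headers, body)
```

A response can be split the same way; only the start line differs (protocol, status code, reason phrase).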



So what request methods can appear in the start line?


GET: request a resource in full (commonly used)

HEAD: request only the response headers

POST: submit a form (commonly used)

PUT: upload a file (used by WebDAV, among others; HTML forms cannot issue this method)

DELETE: delete a resource (used by WebDAV, among others)

OPTIONS: return the methods supported by the requested resource

TRACE: trace the proxies a resource request passes through (browsers cannot issue this method)


So what are a URI, a URL, and a URN?


URI: Uniform Resource Identifier

URL: Uniform Resource Locator

The format is: scheme://[username:password@]host:port/path/to/source

For example: http://www.magedu.com/downloads/nginx-1.5.tar.gz


URN: Uniform Resource Name


Both URLs and URNs are URIs.


For convenience, URL and URI are treated as the same thing here.
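The URL format given above can be taken apart with Python's standard `urllib.parse` module. In this sketch the user name, password, and port in the first URL are made-up values added to show every component; the second URL is the example from the text.

```python
from urllib.parse import urlsplit

# Hypothetical credentials and port, added only to show every component.
full = urlsplit("http://user:secret@www.magedu.com:8080/downloads/nginx-1.5.tar.gz")
print(full.scheme, full.username, full.password, full.hostname, full.port, full.path)

# The example URL from the text: no credentials and no explicit port.
plain = urlsplit("http://www.magedu.com/downloads/nginx-1.5.tar.gz")
print(plain.hostname, plain.port, plain.path)
```

When the port is omitted, `port` is `None` and the client falls back to the default for the scheme.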



What protocol versions can a request use?


There are several:


HTTP/0.9: stateless

HTTP/1.0: MIME, keep-alive, caching

HTTP/1.1: more request methods, finer cache control, persistent connections; the most commonly used


The following are the HTTP request headers sent by Chrome:



Where:

Accept tells the server which MIME types the client accepts

Accept-Encoding tells the server which compression formats the client accepts

Accept-Language tells the server which languages to send

Connection tells the server that the client supports keep-alive

Cookie carries the cookies on every request so that the server can recognize the same client

Host identifies the virtual host being requested on the server; Nginx, for example, can define many virtual hosts, and this header tells it which one to serve

User-Agent identifies the user agent, usually a browser; other kinds include wget, curl, search engine spiders, and so on



Conditional request header:

If-Modified-Since: the browser asks the server to send the resource only if it has been modified since the given time. That way, when the file is updated on the server, the browser requests it again instead of using the cached copy

Security request header:

Authorization: the authentication information the client provides to the server


What is MIME?


MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the e-mail standard to support messages in multiple formats, such as non-ASCII characters and binary attachments. The standard is defined in RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049, and others. RFC 2822, which evolved from RFC 822, specifies that the e-mail standard does not allow characters outside the 7-bit ASCII character set in mail messages. Because of this, messages in non-English characters, as well as binary files, images, sound, and other non-text content, cannot be carried in e-mail directly. MIME provides a way of labeling a wide variety of data types. The MIME framework is also used by HTTP on the World Wide Web, where the standard was extended into Internet media types.


MIME types follow the format major/minor (main type/subtype), for example:

image/jpeg
image/gif
text/html
video/quicktime
application/x-httpd-php
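As a small illustration of the major/minor format, the sketch below splits a media type string, including optional parameters such as a character set, into its parts. The function name is hypothetical and the parsing is deliberately simple; it does not handle every corner of the header grammar.

```python
def parse_content_type(value):
    """Split 'major/minor; key=value' into (major, minor, parameters)."""
    media_type, *params = [part.strip() for part in value.split(";")]
    major, _, minor = media_type.partition("/")
    options = {}
    for param in params:
        key, _, val = param.partition("=")
        options[key.strip().lower()] = val.strip().strip('"')
    return major.lower(), minor.lower(), options


print(parse_content_type("text/html; charset=UTF-8"))
print(parse_content_type("image/gif"))
```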


4. The server responds to the HTTP request and the browser gets the HTML code


In the figure, packet 12 is the HTTP request and packet 32 is the HTTP response.


When the server-side web program receives the HTTP request, it processes the request and then returns the resulting HTML file to the browser.



Packet 32 is the HTTP response the server returns to the client (the MIME type of the response is text/html), which means the client's HTTP request was answered successfully. 200 is the status code for a successful response. Other status codes are as follows:


1xx: informational status codes

100, 101

2xx: success status codes

200: OK

3xx: redirection status codes

301: Moved Permanently. A permanent redirect; the Location response header gives the new URL, which the client should use from now on

302: Found. A temporary redirect; the Location response header gives the URL to use for this request only

304: Not Modified. For example, when a locally cached resource is compared with the server's copy and found to be unmodified, the server returns 304 to tell the browser that it need not fetch the resource again and can use the local copy directly

4xx: client error status codes

404: Not Found. The requested URL does not exist

5xx: server error status codes

500: Internal Server Error. An error inside the server

502: Bad Gateway. The front-end proxy server cannot reach the back-end server

504: Gateway Timeout. The proxy can reach the back-end server, but the back-end server did not respond to the proxy within the allotted time
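The grouping of status codes by their first digit can be shown with the standard `http.HTTPStatus` enum. The `CLASSES` table and the `describe` function are helper names invented for this sketch.

```python
from http import HTTPStatus

# The first digit of the status code determines its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code with its standard reason phrase and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 304, 404, 502, 504):
    print(describe(code))
```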


The response headers shown in Chrome:



Connection: the keep-alive feature is in use

Content-Encoding: the resource is compressed with gzip

Content-Type: the MIME type is HTML and the character set is UTF-8

Date: the date and time of the response

Server: the web server software in use

Transfer-Encoding: chunked. Chunked transfer encoding is a data transfer mechanism in HTTP that lets the web server send the data to the client application (usually a web browser) in several parts. Chunked transfer encoding is available only in HTTP version 1.1 (HTTP/1.1)

Vary: see http://blog.csdn.net/tenfyguo/article/details/5939000

X-Pingback: see http://blog.sina.com.cn/s/blog_bb80041c0101fmfz.html
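To show what "sent in several parts" means for Transfer-Encoding: chunked, here is a minimal decoder sketch. Each chunk is a hexadecimal size line followed by that many bytes, and a chunk of size 0 ends the body. The function name and sample data are illustrative, and trailers after the last chunk are not handled.

```python
def decode_chunked(data):
    """Reassemble a chunked HTTP body into the original bytes."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; extensions after ';' are ignored here.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body                # the zero-size chunk marks the end
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2         # skip the chunk data and its trailing CRLF


encoded = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(encoded))
```

Because each chunk announces its own size, the server can start sending before it knows the total length of the response.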


So how does the server generate the HTML file after receiving the HTTP request?


Suppose the server uses an nginx + PHP (FastCGI) architecture to provide the service.


① Nginx reads its configuration file


What we enter in the browser's address bar is http://www.linux178.com (the http:// can be omitted; the browser adds it automatically). Strictly speaking, the complete form is http://www.linux178.com./ with a dot after com (this dot is the root domain; normally we neither type it nor see it). If the trailing / is omitted, the browser adds that as well (see the URL in the third figure). So the URL actually requested is http://www.linux178.com/. When Nginx receives the browser's GET / request, it reads the headers of the HTTP request and matches the Host header against the server_name of each of its virtual host configurations. When one matches, it reads that virtual host's configuration and finds the following directives:


root /web/echo;


This is the directory that holds all the site's files. It corresponds to / in the URL: visiting http://www.linux178.com/ means accessing the files under this directory. For example, http://www.linux178.com/index.html refers to the file index.html under /web/echo.


index index.html index.htm index.php;


This directive tells Nginx which file is the site's home page. When we visit http://www.linux178.com/, Nginx automatically appends an index file name to the URL, trying index.html, then index.htm, then index.php in order; if none of the 3 files is found, it returns a 404 error. Assuming the home page is index.php, the URL becomes /index.php, which is then handled according to the following configuration:


location ~ .*\.php(\/.*)*$ {
    root /web/echo;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}

This block says that request URLs matching the pattern (a regular expression is used for matching here), that is, URLs ending in .php (optionally followed by path parameters), are handed to the back-end FastCGI process for handling.
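The two decisions described here, choosing an index file and then matching the URL against the PHP location, can be sketched in a few lines. This is a rough model of the behavior described in the text, not Nginx's actual algorithm: the pattern `.*\.php(/.*)*$` is a reconstruction of the location regex, and the function names and the set of existing files are assumptions for illustration.

```python
import re

DOCUMENT_ROOT = "/web/echo"
INDEX_FILES = ["index.html", "index.htm", "index.php"]
# Assumed reconstruction of the location regex from the configuration.
PHP_LOCATION = re.compile(r".*\.php(/.*)*$")


def resolve_index(uri, existing_files):
    """Append the first index file that exists when the URI names a directory."""
    if not uri.endswith("/"):
        return uri
    for name in INDEX_FILES:
        if DOCUMENT_ROOT + uri + name in existing_files:
            return uri + name
    return None


def route(uri, existing_files):
    """Decide whether a request goes to FastCGI or is served as a static file."""
    resolved = resolve_index(uri, existing_files)
    if resolved is None:
        return "no index file found"
    if PHP_LOCATION.search(resolved):
        return f"fastcgi 127.0.0.1:9000 <- {DOCUMENT_ROOT}{resolved}"
    return f"static file {DOCUMENT_ROOT}{resolved}"


files = {"/web/echo/index.php"}   # hypothetical contents of the document root
print(route("/", files))
print(route("/logo.png", files))
```

With only index.php present, the request for / falls through index.html and index.htm and ends up at the FastCGI back end.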


② The PHP file is handed to the FastCGI process


Nginx hands the URL /index.php to the back-end FastCGI process and waits for it to finish (it queries the database and fills in the template to generate the HTML). FastCGI returns an index.html document to Nginx, and Nginx returns this index.html to the browser, so the browser receives the HTML code of the home page. At the same time, Nginx writes an entry to its access log.


Note 1: How does Nginx find the index.php file?


When Nginx finds that it needs the file /web/echo/index.php, it issues an I/O system call to the kernel (the hardware involved is the hard disk, and operating hardware generally requires the kernel, which exposes these functions through system calls) to tell the kernel which file it needs. Starting from /, the kernel finds the web directory, then the echo directory inside web, and finally the index.php file inside echo. It reads index.php from the hard disk into kernel memory space and then copies the file into the memory space of the Nginx process, so Nginx gets the file it wants.
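The system-call interface mentioned above is exposed almost directly by Python's `os` module. The sketch below writes a small stand-in file into a temporary directory (the file contents are made up) and reads it back with `os.open` and `os.read`, thin wrappers over the open and read system calls.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-in for /web/echo/index.php; the contents are hypothetical.
    path = os.path.join(root, "index.php")
    with open(path, "wb") as f:
        f.write(b"<?php echo 'hello'; ?>")

    # open: the kernel walks the path and returns a file descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        # read: the kernel copies file data into this process's memory.
        content = os.read(fd, 4096)
    finally:
        os.close(fd)

print(content)
```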


Note 2: How the file lookup works at the file system level.


Suppose Nginx needs the file /web/echo/index.php.


Each partition has a file system (ext3, for example). The block is the smallest unit of file storage, 4096 bytes by default. Each partition contains a metadata area and a data area. Every file has an entry in the metadata area (typically 128 bytes in size), and each entry has a number; we call such an entry an inode (index node). The inode records the file type, permissions, link count, owner and group IDs, timestamps, and which blocks the file occupies. A file can occupy multiple blocks, which are not necessarily contiguous, and every block is numbered, as shown in the following figure:



Another important point: a directory is really a file too and also occupies disk blocks; a directory is not a container. A newly created directory has a size of 4096 bytes by default, meaning it occupies one disk block, but this is not fixed. So finding a directory also means finding its entry in the metadata area; only through its inode can the disk blocks occupied by the directory be found.


What exactly is stored in a directory? Isn't it files and other directories?


In fact, a directory stores a table (think of it this way) that contains the names of the directories and files inside it and their corresponding inode numbers (called a mapping table here), as in the following figure:


Assume:

/ occupies blocks 1 and 2 in the data area; / is itself a directory, containing 3 directories: web 111

web occupies block 5; it is a directory containing 2 directories: echo and data

echo occupies block 11; it is a directory containing 1 file: index.php

index.php occupies blocks 15 and 16; it is a file.


Their layout in the file system is shown in the following figure:



So how does the kernel actually find the file index.php?


After the kernel receives Nginx's I/O system call requesting the file /web/echo/index.php:


① The kernel reads the inode of / from the metadata area and gets the data block numbers of / from it. It then finds the corresponding blocks (blocks 1 and 2) in the data area and reads the mapping table in block 1 to find the inode number that the name web corresponds to in the metadata area.

② The kernel reads the inode of web (number 3) and learns that the data of web is in block 5. It goes to the data area, reads the mapping table in block 5, learns that the inode of echo is number 5, and looks up inode 5 in the metadata area.

③ The kernel reads inode 5 and learns that the data of echo is in block 11. It reads block 11 in the data area to get the mapping table and learns that the inode of index.php is number 9.

④ The kernel reads inode 9 from the metadata area and learns that index.php occupies data blocks 15 and 16. It then reads blocks 15 and 16 in the data area and obtains the complete contents of index.php.
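The four lookup steps can be replayed with two small tables standing in for the metadata area and the data area. The inode and block numbers follow the example in the text; the inode number of / (2), the file contents, and the function name are assumptions made for this sketch.

```python
# Metadata area: inode number -> (type, data block numbers)
INODES = {
    2: ("dir", [1, 2]),      # "/" (inode number 2 is an assumption)
    3: ("dir", [5]),         # web
    5: ("dir", [11]),        # echo
    9: ("file", [15, 16]),   # index.php
}

# Data area: block number -> mapping table (directories) or bytes (files)
BLOCKS = {
    1: {"web": 3},           # mapping table of "/" (other entries omitted)
    2: {},
    5: {"echo": 5},
    11: {"index.php": 9},
    15: b"<?php ",           # hypothetical file contents
    16: b"echo 1; ?>",
}
ROOT_INODE = 2


def read_file(path):
    """Walk the path one component at a time, inode -> blocks -> next inode."""
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        kind, blocks = INODES[inode]
        if kind != "dir":
            raise NotADirectoryError(name)
        # Merge the mapping tables stored in all of the directory's blocks.
        table = {}
        for block in blocks:
            table.update(BLOCKS[block])
        if name not in table:
            raise FileNotFoundError(path)
        inode = table[name]
    kind, blocks = INODES[inode]
    if kind != "file":
        raise IsADirectoryError(path)
    # Concatenate the file's data blocks in order.
    return b"".join(BLOCKS[block] for block in blocks)


print(read_file("/web/echo/index.php"))
```

Each loop iteration corresponds to one of the numbered steps: read an inode, follow it to the directory's blocks, and look up the next name in the mapping table.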


5. The browser parses the HTML code and requests resources in the HTML code


After the browser receives the index.html file, it starts parsing the HTML code. When it encounters static resources such as JS, CSS, and images, it requests them from the server (using multiple download threads; the number of threads differs from browser to browser). This is where the keep-alive feature comes in: once a connection is established, multiple resources can be requested over it. Resources are requested in the order they appear in the code, but because the resources differ in size and the browser requests them with multiple threads, the order shown in the figure below is not necessarily the order in the code.


When the browser requests a static resource it already holds in its cache (provided the cached copy has not expired), it sends an HTTP request to the server asking whether the resource has been modified since the last modification time. If the server returns a 304 status code (telling the browser that the resource has not been modified), the browser reads the locally cached file directly.
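The server side of this exchange can be sketched as a small function: compare the If-Modified-Since header with the resource's modification time and answer 304 or 200. The function name, the placeholder body, and the dates are hypothetical; a real server also considers other validators.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, last_modified):
    """Return (status, headers, body) for a possibly conditional request."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # Not modified since the client's copy: send no body at all.
        return 304, {}, b""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return 200, headers, b"...resource bytes..."


# Hypothetical modification time of a static resource.
modified = datetime(2013, 5, 11, 8, 0, 0, tzinfo=timezone.utc)

# First request: no cached copy yet, so the full resource is returned.
status, headers, body = respond({}, modified)
print(status, headers)

# Later request: the browser echoes Last-Modified back as If-Modified-Since.
revalidation = respond({"If-Modified-Since": headers["Last-Modified"]}, modified)
print(revalidation[0])
```

If the file changes on the server after the browser cached it, the comparison fails and the full resource is sent again with a new Last-Modified value.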


For details on how browsers work, see: http://kb.cnblogs.com/page/129756/



6. The browser renders the page for the user


Finally, the browser uses its own internal mechanisms to render the requested static resources and HTML code and presents the page to the user.



At this point, the complete HTTP transaction is finished.


Reposted from: http://linux5588.blog.51cto.com/65280/1351007
