What is the process of a complete HTTP transaction?

Source: Internet
Author: User
Tags ack error status code ming rfc


Note: This article is only a summary of my personal understanding. It is not guaranteed to be entirely correct, but it should help with understanding.
For the HTTP protocol, refer to the following:
HTTP protocol ramble: http://kb.cnblogs.com/page/140611/
HTTP Protocol Overview: http://www.cnblogs.com/vamei/archive/2013/05/11/3069788.html
Understanding all aspects of HTTP headers: http://kb.cnblogs.com/page/55442/
A question to think about:

When we type www.linux178.com into the browser's address bar and press Enter, what happens between that moment and the page being displayed?

A brief analysis:
1) Domain name resolution
2) Initiate the TCP three-way handshake
3) After the TCP connection is established, initiate an HTTP request
4) The server responds to the HTTP request, and the browser obtains the HTML code
5) The browser parses the HTML code and requests the resources referenced in it (such as JS, CSS, images, etc.)
6) The browser renders the page and presents it to the user

Here's an analysis of the process above, and we'll take the Chrome browser as an example:

I. Domain name resolution

First, Chrome resolves the IP address for www.linux178.com (strictly speaking, this is a hostname rather than a domain name). How is it resolved to the corresponding IP address?

1) Chrome first searches the browser's own DNS cache (entries are kept only briefly, about 1 minute, and the cache holds at most 1000 entries) for an entry matching www.linux178.com. If an entry exists and has not expired, resolution ends here.
   Note: To view Chrome's own cache, open chrome://net-internals/#dns.
2) If the browser's own cache has no matching entry, Chrome searches the operating system's DNS cache. If an entry is found and has not expired, the search stops and resolution ends here.
   Note: To view the operating system's DNS cache on Windows, for example, run ipconfig /displaydns at the command line.
3) If nothing is found in the Windows DNS cache, the Hosts file (located in C:\Windows\System32\drivers\etc) is read to see whether it contains an IP address for the domain name. If it does, resolution succeeds.
4) If no entry is found in the Hosts file, the browser makes a DNS system call, which sends a resolution request to the locally configured preferred DNS server (usually provided by the telecom carrier; a public DNS server such as Google's can also be used). The request is sent over UDP to port 53 of the DNS server. It is a recursive request, meaning the carrier's DNS server must give us the IP address of the domain name. The carrier's DNS server first checks its own cache; if it finds an entry that has not expired, resolution succeeds. If it finds nothing, the carrier's DNS server performs an iterative resolution on the browser's behalf. It first looks up the IP address of a root DNS server (the IP addresses of the 13 root DNS servers are built into the DNS server) and sends it a request: what is the IP address of www.linux178.com? The root server sees that the name belongs to the top-level domain com and tells the carrier's DNS: I do not know the IP address of this domain name, but I know the IP address of the com domain's DNS server, so go and ask it. The carrier's DNS then sends the same question to the com domain's server. The com server replies: I do not know the IP address of www.linux178.com, but I know the DNS address for the linux178.com domain, so go and ask it. The carrier's DNS then sends the question to the DNS server for linux178.com (usually provided by the domain name registrar, such as Wanwang (HiChina) or Xinnet). This time the linux178.com DNS server checks and finds that it does hold the record, so it sends the result to the carrier's DNS server. The carrier's DNS server now has the IP address for www.linux178.com and returns it to the Windows kernel, the kernel returns the result to the browser, and the browser finally has the IP address for www.linux178.com and can proceed to the next step.

Note: The following steps are normally not performed. They are only tried if the 4 steps above fail to resolve the name:

5) The operating system looks in the NetBIOS name cache, which exists on the client computer. What does this cache hold? The computer names and IP addresses of the computers that this machine has successfully communicated with recently. When can this step succeed? When the name was successfully contacted just a few minutes ago.
6) If step 5 fails, the WINS server is queried (a server that maps NetBIOS names to IP addresses).
7) If step 6 fails, the client broadcasts a lookup.
8) If step 7 fails, the client reads the Lmhosts file (in the same directory as the Hosts file, and written the same way).

If step 8 also fails, resolution is declared to have failed and the target computer cannot be reached. As long as any one of these eight steps succeeds, communication with the target computer is possible.
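
The iterative lookup described in step 4 can be sketched as a toy model. This is only an illustration under stated assumptions: the server labels (root, com-tld, linux178-auth) and the address 192.0.2.10 are made up for the example, and no real DNS traffic is sent.

```python
# Toy model of the iterative resolution performed by the carrier's DNS server.
# Server labels and the final IP address are illustrative assumptions.

# Each "server" maps a name suffix to a referral (another server)
# or to a final answer (an IP address).
ROOT = {"com.": ("referral", "com-tld")}
COM_TLD = {"linux178.com.": ("referral", "linux178-auth")}
LINUX178_AUTH = {"www.linux178.com.": ("answer", "192.0.2.10")}  # assumed address

SERVERS = {"root": ROOT, "com-tld": COM_TLD, "linux178-auth": LINUX178_AUTH}


def query(server_name, hostname):
    """Ask one server; return its record for the longest matching suffix."""
    zone = SERVERS[server_name]
    for suffix in sorted(zone, key=len, reverse=True):
        if hostname.endswith(suffix):
            return zone[suffix]
    raise LookupError(f"{server_name} knows nothing about {hostname}")


def resolve_iteratively(hostname, cache):
    """Return (ip, servers asked), checking the resolver's own cache first."""
    if hostname in cache:
        return cache[hostname], ["cache"]
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = query(server, hostname)
        if kind == "answer":
            cache[hostname] = value
            return value, path
        server = value  # follow the referral to the next server


cache = {}
ip, path = resolve_iteratively("www.linux178.com.", cache)
print(ip, path)
ip_again, path_again = resolve_iteratively("www.linux178.com.", cache)
print(ip_again, path_again)
```

The second call is answered from the cache, which mirrors why a repeated lookup usually does not reach the root servers again.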

See the packet capture:

The test was run on a Linux virtual machine using the command wget www.linux178.com. Requesting the page directly with Chrome produces many interfering requests, so wget is used instead. However, wget only requests index.html and does not request the static resources (JS, CSS, etc.) referenced in index.html.

Packet capture analysis:
Packet 1: The virtual machine broadcasts to obtain the MAC address of 192.168.100.254 (the gateway), because communication inside a LAN relies on MAC addresses. It needs to talk to the gateway because our DNS server's IP is an external address, so outbound traffic must go through the gateway.
Packet 2: The gateway's reply to the broadcast, telling the virtual machine the gateway's MAC address. The client has now found its route out.
Packet 3: wget sends a resolution request to the DNS server configured on the system (more precisely, wget makes a DNS resolution system call). The requested name is www.linux178.com, and an AAAA record is expected (AAAA denotes an IPv6 address).
Packet 4: The DNS server's response to the system. IPv6 is evidently still rarely used, so there is no AAAA record.
Packet 5: Another request for an IPv6 address, but the hostname www.linux178.com.leo.com does not exist, so the result is "No such name".
Packet 6: A request for the IPv4 address of the domain name (an A record).
Packet 7: The DNS server, whether from its cache or through iterative queries, has obtained the IP address of the domain name and responds to the system. The system passes it to wget, which now has the IP address of www.linux178.com. This also shows that the query between the client and the local DNS server is recursive (the server must give the client a result). The next step, the TCP three-way handshake, can now begin.
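
To make packets 3 and 6 more concrete, here is a minimal sketch of how such a DNS query message could be built by hand. The helper name build_dns_query and the query ID 0x1234 are assumptions for illustration; the record type numbers (A = 1, AAAA = 28) and the RD (recursion desired) flag are part of the DNS message format. Nothing is sent over the network.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record


def build_dns_query(hostname, qtype, query_id=0x1234):
    """Build a minimal DNS query message: 12-byte header plus one question."""
    flags = 0x0100  # only the RD (recursion desired) bit is set
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN (Internet)
    return header + question


aaaa_query = build_dns_query("www.linux178.com", QTYPE_AAAA)
a_query = build_dns_query("www.linux178.com", QTYPE_A)
print(aaaa_query.hex())
print(a_query.hex())
```

The RD bit is what makes the request recursive from the client's point of view: the client asks the configured server to come back with a final answer.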

II. Initiating the TCP three-way handshake

After obtaining the IP address for the domain name, the user agent (typically the browser) initiates a TCP connection request from a random port (port < 65535) to the server's web program (commonly httpd, Nginx, etc.). This connection request, encapsulated layer by layer by the four-layer TCP/IP model, travels to the server (passing through various routing devices along the way, except inside a LAN), enters the network card, and then enters the kernel's TCP/IP protocol stack (which identifies the connection request and unpacks the packet, peeling off one layer at a time). It may also be filtered by the Netfilter firewall (a kernel module) before it finally reaches the web program (Nginx in this article), and the TCP/IP connection is established. The process is as follows:

1) The client first sends a connection probe. ACK = 0 indicates that the acknowledgement number is invalid, and SYN = 1 indicates that this is a connection request or connection acceptance segment, which cannot carry data. seq = x is the client's own initial sequence number (seq = 0 in the capture means this is packet No. 0). The client then enters the SYN_SENT state, meaning it is waiting for the server's reply.
2) When the listening server receives the connection request segment and agrees to establish the connection, it sends an acknowledgement to the client. Both SYN and ACK in the TCP header are set to 1. ack = x + 1 means that the first data byte it expects in the next segment has sequence number x + 1, and also that all data up to x has been received correctly (ack = 1 in the capture is simply 0 + 1, i.e. packet No. 1 is expected from the client). seq = y is the server's own initial sequence number (seq = 0 in the capture means this is packet No. 0 sent by the server). The server then enters the SYN_RCVD state, meaning it has received the client's connection request and is waiting for the client's confirmation.
3) When the client receives the acknowledgement, it must send an acknowledgement in turn, which may carry data to be sent to the server. ACK = 1 indicates that the acknowledgement number ack = y + 1 is valid (packet No. 1 is expected from the server), and the client's own sequence number is seq = x + 1 (this is my packet No. 1, as opposed to packet No. 0). Once the server receives the client's acknowledgement, the TCP connection enters the ESTABLISHED state and the HTTP request can be initiated.
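
The sequence and acknowledgement numbers in the three steps above can be sketched as follows. This is only a model of the numbers, not a real TCP implementation; the Segment class and the function name are assumptions made for the example, and real sequence numbers wrap around at 2^32, which is ignored here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: int     # SYN flag
    ack: int     # ACK flag
    seq: int     # sequence number
    ack_no: int  # acknowledgement number (meaningful only when ack == 1)


def three_way_handshake(x, y):
    """Return the three segments for client ISN x and server ISN y."""
    syn = Segment(syn=1, ack=0, seq=x, ack_no=0)                # 1) client to server
    syn_ack = Segment(syn=1, ack=1, seq=y, ack_no=x + 1)        # 2) server to client
    final_ack = Segment(syn=0, ack=1, seq=x + 1, ack_no=y + 1)  # 3) client to server
    return [syn, syn_ack, final_ack]


# x = 0 and y = 0 reproduce the relative numbers shown in the capture.
for segment in three_way_handshake(x=0, y=0):
    print(segment)
```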

See the packet capture:

Packet 9 corresponds to step 1) above.
Packet 10 corresponds to step 2) above.
Packet 11 corresponds to step 3) above.

Why does TCP need a three-way handshake? Here is an example:

Suppose a foreigner gets lost in the Forbidden City and sees Xiao Ming, and the following dialogue takes place:

Foreigner: Excuse me, can you speak English?
Xiao Ming: Yes.
Foreigner: OK, I want ...

Before asking for directions, the foreigner first asks Xiao Ming whether he can speak English. Xiao Ming answers yes, and only then does the foreigner start asking for directions.

Two computers communicate by means of a protocol (currently the popular TCP/IP protocol). If the two computers use different protocols, they cannot communicate. The three-way handshake is therefore roughly equivalent to testing whether the other side follows the TCP/IP protocol, and communication can proceed once this negotiation is complete. Of course, this way of understanding it is not entirely accurate.

Why is the HTTP protocol implemented on top of TCP?

Currently all traffic on the Internet goes through TCP/IP, and HTTP, as an application-layer protocol of the TCP/IP model, is no exception. TCP is an end-to-end, reliable, connection-oriented protocol, so HTTP, built on the transport-layer TCP protocol, does not have to worry about the various problems of data transmission.

III. Initiating an HTTP request after the TCP connection is established

After the TCP three-way handshake, the browser initiates an HTTP request (see the packet capture). The method used is HTTP GET, the requested URL is /, and the protocol is HTTP/1.0.

The above message is an HTTP request message

So what is the format of an HTTP request message and a response message?
Start line: for example GET / HTTP/1.0 (the request method, the requested URL, and the protocol used by the request)
Header information: User-Agent, Host, etc.

Both the request message and the response message will follow the above format.
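
As a rough illustration of this format, the sketch below builds and parses a request message by hand. The helper names and the User-Agent value "Wget" are assumptions for the example. The CRLF line endings and the blank line that ends the header section are part of HTTP itself.

```python
def build_request(method, url, version, headers):
    """Serialize a request: start line, header lines, then a blank line."""
    lines = [f"{method} {url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(raw):
    """Split a request or response into start line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


raw = build_request("GET", "/", "HTTP/1.0",
                    {"User-Agent": "Wget", "Host": "www.linux178.com"})
start_line, headers, body = parse_message(raw)
print(start_line)
print(headers)
```

The same parse_message helper works on a response, whose start line would look like HTTP/1.1 200 OK instead.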

So what are the request methods in the start line?

GET: request a complete resource (common)
HEAD: request only the response headers
POST: submit a form (common)
PUT: (WebDAV) upload
DELETE: (WebDAV) delete
OPTIONS: return the methods supported by the requested resource
TRACE: trace the proxies that a request for a resource passes through

What is a URL, URI, URN?
URI: Uniform Resource Identifier
URL: Uniform Resource Locator. The format is as follows:
    scheme://[username:password@]host:port/path/to/source
    http://www.magedu.com/downloads/nginx-1.5.tar.gz
URN: Uniform Resource Name
Both URLs and URNs are URIs. For convenience, this article treats URL and URI as the same thing for now.
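
The parts of the URL format above can be inspected with Python's standard urllib.parse module. The second URL, including the user name, password and port 8080, is made up for illustration.

```python
from urllib.parse import urlsplit

# The example URL from the text.
parts = urlsplit("http://www.magedu.com/downloads/nginx-1.5.tar.gz")
print(parts.scheme, parts.hostname, parts.port, parts.path)

# A hypothetical URL that uses the optional username:password@ and :port parts.
with_auth = urlsplit("http://user:secret@www.magedu.com:8080/path/to/source")
print(with_auth.username, with_auth.password, with_auth.port, with_auth.path)
```

When the URL has no explicit port, parts.port is None and the client falls back to the default port of the scheme.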

What protocol versions can a request use?

There are the following versions:

HTTP/0.9: stateless
HTTP/1.0: MIME, keep-alive (keeps the connection open), caching
HTTP/1.1: more request methods, finer cache control, persistent connections; the most commonly used

Common headers

Accept: tells the server which MIME types the client accepts
Accept-Encoding: tells the server which compression encodings the client accepts
Accept-Language: tells the server which languages to send
Connection: tells the server that the keep-alive feature is supported
Cookie: cookies are carried on every request so that the server can recognize the same client
Host: identifies which virtual host on the server is being requested; for example, Nginx can define many virtual hosts, and this header identifies which one to access
User-Agent: the user agent, usually a browser; there are other kinds, such as wget, curl, and search engine spiders
Conditional request header:
If-Modified-Since: the browser asks the server whether a resource file has been modified since a given time and, if so, to send it again. This ensures that when the resource file on the server is updated, the browser requests it again instead of using the cached file.
Security request header:
Authorization: the authentication information the client provides to the server

What is MIME?

MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the e-mail standard to support mail messages in a variety of formats, such as non-ASCII characters and binary attachments. The standard is defined in RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049, and others. RFC 2822, which evolved from RFC 822, stipulates that e-mail messages may not use characters outside the 7-bit ASCII character set. Because of this, messages with non-English characters and non-text content such as binary files, images, and sound cannot be transmitted in plain e-mail. MIME defines a notation for representing a wide variety of data types. The MIME framework is also used in HTTP, the protocol of the World Wide Web, where the standard has been extended into Internet media types.

MIME types use the format major/minor (main type/subtype), for example:

image/jpeg
image/gif
text/html
video/quicktime
application/x-httpd-php
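
A small sketch of how a major/minor value, as it appears in a Content-Type header, might be split into its parts. The function name parse_content_type is an assumption for this example, and the parsing is deliberately simplified (it does not handle quoted semicolons, for instance).

```python
def parse_content_type(value):
    """Split 'major/minor; key=value' into (major, minor, params)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    major, _, minor = media_type.partition("/")
    params = {}
    for item in raw_params:
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, val = item.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return major.lower(), minor.lower(), params


print(parse_content_type("text/html; charset=UTF-8"))
print(parse_content_type("image/gif"))
```
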
IV. The server responds to the HTTP request and the browser gets the HTML code

In the capture, packet 12 is the HTTP request packet and packet 32 is the HTTP response packet.

After the server-side web program receives the HTTP request, it processes the request and then returns the HTML file to the browser.

Packet 32 is the HTTP response that the server returns to the client (200 OK; the MIME type of the response is text/html), which means the HTTP request initiated by the client was answered successfully. 200 is the status code for a successful response. Other status codes are as follows:

Common status codes
1xx: informational status codes
     101
2xx: success status codes
     200: OK
3xx: redirect status codes
     301: permanent redirect; the Location response header carries the new URL, and the client may use the new URL for future requests
     302: temporary redirect; the Location response header carries the new URL, but the client should keep using the original URL for future requests
     304: Not Modified; for example, the locally cached resource file is compared with the one on the server and found to be unmodified, so the server returns 304 to tell the browser that it does not need to download the resource again and can use the local copy directly
4xx: client error status codes
     404: Not Found; the resource at the requested URL does not exist
5xx: server error status codes
     500: Internal Server Error
     502: Bad Gateway; occurs when the front-end proxy server cannot contact the backend server
     504: Gateway Timeout; the proxy can contact the backend server, but the backend server does not respond to the proxy within the specified time
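
The status codes above and their classes can be looked up with Python's standard http.HTTPStatus enumeration. The CLASSES table and the describe helper are assumptions written for this example; the reason phrases come from the standard library.

```python
from http import HTTPStatus

# The first digit of the code determines the class of the response.
CLASSES = {1: "informational", 2: "success", 3: "redirect",
           4: "client error", 5: "server error"}


def describe(code):
    """Return 'code phrase (class)' for a standard status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[status.value // 100]})"


for code in (101, 200, 301, 302, 304, 404, 500, 502, 504):
    print(describe(code))
```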

Response header information as seen in the Chrome browser:

Connection: the keep-alive feature is in use
Content-Encoding: the resource is compressed with gzip
Content-Type: the MIME type is HTML, and the character set is UTF-8
Date: the time of the response
Server: the web server software in use
Transfer-Encoding: chunked. Chunked transfer encoding is a data transfer mechanism in HTTP that allows the data sent from a web server to a client application (typically a web browser) to be divided into multiple parts. Chunked transfer encoding is available only in HTTP version 1.1 (HTTP/1.1).
Vary: for this header, see http://blog.csdn.net/tenfyguo/article/details/5939000
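
To show what "divided into multiple parts" means on the wire, here is a minimal decoder sketch for a chunked body. Each chunk is a hexadecimal size line, CRLF, the data, CRLF, and a zero-size chunk ends the body. The function name and the sample payload are assumptions for illustration, and trailers after the last chunk are not handled.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked message body into the original payload."""
    payload, pos = b"", 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; ignore any ';extension' suffix.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:  # a zero-length chunk ends the body
            return payload
        start = line_end + 2
        payload += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


encoded = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(encoded))
```

Because each chunk announces its own size, the server can start sending before it knows the total length of the response.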

So what happens between the server receiving the HTTP request and the HTML file being generated?

Suppose the server uses the Nginx + PHP (FastCGI) architecture to provide the service.

1 Nginx reads its configuration file

What we type into the browser's address bar is http://www.linux178.com (the http:// prefix can be omitted; the browser adds it automatically). Strictly, the complete form is http://www.linux178.com./ with a dot at the end (this dot is the root domain; we usually neither type it nor see it displayed). The trailing / can also be omitted, because the browser adds it automatically (see the URL in the 3rd image), so the URL actually requested is http://www.linux178.com/. When Nginx receives the browser's GET / request, it reads the header information in the HTTP request and uses the Host header to match the server_name of each of its virtual host configurations. If one matches, it reads that virtual host's configuration and finds the following:

root /web/echo;

From this we know that all the web files are in this directory, and this directory is the / we reach when accessing http://www.linux178.com/. For example, when accessing http://www.linux178.com/index.html, there is a file called index.html under /web/echo.

index index.html index.htm index.php;

From this we know which file is the site's home page. When we access http://www.linux178.com/, Nginx automatically appends index.html to the URL (assuming the home page is index.php, Nginx still tries index.html first; if that file is not found it moves on to the next one; if none of the 3 files is found, it returns a 403 error). The URL after appending is /index.php, which is then processed according to the following configuration:
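
The search through the index list can be sketched as below. This is not how Nginx is implemented internally; it only mimics the "try each name in order" behaviour described above. The function name is an assumption, and a temporary directory stands in for /web/echo.

```python
import os
import tempfile


def resolve_index(root, uri, index_files):
    """Return the first existing index file for a directory URI, else None."""
    directory = os.path.join(root, uri.lstrip("/"))
    for name in index_files:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:  # stands in for /web/echo
    # Only index.php exists, so index.html and index.htm are tried and skipped.
    open(os.path.join(root, "index.php"), "w").close()
    found = resolve_index(root, "/", ["index.html", "index.htm", "index.php"])
    missing = resolve_index(root, "/", ["index.html", "index.htm"])
    print(os.path.basename(found), missing)
```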

location ~ .*\.php(\/.*)*$ {
    root           /web/echo;
    fastcgi_pass   127.0.0.1:9000;
    fastcgi_index  index.php;
    fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
    include        fastcgi_params;
}

This configuration means that any requested URL matching the regular expression (the ~ enables regular expression matching), that is, any URL with a .php suffix (optionally followed by path arguments), is handed to the backend FastCGI process for handling.
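
To check which request paths the regular expression in this location block would select, the same pattern can be tried in Python. This is only an approximation, assuming the pattern behaves the same in Python's re module as in the PCRE engine Nginx uses, which holds for this simple expression. The helper name is hypothetical.

```python
import re

# The pattern from the location block: a .php suffix, optionally followed
# by extra path segments such as /index.php/some/path.
LOCATION_PHP = re.compile(r".*\.php(\/.*)*$")


def handled_by_fastcgi(uri):
    """Return True if the request path would match the PHP location."""
    return LOCATION_PHP.search(uri) is not None


for uri in ("/index.php", "/index.php/extra/path", "/index.html", "/js/app.js"):
    print(uri, handled_by_fastcgi(uri))
```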

2 Nginx hands the PHP file to the FastCGI process

So Nginx hands the URL /index.php to the backend FastCGI process and waits for it to finish processing (querying the database and filling the template to generate the HTML). FastCGI returns an index.html document to Nginx, and Nginx returns this index.html to the browser, so the browser obtains the HTML code of the home page. Meanwhile, Nginx writes an entry to the access log file.

1): How does Nginx find the index.php file?

When Nginx finds that it needs the file /web/echo/index.php, it makes an I/O system call to the kernel (reading the file means dealing with hardware, here the hard disk, and hardware is normally operated through the kernel, which exposes these functions as system calls). It tells the kernel: I need this file. The kernel starts from /, finds the web directory, then finds the echo directory inside web, and finally finds index.php inside echo. It then reads index.php from the hard disk into the kernel's own memory space and copies the file into the memory space of the Nginx process, so Nginx gets the file it wants.
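
The system-call path described above can be seen from user space with Python's os module, whose open and read functions are thin wrappers over the corresponding system calls. The temporary file and its contents are assumptions standing in for /web/echo/index.php.

```python
import os
import tempfile

# Create a small file that stands in for index.php.
fd, path = tempfile.mkstemp(suffix=".php")
os.write(fd, b"hello from index.php")
os.close(fd)

# open: ask the kernel to locate the file and return a file descriptor.
fd = os.open(path, os.O_RDONLY)
# read: the kernel copies the file data into this process's memory.
data = os.read(fd, 4096)
os.close(fd)
os.remove(path)

print(data)
```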

2): How is a file found at the file system level?

For example, Nginx needs to get the file /web/echo/index.php. Each partition (for example one formatted with the ext3 file system, where the block is the smallest unit of file storage and defaults to 4096 bytes) contains a metadata area and a data area. Each file has a metadata entry (typically 128 bytes in size) in the metadata area. Each entry has a number, which we call the inode (index node) number. The entry contains the file type, permissions, link count, owner and group IDs, timestamps, and which blocks the file occupies, that is, the block numbers (each file can occupy more than one block, the blocks are not necessarily contiguous, and every block is numbered), as shown in the following figure:

Another important point: a directory is also an ordinary file and also occupies disk blocks; a directory is not a container. By default, a newly created directory is 4096 bytes, which means it needs only one disk block, but this is not fixed.
So finding a directory also requires finding its entry in the metadata area; only after finding the corresponding inode can the disk blocks occupied by the directory be found. So what is stored in a directory, if not files or other directories? In fact, a directory holds a table (one way to think about it) that contains the names of the directories or files inside it and their corresponding inode numbers (called the mapping table here for now), for example:

Assume:

/           occupies blocks 1 and 2 in the data area; / is a directory containing 3 directories, including web and 111
web         occupies block 5; it is a directory containing 2 directories: echo and data
echo        occupies block 11; it is a directory containing 1 file: index.php
index.php   occupies blocks 15 and 16; it is a file

Its layout in the file system is as shown in the figure.

So how does the kernel find the file index.php?

After the kernel receives Nginx's I/O system call asking for the file /web/echo/index.php:

1) The kernel reads the inode of / from the metadata area, obtains from it the numbers of the data blocks of /, finds those blocks (blocks 1 and 2) in the data area, and reads the mapping table in block 1 to find the inode number of the name web in the metadata area.
2) The kernel reads the inode for web (No. 3) and learns that web occupies block 5 in the data area. It goes to the data area, finds block 5, and reads the mapping table from it. The inode for echo is No. 5, so it goes to the metadata area to find inode No. 5.
3) The kernel reads inode No. 5 and learns that echo occupies block 11 in the data area. It reads block 11 to get the mapping table and learns that the inode for index.php is No. 9.
4) The kernel goes to the metadata area and reads inode No. 9, learning that index.php occupies data blocks 15 and 16. It then goes to the data area, finds blocks 15 and 16, and reads them to get the full contents of index.php.
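
The four lookup steps can be simulated with two dictionaries, one for the metadata area and one for the data area. The block and inode numbers follow the assumptions in the text. The inode number of / is not given there, so 2 is used as an assumption (it is the root inode number on ext file systems), and the file contents are made up.

```python
ROOT_INODE = 2  # assumption: the text does not state the inode number of /

# Metadata area: inode number -> (file type, data block numbers)
INODES = {
    ROOT_INODE: ("dir", [1, 2]),  # /
    3: ("dir", [5]),              # web
    5: ("dir", [11]),             # echo
    9: ("file", [15, 16]),        # index.php
}

# Data area: block number -> mapping table (name -> inode) or file data
BLOCKS = {
    1: {"web": 3},
    2: {},
    5: {"echo": 5},
    11: {"index.php": 9},
    15: b"hello from ",   # made-up file contents
    16: b"index.php",
}


def lookup(path):
    """Walk the path one name at a time; return (inode, directory blocks read)."""
    inode, visited = ROOT_INODE, []
    for name in path.strip("/").split("/"):
        kind, blocks = INODES[inode]
        if kind != "dir":
            raise NotADirectoryError(name)
        for block in blocks:
            visited.append(block)
            if name in BLOCKS[block]:
                inode = BLOCKS[block][name]
                break
        else:
            raise FileNotFoundError(name)
    return inode, visited


def read_file(path):
    """Resolve the path, then concatenate the file's data blocks in order."""
    inode, _ = lookup(path)
    _, blocks = INODES[inode]
    return b"".join(BLOCKS[number] for number in blocks)


print(lookup("/web/echo/index.php"))
print(read_file("/web/echo/index.php"))
```

Note that inode numbers and block numbers are separate number spaces, which is why inode No. 5 (echo) and block 5 (the contents of web) can coexist.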

V. The browser parses the HTML code and requests the resources in the HTML code

When the browser gets the index.html file, it starts parsing the HTML code. When it encounters static resources such as JS, CSS, and images, it requests them from the server (downloading with multiple threads; the number of threads differs from browser to browser). The keep-alive feature is used here: once a connection is established, multiple resources can be requested over it. Resources are requested in the order in which they appear in the code, but because the resources differ in size and the browser requests them with multiple threads, the order observed in the capture is not necessarily the order in the code.

When the browser requests a static resource for which it holds a cached copy, it sends an HTTP request to the server asking whether the resource has been modified since the last modification time it knows about. If the server returns a 304 status code (telling the browser that the resource has not been modified on the server), the browser reads the resource directly from its local cache file.
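
The server side of this exchange might look like the following sketch. It is a simplified model, not Nginx's actual logic: the respond function, the dictionary-based headers, and the timestamp are assumptions, while the If-Modified-Since and Last-Modified header names and the HTTP date format are standard.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, request_headers):
    """Return (status, headers) for a GET of a static resource."""
    last_modified = last_modified.replace(microsecond=0)
    header = request_headers.get("If-Modified-Since")
    if header is not None and last_modified <= parsedate_to_datetime(header):
        return 304, {}  # not modified: the browser reuses its cached copy
    return 200, {"Last-Modified": format_datetime(last_modified, usegmt=True)}


mtime = datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # assumed timestamp

# First visit: no cached copy, so the full resource is returned.
status, headers = respond(mtime, {})

# Revisit: the browser sends back the Last-Modified value it was given.
revisit_status, _ = respond(mtime, {"If-Modified-Since": headers["Last-Modified"]})
print(status, revisit_status)
```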

For details on how browsers work, see: http://kb.cnblogs.com/page/129756/

VI. The browser renders the page and presents it to the user

Finally, the browser uses its internal working mechanisms to render the requested static resources and HTML code, and presents the result to the user.

At this point, the complete HTTP transaction is finished.

Disclaimer: This article comes from the Operation and Maintenance Tribe and is reprinted with the author's consent. If you reprint it, please indicate that it comes from the Operation and Maintenance Tribe.
