Repost: A Complete HTTP Request

Source: Internet
Author: User

    • Spartacus
    • Time: January 11, 2014
    • Category: WEB
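Disclaimer: the statements in this article are a summary of my personal understanding. They are not necessarily entirely correct, but they may help with understanding.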

For background on the HTTP protocol, see:

    • An informal discussion of the HTTP protocol: http://kb.cnblogs.com/page/140611/
    • An overview of the HTTP protocol: http://www.cnblogs.com/vamei/archive/2013/05/11/3069788.html
    • Understanding every aspect of HTTP headers: http://kb.cnblogs.com/page/55442/

When we type www.linux178.com into the browser's address bar and press Enter, what happens between that moment and the page appearing?

1) Domain name resolution
2) The TCP three-way handshake
3) Once the TCP connection is established, the browser sends an HTTP request
4) The server responds to the HTTP request, and the browser receives the HTML code
5) The browser parses the HTML code and requests the resources it references (JS, CSS, images, etc.)
6) The browser renders the page and presents it to the user

Below, each step of this process is analyzed in turn, using Chrome as the example browser:

I. Domain name resolution

First, Chrome resolves the domain name www.linux178.com (strictly speaking, a hostname) to an IP address. How does it find the corresponding IP address?

1) Chrome first searches its own DNS cache (entries are kept only briefly, probably about 1 minute, and the cache holds at most about 1000 entries) to see whether it has an unexpired entry for www.linux178.com. If it does, resolution ends here. Note: Chrome's own cache can be viewed at chrome://net-internals/#dns.

2) If the browser finds no matching entry in its own cache, Chrome searches the operating system's DNS cache. If an unexpired entry is found there, the search stops and resolution ends here. Note: on Windows, the operating system's DNS cache can be viewed from the command line with ipconfig /displaydns.

3) If nothing is found in the Windows DNS cache, the system reads the hosts file (located in C:\Windows\System32\drivers\etc) to see whether it contains an IP address for the domain name. If it does, resolution succeeds.

4) If the hosts file has no matching entry, the browser makes a DNS resolution system call, and the system sends a resolution request to the locally configured preferred DNS server (usually provided by the ISP; public DNS servers such as Google's can also be used). The request is sent over UDP to port 53, and it is a recursive request: the ISP's DNS server must return the IP address of the domain name to us. The ISP's DNS server first checks its own cache; if it finds an unexpired entry, resolution succeeds. If not, the ISP's DNS server performs an iterative resolution on behalf of our browser.
It first needs the IP address of a root name server (DNS servers have the IP addresses of the 13 root name servers built in), and asks it: "What is the IP address of www.linux178.com?"
The root server sees that the name belongs to the .com top-level domain, so it tells the ISP's DNS server: "I don't know the IP address of this name, but I know the IP address of the .com domain's servers; ask them." The ISP's DNS server therefore obtains the IP address of the .com servers and asks them the same question.
The .com server replies: "I don't know the IP address of www.linux178.com, but I know the address of the DNS server for the linux178.com domain; ask it." The ISP's DNS server then sends the request to the DNS server of linux178.com (generally provided by the domain name registrar, such as Wanwang or Xinnet).
The linux178.com DNS server finds that the record is indeed its own and sends the result to the ISP's DNS server. The ISP's DNS server now has the IP address of www.linux178.com and returns it to the Windows kernel, which returns it to the browser. The browser finally has the IP address of www.linux178.com and can move on.

Note: normally resolution completes within the first 4 steps and the following steps are not performed. If none of the above 4 steps succeeds, the following steps are tried:

5) The operating system looks in the NetBIOS name cache (which exists on the client computer). What does it contain? The names and IP addresses of computers this machine has communicated with successfully in the recent past. When can this step succeed? When this machine communicated successfully with the host of that name a few minutes ago.

6) If step 5 fails, the system queries a WINS server (a server that maps NetBIOS names to IP addresses).

7) If the

8) If the
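The lookup order in steps 1 to 4 can be modeled as a toy simulation. This is not how Chrome or Windows is implemented: the cache layers are plain dictionaries with an expiry time, the root / .com / linux178.com name servers are an in-memory table, and all server names and the IP address (taken from the 203.0.113.0/24 documentation range) are hypothetical. The NetBIOS and WINS fallbacks (steps 5 onward) are left out.

```python
import time

# Hypothetical name-server hierarchy. Each server either knows the final
# answer or refers the resolver to the servers for a more specific zone.
NAME_SERVERS = {
    "root":            {"referrals": {"com.": "com-server"}},
    "com-server":      {"referrals": {"linux178.com.": "linux178-server"}},
    "linux178-server": {"answers": {"www.linux178.com.": "203.0.113.10"}},
}


def iterative_resolve(name):
    """What the ISP's DNS server does for us: start at the root and follow
    referrals until some server returns an answer."""
    fqdn = name if name.endswith(".") else name + "."
    server = "root"
    while True:
        data = NAME_SERVERS[server]
        if fqdn in data.get("answers", {}):
            return data["answers"][fqdn]
        for zone, next_server in data.get("referrals", {}).items():
            if fqdn.endswith("." + zone):   # the name lies inside this zone
                server = next_server
                break
        else:
            raise LookupError(f"{name}: no such name")


class TtlCache:
    """name -> (ip, expiry time); stands in for the browser or OS DNS cache."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}

    def get(self, name, now):
        entry = self.entries.get(name)
        return entry[0] if entry and entry[1] > now else None

    def put(self, name, ip, now):
        self.entries[name] = (ip, now + self.ttl)


def resolve(name, browser_cache, os_cache, hosts, now=None):
    """Return (ip, where it came from): browser cache, OS cache, hosts file, DNS."""
    now = time.monotonic() if now is None else now
    ip = browser_cache.get(name, now)
    if ip is not None:
        return ip, "browser cache"
    ip = os_cache.get(name, now)
    if ip is not None:
        source = "os cache"
    elif name in hosts:
        ip, source = hosts[name], "hosts file"
    else:
        # Recursive query from our side; the server resolves iteratively.
        ip, source = iterative_resolve(name), "dns"
        os_cache.put(name, ip, now)
    browser_cache.put(name, ip, now)
    return ip, source


browser, system = TtlCache(60), TtlCache(300)   # 60 s mirrors the ~1 minute above
print(resolve("www.linux178.com", browser, system, {}, now=0))   # ('203.0.113.10', 'dns')
print(resolve("www.linux178.com", browser, system, {}, now=30))  # ('203.0.113.10', 'browser cache')
print(resolve("www.linux178.com", browser, system, {}, now=90))  # ('203.0.113.10', 'os cache')
```

The third call shows why the layers matter: the browser entry has expired after 60 seconds, but the longer-lived OS cache still answers without another trip to the DNS server.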

Packet capture:
The test was done in a Linux virtual machine with the command wget www.linux178.com. Requesting the page directly from Chrome produced too many unrelated requests, so wget was used instead. Note, however, that wget only fetches index.html; it does not request the static resources (JS, CSS, etc.) that index.html references.

Packet Capture Analysis:

Packet 1: the virtual machine broadcasts an ARP request to obtain the MAC address of 192.168.100.254 (the gateway). Communication inside a LAN relies on MAC addresses. The VM needs the gateway because our DNS server's IP address is outside the LAN, and traffic leaving the LAN has to go through the gateway.
Packet 2: the gateway, having received the broadcast, replies to the VM with its own MAC address, so the client now knows its way out.
Packet 3: wget sends a resolution request for www.linux178.com to the DNS server configured on the system (more precisely, wget makes a DNS resolution system call), asking for an AAAA record (an IPv6 address).
Packet 4: the DNS server's reply to the system. IPv6 is still rarely used, so there is no AAAA record.
Packet 5: another request for an IPv6 address, this time for the hostname www.linux178.com.com, which does not exist, so the result is "No such name".
Packet 6: a request for the IPv4 address (A record) of the domain name.
Packet 7: the DNS server, either from its cache or through an iterative query, has obtained the IP address of the domain name and returns it to the system; the system hands it to wget, which now has the IP address of www.linux178.com.

This also shows that the query between the client and the local DNS server is recursive (the server must give the client a final result). With the address in hand, the next step, the TCP three-way handshake, can begin.
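To make packets 3 to 7 more concrete, here is a sketch that builds the DNS query message a resolver sends to UDP port 53, following the wire format in RFC 1035 (QTYPE 1 is an A record, 28 an AAAA record), and reads the response code from a reply header (RCODE 3 is what Wireshark displays as "No such name"). It only builds and parses bytes and sends nothing; the query ID is arbitrary.

```python
import struct

QTYPE_A, QTYPE_AAAA = 1, 28          # A = IPv4 address, AAAA = IPv6 address
QCLASS_IN = 1                        # Internet class
RCODE_NAMES = {0: "No error", 2: "Server failure", 3: "No such name"}


def build_query(hostname, qtype, query_id=0x1234):
    """Build a standard DNS query with the RD (recursion desired) flag set,
    i.e. a recursive query like the one the client sends its DNS server."""
    flags = 0x0100  # only RD is set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, QCLASS_IN)


def response_code(message):
    """Read the RCODE (low 4 bits of the flags field) from a DNS message."""
    _query_id, flags = struct.unpack("!2H", message[:4])
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, f"RCODE {rcode}")


aaaa_query = build_query("www.linux178.com", QTYPE_AAAA)  # like packet 3
a_query = build_query("www.linux178.com", QTYPE_A)        # like packet 6
print(len(aaaa_query))                                    # 34
# Header of a reply with flags 0x8183: response, RD, RA, RCODE=3.
print(response_code(bytes.fromhex("12348183") + bytes(8)))  # No such name
```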
II. The TCP three-way handshake

Once it has the IP address of the domain name, the user agent (typically the browser) uses a randomly chosen high-numbered local port (at most 65535) to send a TCP connection request to the server's web program (usually httpd, Nginx, etc.). The request is encapsulated layer by layer as it goes down the four layers of the TCP/IP model, and it travels to the server (passing through various routing devices along the way, unless both ends are on the same LAN). There it enters the network card and then the kernel's TCP/IP protocol stack, which recognizes the connection request and unpacks the packet, peeling off one layer at a time. It may also pass through the netfilter firewall (a kernel module) before finally reaching the web program (Nginx, for example), at which point the TCP connection is established.

The handshake proceeds as follows:

1) The client first sends a connection request. ACK=0 means the acknowledgment number field is not valid; SYN=1 means this is a connection request or connection acceptance segment, and such a segment cannot carry data; seq=x is the client's own initial sequence number (the capture shows seq=0 because Wireshark displays relative sequence numbers; think of it as the client's packet No. 0). The client then enters the SYN_SENT state, waiting for the server's reply.
2) When the server receives the connection request and agrees to establish the connection, it sends an acknowledgment to the client. In this segment's TCP header, SYN and ACK are both 1; ack=x+1 means the first data byte the server expects in the next segment is numbered x+1; seq=y is the server's own initial sequence number. The server enters the SYN_RCVD state.
3) After receiving this acknowledgment, the client sends one more acknowledgment, which may also carry data for the server. ACK=1 means the acknowledgment number ack=y+1 is valid (the client expects the server's packet No. 1), and the client's own sequence number is seq=x+1 (its packet No. 1, following packet No. 0). Once the server receives the client's acknowledgment, the TCP connection enters the ESTABLISHED state and the HTTP request can be sent.

Packet capture:

Packet 9 corresponds to step 1) above.
Packet 10 corresponds to step 2) above.
Packet 11 corresponds to step 3) above.
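The sequence and acknowledgment numbers exchanged in packets 9 to 11 can be traced with a small simulation. It models only the bookkeeping described in steps 1) to 3) (flags, seq/ack values and the SYN_SENT, SYN_RCVD and ESTABLISHED states); the Segment and Endpoint classes are hypothetical, and no real packets are sent. The initial sequence numbers x and y are chosen at random, as real TCP stacks do.

```python
import random
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers wrap around at 2^32


@dataclass
class Segment:
    syn: bool
    ack_flag: bool
    seq: int
    ack: int = 0  # acknowledgment number, meaningful only if ack_flag is set


class Endpoint:
    def __init__(self):
        self.state = "CLOSED"
        self.isn = random.randrange(MOD)  # random initial sequence number
        self.expected = None              # next sequence number expected from the peer

    def connect(self):
        """Step 1 (client): SYN=1, ACK=0, seq=x."""
        self.state = "SYN_SENT"
        return Segment(syn=True, ack_flag=False, seq=self.isn)

    def accept(self, syn):
        """Step 2 (server): SYN=1, ACK=1, seq=y, ack=x+1."""
        assert syn.syn and not syn.ack_flag
        self.expected = (syn.seq + 1) % MOD   # the SYN consumes one sequence number
        self.state = "SYN_RCVD"
        return Segment(syn=True, ack_flag=True, seq=self.isn, ack=self.expected)

    def finish(self, syn_ack):
        """Step 3 (client): ACK=1, seq=x+1, ack=y+1."""
        assert syn_ack.syn and syn_ack.ack_flag
        assert syn_ack.ack == (self.isn + 1) % MOD
        self.expected = (syn_ack.seq + 1) % MOD
        self.state = "ESTABLISHED"
        return Segment(syn=False, ack_flag=True, seq=(self.isn + 1) % MOD, ack=self.expected)

    def complete(self, ack):
        """Server: the final ACK arrives and the connection is established."""
        assert ack.ack_flag and ack.ack == (self.isn + 1) % MOD
        assert ack.seq == self.expected
        self.state = "ESTABLISHED"


client, server = Endpoint(), Endpoint()
syn = client.connect()          # packet 9
syn_ack = server.accept(syn)    # packet 10
ack = client.finish(syn_ack)    # packet 11
server.complete(ack)
print(client.state, server.state)  # ESTABLISHED ESTABLISHED
```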

Why does TCP need a three-way handshake?

An analogy:

Suppose a foreign tourist gets lost in the Forbidden City and meets Xiao Ming. They have the following conversation:

...

Before asking for directions, the foreigner asks Xiao Ming whether he speaks English. Xiao Ming says yes, and only then does the foreigner start asking for directions.

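Communication between two computers relies on protocols (today, mainly TCP/IP); if two computers use different protocols, they cannot communicate. The three-way handshake is a bit like checking that the other side follows the TCP/IP protocol, and only after this negotiation is complete can communication begin. Of course, this interpretation is not entirely accurate: more precisely, the handshake lets each side announce its initial sequence number and confirm that the other side has received it, and it keeps an old, delayed connection request from being mistaken for a new one.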

Why is HTTP built on top of TCP?

Internet communication today is built on the TCP/IP protocol suite, and HTTP, an application-layer protocol in the TCP/IP model, is no exception. TCP is a reliable, connection-oriented, end-to-end protocol, so HTTP, running on top of TCP at the transport layer, does not have to worry about the various problems of data transmission (lost, duplicated or out-of-order packets).

III. Sending the HTTP request once the TCP connection is established

After the TCP three-way handshake, the client (wget, in this capture) sends the HTTP request (packet 12). It uses the HTTP GET method, the requested URL is /, and the protocol is HTTP/1.0.

The detailed contents of packet 12 are as follows:

This is an HTTP request message.

What is the format of HTTP request and response messages?

Start line: for example GET / HTTP/1.0 (the request method, the requested URL, and the protocol version used)
Headers: name/value pairs such as User-Agent and Host
Body

Both request messages and response messages follow this format; in a response, the start line is a status line such as HTTP/1.1 200 OK.
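The three-part structure (start line, headers, body) can be illustrated by building and parsing a raw message by hand. The helper names and header values below are hypothetical; in practice an HTTP library such as Python's http.client does this for you. Lines end in CRLF, and an empty line separates the headers from the body, as the HTTP specification requires.

```python
CRLF = "\r\n"


def build_request(method, target, version, headers, body=""):
    """Start line, then 'Name: value' header lines, an empty line, then the body."""
    lines = [f"{method} {target} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return (CRLF.join(lines) + CRLF + CRLF + body).encode("ascii")


def parse_message(raw):
    """Split a request or response into (start-line fields, headers, body)."""
    head, _, body = raw.decode("ascii").partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    # maxsplit=2 keeps reason phrases such as "Not Found" in one piece
    return start_line.split(" ", 2), headers, body


request = build_request("GET", "/", "HTTP/1.0",
                        {"User-Agent": "Wget", "Host": "www.linux178.com"})
print(parse_message(request))
response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<html></html>"
print(parse_message(response))
```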

What request methods can appear in the start line?

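GET: request a complete resource (common)
HEAD: request only the response headers
POST: submit a form (common)
PUT: (WebDAV) upload
DELETE: (WebDAV) delete
OPTIONS: return the methods supported by the requested resource
TRACE: trace the proxies a request passes through on its way to a resource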

What are URIs, URLs and URNs?

URI: Uniform Resource Identifier
URL: Uniform Resource Locator, in the format scheme://[username:password@]HOST:port/path/to/source
     for example http://www.magedu.com/downloads/nginx-1.5.tar.gz
URN: Uniform Resource Name
URLs and URNs are both kinds of URI. For convenience, URL and URI are treated here as the same thing.
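Python's standard urllib.parse.urlsplit breaks a URL into the components of the format above. The username, password and port in the second example are made up purely to show where those optional parts sit.

```python
from urllib.parse import urlsplit

url = urlsplit("http://www.magedu.com/downloads/nginx-1.5.tar.gz")
print(url.scheme, url.hostname, url.port, url.path)
# http www.magedu.com None /downloads/nginx-1.5.tar.gz   (no explicit port: 80 is implied for http)

# Hypothetical credentials and port, to show the optional parts of the format.
full = urlsplit("http://alice:secret@www.magedu.com:8080/downloads/nginx-1.5.tar.gz")
print(full.username, full.password, full.hostname, full.port)
# alice secret www.magedu.com 8080
```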

Which protocol versions can a request use?

There are the following versions:

HTTP/0.9: stateless
HTTP/1.0: MIME, keep-alive (keeping the connection open), caching
HTTP/1.1: more request methods, finer-grained cache control, persistent connections; the most commonly used

Below are the headers of an HTTP request message sent by Chrome.

Among them:

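Accept: tells the server which MIME types the client accepts
Accept-Encoding: tells the server which compression methods (content encodings) the client accepts
Accept-Language: tells the server which languages it may send
Connection: tells the server that the client supports the keep-alive feature
Cookie: sent with every request so the server can tell whether requests come from the same client
Host: identifies which virtual host on the server is being requested; for example, Nginx can define many virtual hosts, and this header identifies which one to access
User-Agent: the user agent, usually a browser, but it can be something else, such as wget, curl or a search engine spider
Conditional request headers:
If-Modified-Since: the browser asks the server to send the resource again only if it has been modified since the given time. This way, when a resource is updated on the server, the browser fetches it again instead of using the file in its cache.
Security request headers:
Authorization: authentication credentials the client provides to the server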

What is MIME?

MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the e-mail standard so that messages can carry many kinds of content, such as non-ASCII characters and binary attachments. It is defined in RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049 and others. RFC 2822, which evolved from RFC 822, specifies that e-mail messages may not contain characters outside the 7-bit ASCII character set. Because of this, messages in languages other than English, as well as non-text content such as binary files, images and sounds, could not be sent by e-mail. MIME defines a notation for representing a wide variety of data types. The HTTP protocol used on the World Wide Web also adopts the MIME framework, and the standard has been extended into Internet media types.

MIME types take the form major/minor (type/subtype), for example:

image/jpeg
image/gif
text/html
video/quicktime
application/x-httpd-php
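Web servers usually choose the Content-Type from the file extension. Python's standard mimetypes module performs the same kind of lookup and is a convenient way to see the type/subtype form in action; note that it reports image/jpeg, the registered type for .jpg files. The table it uses can be extended by the system's mime.types file, so treat these results as typical rather than guaranteed. The helper name media_type is hypothetical.

```python
import mimetypes


def media_type(filename):
    """Return (major, minor) for a filename, e.g. ('text', 'html'), or None if unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is None:
        return None
    major, minor = guessed.split("/", 1)
    return major, minor


for name in ["index.html", "photo.jpg", "logo.gif", "clip.mov"]:
    print(name, "->", "/".join(media_type(name)))
```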
IV. The server responds to the HTTP request and the browser receives the HTML code

In the capture, packet 12 is the HTTP request and packet 32 is the HTTP response.

After the server's web program receives the HTTP request, it processes it and returns an HTML file to the browser.

Packet 32 is the HTTP response the server returns to the client: 200 OK, with the MIME type text/html, meaning the client's HTTP request was handled successfully. 200 is the status code for success; the other status codes include:

1XX: informational status codes, e.g. 100, 101
2XX: success status codes, e.g. 200 OK
3XX: redirection status codes
    301 Moved Permanently: the resource has permanently moved to the URL given in the Location response header
    302 Found (temporary redirect): the resource is temporarily available at the URL given in the Location response header
    304 Not Modified: for example, when the browser checks a locally cached resource against the server and the server returns 304, the browser does not need to download the resource again and uses its local copy
4XX: client error status codes
    404 Not Found: the requested URL does not exist
5XX: server error status codes
    500 Internal Server Error
    502 Bad Gateway: returned by a proxy server when it cannot get a valid response from the backend server (for example, it cannot reach it)
    504 Gateway Timeout: the proxy can reach the backend server, but the backend does not respond to the proxy within the configured time
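The first digit of a status code determines its class. This sketch classifies a code that way and looks up its standard reason phrase with Python's http.HTTPStatus enum; the class labels mirror the list above, and the describe helper is hypothetical.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}


def describe(code):
    """e.g. 304 -> '304 Not Modified (Redirection)'."""
    status = HTTPStatus(code)           # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 302, 304, 404, 500, 502, 504):
    print(describe(code))
```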

The response headers as seen in Chrome:

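Connection: the keep-alive feature is used
Content-Encoding: the resource is compressed with gzip
Content-Type: the MIME type is text/html and the character set is UTF-8
Date: the date of the response
Server: the web server software used
Transfer-Encoding: chunked: chunked transfer encoding, a data transfer mechanism in HTTP that lets the data a web server sends to a client application (usually a web browser) be split into multiple parts. Chunked transfer encoding is available only in HTTP/1.1.
Vary: see http://blog.csdn.net/tenfyguo/article/details/5939000
X-Pingback: see http://blog.sina.com.cn/s/blog_bb80041c0101fmfz.html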
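Chunked transfer encoding is easiest to see in a decoder: each chunk starts with its size in hexadecimal on its own line, followed by that many bytes and a CRLF, and a chunk of size 0 ends the body. The decoder below is a minimal sketch that assumes well-formed input, skips chunk extensions and ignores trailer fields. In the response above the body is also gzip-compressed (Content-Encoding: gzip), so a client would first remove the chunking and then decompress.

```python
def decode_chunked(data):
    """Decode an HTTP/1.1 chunked body: <hex size>\r\n<bytes>\r\n ... 0\r\n\r\n"""
    body, pos = bytearray(), 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ';extension' parameters; only the hex size matters here.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)          # last chunk; any trailers are ignored
        body += data[pos:pos + size]
        pos += size + 2                 # skip the chunk data and its trailing CRLF


encoded = b"6\r\n<html>\r\nc\r\n<p>hello</p>\r\n7\r\n</html>\r\n0\r\n\r\n"
print(decode_chunked(encoded))  # b'<html><p>hello</p></html>'
```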

So what happens on the server between receiving the HTTP request and producing the HTML file?

Assume the server uses an Nginx + PHP (FastCGI) architecture.

1 Nginx reads its configuration file

What we type into the address bar is http://www.linux178.com (the http:// can be omitted; the browser adds it automatically). Strictly speaking, the complete form is http://www.linux178.com./ with a trailing dot (the dot is the root domain, which we normally neither type nor see). The trailing / can also be omitted, since the browser adds it automatically (see the URL in the 3rd screenshot), so the URL actually requested is http://www.linux178.com/. When Nginx receives the browser's GET / request, it reads the request headers and matches the Host header against the server_name of each of its virtual host configurations. Having found a match, it reads that virtual host's configuration, which contains the following:

root /web/echo;

This tells Nginx that all of the site's web files live in this directory, and that this directory corresponds to / in the URL. For example, when we access http://www.linux178.com/index.html, there must be a file named index.html under /web/echo.
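Matching the Host header against server_name and then applying root can be modeled roughly as a table lookup followed by a path join. This sketch handles only exact names and a leading wildcard such as *.example.com, and falls back to a default root; real Nginx matching has more rules (trailing wildcards, regular expressions, an explicit default_server). Only www.linux178.com and /web/echo come from the text; the other table entries and the helper names are hypothetical.

```python
import posixpath


def pick_virtual_host(host_header, servers, default_root):
    """Return the root of the virtual host whose server_name matches Host."""
    # Drop any :port and the trailing root-domain dot; names are case-insensitive.
    # (IPv6 literals such as [::1]:80 are not handled in this sketch.)
    host = host_header.split(":")[0].rstrip(".").lower()
    if host in servers:                              # 1. exact match
        return servers[host]
    best = None
    for name in servers:                             # 2. longest "*.domain" match
        if name.startswith("*.") and host.endswith(name[1:]):
            if best is None or len(name) > len(best):
                best = name
    return servers[best] if best else default_root   # 3. default server


def uri_to_path(root, uri):
    """`root /web/echo;` plus URI /index.html gives /web/echo/index.html."""
    return posixpath.join(root, uri.lstrip("/"))


servers = {"www.linux178.com": "/web/echo", "*.example.com": "/srv/example"}
root = pick_virtual_host("www.linux178.com", servers, "/srv/default")
print(uri_to_path(root, "/index.html"))   # /web/echo/index.html
```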

index index.html index.htm index.php;

This specifies which file serves as the site's home page. When we request http://www.linux178.com/, Nginx tries index.html, index.htm and index.php in that order and appends the first one that exists to the URL; if none of the three exists, it returns an error (typically 403 Forbidden for a directory request when autoindex is off). Assuming the home page is index.php, the URL becomes /index.php, which is then processed according to the following configuration:

location ~ .*\.php(\/.*)*$ {
    root           /web/echo;
    fastcgi_pass   127.0.0.1:9000;
    fastcgi_index  index.php;
    fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include        fastcgi_params;
}

This configuration means that any request whose URL matches the regular expression (the ~ enables regular-expression matching), that is, ends in .php, optionally followed by extra path segments, is passed to the backend FastCGI process for processing.
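The index lookup and the location match together decide what happens to GET /. The sketch below models both: it tries the index files in order under a temporary directory standing in for /web/echo, then tests the resulting URI against the location pattern with Python's re module. Nginx's ~ modifier is a case-sensitive regular-expression match that is not anchored at the start, which re.search mirrors for this pattern, and Nginx matches locations against the path without the query string. The helper names are hypothetical, and the sketch does not model what Nginx does when no index file exists.

```python
import os
import re
import tempfile

# Pattern from the `location ~ .*\.php(\/.*)*$` block.
PHP_LOCATION = re.compile(r".*\.php(\/.*)*$")


def apply_index(root, uri, index_files):
    """For a directory URI, return uri + the first index file that exists
    under `root`, or None if none does. Other URIs are returned unchanged.
    (No '..' normalisation: a real server must guard against path traversal.)"""
    if not uri.endswith("/"):
        return uri
    directory = os.path.join(root, uri.lstrip("/"))
    for name in index_files:
        if os.path.isfile(os.path.join(directory, name)):
            return uri + name
    return None


def goes_to_fastcgi(uri):
    """True if the URI path (query string excluded) matches the PHP location."""
    return PHP_LOCATION.search(uri) is not None


with tempfile.TemporaryDirectory() as web_root:            # stands in for /web/echo
    open(os.path.join(web_root, "index.php"), "w").close()  # only index.php exists
    uri = apply_index(web_root, "/", ["index.html", "index.htm", "index.php"])
    print(uri, goes_to_fastcgi(uri))                         # /index.php True

for path in ["/index.php/news/2014", "/index.html", "/INDEX.PHP"]:
    print(path, goes_to_fastcgi(path))                       # True, False, False
```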

2 Nginx hands the PHP file to the FastCGI process

Nginx passes the URL /index.php to the backend FastCGI process and waits for it to finish processing (querying the database and filling in templates to generate HTML). FastCGI returns the resulting HTML document (the index.html of the home page) to Nginx, and Nginx returns it to the browser, so the browser receives the home page's HTML code. Nginx also writes an entry to its access log.

Note 1: How does Nginx find the index.php file?

When Nginx needs the file /web/echo/index.php, it makes an I/O system call to the kernel (dealing with hardware is normally the kernel's job, and the kernel exposes this functionality through system calls), telling the kernel that it needs this file. The kernel starts from /, finds the web directory, then finds the echo directory inside web, and finally finds index.php inside echo. It reads index.php from the hard disk into its own memory space and then copies it into the memory space of the Nginx process, so Nginx gets the file it wants. (Strictly speaking, in this FastCGI setup Nginx only checks that the file exists, and it is the PHP process that reads the script's contents; the lookup through the kernel works the same way in either case.)

Note 2: How are files located at the file system level?

Suppose Nginx needs the file /web/echo/index.php.

Each partition (for example one formatted with the ext3 file system, in which a block, the smallest unit of file storage, is 4096 bytes by default) is divided into a metadata area and a data area. Every file has an entry in the metadata area (typically 128 bytes in size), and each entry has a number; this entry is called an inode (index node). It records the file type, permissions, link count, owner and group IDs, timestamps, and the disk blocks the file occupies, i.e. their block numbers (a file can occupy more than one block, the blocks are not necessarily contiguous, and every block is numbered), as shown in the figure:

Another important point: a directory is also just an ordinary file. It also occupies disk blocks; a directory is not a container. As you can see, a newly created directory is 4096 bytes by default, which means it needs only one disk block, but this is not fixed. So to find a directory, the kernel also has to find its entry in the metadata area: only with the corresponding inode can it find the disk blocks the directory occupies.

So what does a directory contain, if not files and other directories?

In fact, a directory holds a table (think of it this way) listing the names of the files and directories it contains together with their inode numbers; let us call it the mapping table. For example:

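Assume:

/           occupies blocks 1 and 2; / is itself a directory and contains 3 directories: web, 111
web         occupies block 5; a directory containing 2 directories: echo, data
echo        occupies block 11; a directory containing 1 file: index.php
index.php   occupies blocks 15 and 16; a regular file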

Their layout in the file system is shown in the figure:

So how does the kernel find index.php?

After the kernel receives Nginx's I/O system call requesting the file /web/echo/index.php:

1) The kernel reads the inode of / in the metadata area, gets from it the numbers of the data blocks / occupies, and finds those blocks (blocks 1 and 2) in the data area. It reads the mapping table in block 1 and finds the inode number that corresponds to the name web.
2) The kernel reads web's inode (No. 3) and learns that web's data is in block 5. It reads the mapping table in block 5, finds that echo's inode is No. 5, and then looks up inode 5 in the metadata area.
3) The kernel reads inode 5 and learns that echo's data is in block 11. It reads the mapping table in block 11 and finds that index.php's inode is No. 9.
4) The kernel reads inode 9 in the metadata area and learns that index.php occupies data blocks 15 and 16. It then reads blocks 15 and 16 from the data area, which gives it the full contents of index.php.
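These four steps can be replayed with a toy model of the metadata area (an inode table) and the data area (numbered blocks), using the inode and block numbers from the example. The inode number of / is not given in the text; 2 is assumed here because ext2/3/4 reserve inode 2 for the root directory. The file contents are made up, and the entries for 111 and data are left out because their numbers are not given.

```python
# Metadata area: inode number -> (file type, list of data block numbers)
INODES = {
    2: ("dir",  [1, 2]),     # /   (inode 2 assumed, as on ext2/3/4)
    3: ("dir",  [5]),        # web
    5: ("dir",  [11]),       # echo
    9: ("file", [15, 16]),   # index.php
}

# Data area: block number -> contents. A directory's blocks hold its
# name -> inode mapping table; a file's blocks hold its bytes.
BLOCKS = {
    1: {"web": 3},
    2: {},
    5: {"echo": 5},
    11: {"index.php": 9},
    15: b"<?php echo 'hello';",   # hypothetical file contents
    16: b" ?>\n",
}

ROOT_INODE = 2


def lookup(path):
    """Walk the path one component at a time, starting from the root inode."""
    inode = ROOT_INODE
    for name in [part for part in path.split("/") if part]:
        kind, blocks = INODES[inode]
        if kind != "dir":
            raise NotADirectoryError(name)
        for number in blocks:                # a directory may span several blocks
            table = BLOCKS[number]
            if name in table:
                inode = table[name]
                break
        else:
            raise FileNotFoundError(path)
    return inode


def read_file(path):
    """Resolve the path, then concatenate the file's data blocks in order."""
    kind, blocks = INODES[lookup(path)]
    if kind != "file":
        raise IsADirectoryError(path)
    return b"".join(BLOCKS[number] for number in blocks)


print(lookup("/web/echo/index.php"))     # 9
print(read_file("/web/echo/index.php"))  # contents of blocks 15 and 16, joined
```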
V. The browser parses the HTML code and requests the resources it references

When the browser receives the index.html file, it starts parsing the HTML code. Whenever it encounters a static resource such as a JS, CSS or image file, it requests it from the server. Downloads happen in parallel over several concurrent connections (the number varies by browser), and thanks to the keep-alive feature, one established HTTP connection can be used to request several resources. Resources are requested in the order in which they appear in the code, but because they differ in size and are downloaded in parallel, the order in which they appear to finish is not necessarily the order in the code.

When the browser needs a static resource that is already in its cache but must be revalidated (for example because it has expired, or because the user reloads the page), it sends the server an HTTP request asking whether the resource has been modified since its last modification time. If the server returns a 304 status code (meaning the resource has not been modified), the browser reads the resource directly from its local cache.
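The revalidation exchange can be sketched as the decision the server makes: compare the If-Modified-Since date sent by the browser with the resource's last-modification time, and answer 304 with an empty body if nothing has changed. The conditional_get function is a hypothetical illustration built on the standard email.utils date helpers (HTTP dates use the same format), and it assumes a well-formed date header. It ignores ETag and If-None-Match, which real servers also check.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(if_modified_since, last_modified, body):
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so compare whole seconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers, b""       # Not Modified: the browser uses its cached copy
    return 200, headers, body


mtime = datetime(2014, 1, 11, 8, 0, 0, tzinfo=timezone.utc)
status, hdrs, _ = conditional_get(None, mtime, b"body { color: red }")
print(status, hdrs["Last-Modified"])                           # 200 Sat, 11 Jan 2014 08:00:00 GMT
print(conditional_get(hdrs["Last-Modified"], mtime, b"")[0])   # 304
```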

For details on how browsers work, see: http://kb.cnblogs.com/page/129756/

VI. The browser renders the page for the user

Finally, the browser uses its internal rendering machinery to render the HTML code together with the static resources it requested, and displays the result to the user.

At this point, the complete HTTP transaction is finished.
