Original address: http://www.cnblogs.com/engeng/articles/5959335.html
This question has come up in a lot of interviews recently. I think the interviewer wants to test your networking knowledge and see whether it has come together into a coherent picture.
When we type www.linux178.com into the browser's address bar and press Enter, what happens between that moment and the page appearing?
The process described below reflects only my personal understanding:
Domain name resolution --> TCP three-way handshake --> HTTP request sent once the TCP connection is established --> the server responds to the HTTP request and the browser gets the HTML code --> the browser parses the HTML code and requests the resources it references (JS, CSS, images, and so on) --> the browser renders the page and presents it to the user
For the HTTP protocol, refer to the following:
Talk about HTTP protocol http://kb.cnblogs.com/page/140611/
HTTP protocol Overview http://www.cnblogs.com/vamei/archive/2013/05/11/3069788.html
Learn about the various aspects of HTTP headers http://kb.cnblogs.com/page/55442/
Here's an analysis of the process above, and we'll take the Chrome browser as an example:
1. Domain name resolution
First, Chrome resolves the IP address for the domain name www.linux178.com (strictly speaking, this is a hostname). How is it resolved to the corresponding IP address?
① Chrome first searches the browser's own DNS cache (entries are kept only briefly, about 1 minute, and the cache holds at most 1000 entries) for an entry for www.linux178.com. If an entry exists and has not expired, resolution ends here.
Note: How do we view Chrome's own cache? Open chrome://net-internals/#dns.
② If the browser's own cache has no matching entry, Chrome searches the operating system's DNS cache. If an entry is found and has not expired, the search stops and resolution ends here.
Note: How do we view the operating system's DNS cache? On Windows, for example, run ipconfig /displaydns at the command line.
③ If nothing is found in the Windows DNS cache either, the Hosts file (located in C:\Windows\System32\drivers\etc) is read to see whether it contains an IP address for the domain name. If it does, resolution succeeds.
④ If the Hosts file has no matching entry, the browser makes a DNS system call, and a resolution request is sent to the locally configured preferred DNS server (usually provided by the telecom operator, or a public one such as Google's). The request goes over UDP to port 53 of the DNS server, and it is a recursive request: the operator's DNS server must come back to us with the IP address of the domain name. The operator's DNS server first checks its own cache; if it finds an entry that has not expired, resolution succeeds. If not, the operator's DNS server performs an iterative resolution on behalf of our browser. It starts from the root DNS servers (the IP addresses of the 13 root servers are built into the DNS server) and asks one of them: what is the IP address of www.linux178.com? The root server sees that the name belongs to the top-level domain com and replies: I don't know the IP address of this name, but I know the IP addresses of the com servers, go and ask them. The operator's DNS server then asks a com server the same question. The com server replies: I don't know the IP address of www.linux178.com, but I know the address of the DNS server for the linux178.com domain, go and ask it. So the operator's DNS server sends the question to the DNS server for linux178.com (usually provided by the domain registrar, such as Wanwang (HiChina) or Xinnet).
This time the DNS server for linux178.com checks its records and finds the name, so it sends the result to the operator's DNS server. The operator's DNS server now has the IP address for www.linux178.com and returns it to the Windows kernel, the kernel returns it to the browser, and the browser finally has the IP address for www.linux178.com and can move on to the next step.
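The iterative part of step ④ can be sketched as a small simulation. This is only a toy model: the server labels, the delegation table, and the address 192.0.2.10 are made up for illustration, and a real resolver speaks the DNS wire protocol rather than reading dictionaries.

```python
# Toy delegation data: each "server" either knows the answer or refers us onward.
# All server labels and the IP address are hypothetical.
ROOT = "root"
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"linux178.com.": "linux178-ns"}},
    "linux178-ns": {"answers": {"www.linux178.com.": "192.0.2.10"}},
}


def iterative_resolve(name, cache):
    """Resolve `name` the way the operator's DNS server does for its client.

    Returns (address, list of servers asked).
    """
    if name in cache:                       # the resolver checks its own cache first
        return cache[name], []
    server, asked = ROOT, []
    while True:
        asked.append(server)
        zone = SERVERS[server]
        if name in zone.get("answers", {}):
            cache[name] = zone["answers"][name]
            return cache[name], asked
        # Follow the referral whose zone is a suffix of the queried name.
        next_servers = [s for z, s in zone.get("referrals", {}).items()
                        if name.endswith(z)]
        if not next_servers:
            raise LookupError(f"no such name: {name}")
        server = next_servers[0]
```

Calling `iterative_resolve("www.linux178.com.", {})` walks root, then com, then the linux178.com server; a second call with the same cache dictionary is answered without asking anyone.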
Note: the following steps are normally not performed.
If none of the 4 steps above resolves the name, the following steps are performed (on Windows):
⑤ The operating system looks in the NetBIOS name cache, which lives on the client computer. What is in this cache? The computer names and IP addresses of the machines this computer has successfully communicated with recently. When does this step succeed? When this computer communicated successfully with that name only a few minutes ago.
⑥ If step ⑤ fails, the WINS server (a server that maps NetBIOS names to IP addresses) is queried.
⑦ If step ⑥ fails, the client searches by broadcast.
⑧ If step ⑦ fails, the client reads the LMHOSTS file (in the same directory as the Hosts file).
If step ⑧ also fails, resolution is declared to have failed and the client cannot communicate with the target computer. As long as any one of these eight steps succeeds, the client can communicate with the target computer.
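The "first step that succeeds wins" rule of the eight steps can be sketched like this. It is a minimal illustration: the source labels and their contents are hypothetical, and real caches are of course not plain dictionaries.

```python
def resolve_with_fallbacks(name, sources):
    """Try each (label, lookup) pair in order; the first non-None answer wins."""
    for label, lookup in sources:
        address = lookup(name)
        if address is not None:
            return label, address
    raise LookupError(f"could not resolve {name}")


# Hypothetical contents for the first three sources described above.
browser_cache = {}
os_cache = {}
hosts_file = {"www.linux178.com": "192.0.2.10"}
sources = [
    ("browser cache", browser_cache.get),
    ("OS cache", os_cache.get),
    ("Hosts file", hosts_file.get),
]
```

With these tables, `resolve_with_fallbacks("www.linux178.com", sources)` is answered by the Hosts file, because both caches are empty; later sources are never consulted once one answers.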
See the packet capture:
The test was run on a Linux virtual machine, using the command wget www.linux178.com to make the request. Requesting the page directly from Chrome produces too many unrelated requests, which is why wget is used instead. Note, however, that wget only fetches index.html and does not request the static resources (JS, CSS, and so on) that index.html references.
Packet capture analysis:
Packet ① is a broadcast from the virtual machine asking for the MAC address of 192.168.100.254 (the gateway), because communication within a LAN relies on MAC addresses. The virtual machine has to talk to the gateway because our DNS server has an external IP address, and traffic leaving the LAN must go out through the gateway.
Packet ② is the gateway's reply to that broadcast, telling the virtual machine its own MAC address, so the client now knows its route out.
Packet ③ is the domain name resolution request that wget sends to the system's configured DNS server (more precisely, wget makes a DNS resolution system call). The requested name is www.linux178.com and an IPv6 address is expected (AAAA is the IPv6 address record).
Packet ④ is the DNS server's response to the system. IPv6 is evidently still little used, so there is no AAAA record.
Packet ⑤ is again a request for an IPv6 address, this time for the hostname www.linux178.com.leo.com, which does not exist, so the result is "No such name".
Packet ⑥ requests the IPv4 address of the domain name (the A record).
Packet ⑦: the DNS server, whether from its cache or through iterative queries, finally has the IP address of the domain name and sends it to the system; the system hands it to wget, so wget now has the IP address of www.linux178.com. This also shows that the query between the client and the local DNS server is recursive (the server must give the client a result). The next step, the TCP three-way handshake, can now begin.
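To make the AAAA and A queries in packets ③ and ⑥ more concrete, here is a rough sketch of how such a query could be encoded by hand, following the message layout in RFC 1035. The function name and the query ID are made up; wget itself goes through the system resolver rather than building packets like this.

```python
import struct

QTYPE_A, QTYPE_AAAA, QCLASS_IN = 1, 28, 1


def build_dns_query(hostname, qtype, query_id=0x1234):
    """Encode a single-question DNS query (header + question section)."""
    flags = 0x0100                        # standard query, RD (recursion desired) bit set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)
```

The RD bit is what makes the request to the local DNS server recursive. The resulting bytes could be sent with a UDP socket to port 53 of the configured DNS server; that part is left out so the sketch does not depend on a network.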
2. Initiating the TCP three-way handshake
Once it has the IP address for the domain name, the user agent (usually the browser) opens a TCP connection from a random port (port < 65535) to the web server program on the server (usually httpd, Nginx, and so on). The connection request (the original HTTP request is wrapped layer by layer as it passes down the four-layer TCP/IP model) travels to the server (through various routing devices along the way, unless both ends are on the same LAN), enters the network card, and then the kernel's TCP/IP stack (which identifies the connection request and unwraps the packet layer by layer). It may also pass through the Netfilter firewall (a kernel module) before finally reaching the web program (Nginx in this article), at which point the TCP/IP connection is established.
The handshake proceeds as follows:
1) The client first sends a connection probe. The flag ACK = 0 means the acknowledgment number is not valid; SYN = 1 means this is a connection request or connection accept segment, and also that the segment cannot carry data; seq = x is the client's own initial sequence number (seq = 0 in the capture means this is the client's packet No. 0). The client then enters the SYN_SENT state and waits for the server's response.
2) When the listening server receives the connection request and agrees to establish the connection, it sends an acknowledgment to the client. Both SYN and ACK in the TCP header are set to 1. The acknowledgment number ack = x + 1 means the server expects the first data byte of the next segment to be numbered x + 1, that is, everything up to x has been received correctly (ack = 1 in the capture is 0 + 1, meaning the server expects the client's packet No. 1). seq = y is the server's own initial sequence number (seq = 0 in the capture means this is the server's packet No. 0). The server then enters the SYN_RCVD state: it has received the client's connection request and is waiting for the client's confirmation.
3) When the client receives the acknowledgment, it must send an acknowledgment of its own, which may carry data for the server. ACK = 1 means the acknowledgment number ack = y + 1 is valid (the client expects the server's packet No. 1), and the client's own sequence number is seq = x + 1 (this is the client's packet No. 1, counting from No. 0). Once the server receives this acknowledgment, the TCP connection enters the ESTABLISHED state and the HTTP request can be sent.
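The sequence and acknowledgment numbers in steps 1) to 3) can be written out as a short sketch. The dictionary layout and function name are illustrative only; a real TCP stack does this in the kernel.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(x, y):
    """Return the three segments for client initial seq x and server initial seq y."""
    syn = {"sender": "client", "SYN": 1, "ACK": 0, "seq": x, "ack": None}
    syn_ack = {"sender": "server", "SYN": 1, "ACK": 1,
               "seq": y, "ack": (x + 1) % SEQ_MOD}
    ack = {"sender": "client", "SYN": 0, "ACK": 1,
           "seq": (x + 1) % SEQ_MOD, "ack": (y + 1) % SEQ_MOD}
    return [syn, syn_ack, ack]
```

With x = 0 and y = 0, as in the relative numbers shown in the capture, the second segment carries ack = 1 and the third carries seq = 1 and ack = 1.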
See the packet capture:
Packet ⑨ corresponds to step 1) above.
Packet ⑩ corresponds to step 2) above.
The packet after that corresponds to step 3) above.
Why does TCP need a three-way handshake?
Here is an example:
Suppose a foreigner is lost in the Forbidden City, sees Xiao Ming, and the following dialogue takes place:
Foreigner: Excuse me, can you speak English?
Xiao Ming: Yes.
Foreigner: OK, I want ...
Before asking for directions, the foreigner first asks Xiao Ming whether he speaks English. Xiao Ming answers yes, and only then does the foreigner start asking the way.
Two computers communicate by means of a protocol (TCP/IP being the popular one today). If they use different protocols they cannot communicate, so the three-way handshake is like checking whether the other side follows TCP/IP; once that negotiation is complete, communication can proceed. This way of looking at it is of course not entirely accurate.
Why is HTTP implemented on top of TCP?
Virtually all traffic on the Internet today goes over TCP/IP, and HTTP, as an application-layer protocol of the TCP/IP model, is no exception. TCP is a reliable, connection-oriented, end-to-end protocol, so by building on TCP at the transport layer HTTP does not have to worry about the various problems of data transmission.
3. Sending an HTTP request after the TCP connection is established
After the TCP three-way handshake, the browser sends an HTTP request (see the packet capture). It uses the HTTP GET method, the requested URL is /, and the protocol is HTTP/1.0.
The details of packet 12 are as follows:
The message above is an HTTP request message.
So what is the format of HTTP request and response messages?
Start line: for example GET / HTTP/1.0 (request method, requested URL, protocol version)
Headers: name/value pairs such as User-Agent and Host
Body
Both request messages and response messages follow this format.
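The three-part message format above can be sketched as follows. The helper names are made up, and the parser is deliberately simplistic (it assumes well-formed "Name: value" header lines and ASCII headers).

```python
def build_request(method, path, version, headers, body=b""):
    """Serialize start line, headers, a blank line, and the body into one message."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


def parse_message(raw):
    """Split a raw HTTP message back into (start_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers, body


raw = build_request("GET", "/", "HTTP/1.0",
                    {"Host": "www.linux178.com", "User-Agent": "Wget"})
```

The blank line (an empty line ending in CRLF) is what separates the headers from the body; a GET request like this one simply has an empty body.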
So which request methods can appear in the start line?
GET: request a resource in full (common)
HEAD: request only the response headers
POST: submit a form (common)
PUT: (WebDAV) upload a file (HTML forms in browsers do not support this method)
DELETE: (WebDAV) delete a resource
OPTIONS: return the methods that the requested resource supports
TRACE: trace the intermediate proxies that a request for a resource passes through (browsers cannot issue this method)
What are URL, URI, and URN?
URI: Uniform Resource Identifier
URL: Uniform Resource Locator
The format is: scheme://[username:password@]host:port/path/to/source
For example: http://www.magedu.com/downloads/nginx-1.5.tar.gz
URN: Uniform Resource Name
URLs and URNs are both kinds of URI.
For convenience, URL and URI are treated as the same thing for now.
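As a quick illustration of the URL format, Python's standard `urllib.parse.urlsplit` breaks a URL into the same parts. The wrapper function is just for this example, and the credentials and port in the second URL of the usage note are invented.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Return the parts of a URL named in the format description above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }
```

For http://www.magedu.com/downloads/nginx-1.5.tar.gz the username, password, and port come back as None because the optional parts are absent; a URL such as http://user:secret@www.magedu.com:8080/downloads/nginx-1.5.tar.gz fills them all in.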
Which protocol versions can a request use?
There are the following:
HTTP/0.9: stateless
HTTP/1.0: MIME, keep-alive (keeps the connection open), caching
HTTP/1.1: more request methods, finer cache control, persistent connections; the most commonly used version
Here are the headers of the HTTP request message sent by Chrome
Where:
Accept tells the server which MIME types I accept
Accept-Encoding indicates which compression formats I accept
Accept-Language tells the server which languages to send
Connection tells the server that the keep-alive feature is supported
Cookie carries the cookies on every request so the server can recognize the same client
Host identifies the virtual host on the server being requested; Nginx, for example, can define many virtual hosts, and this header says which one to access.
User-Agent is the user agent, usually a browser, though there are other kinds such as wget, curl, and search engine spiders
Conditional request headers:
If-Modified-Since: the browser asks the server whether a resource file has been modified since a given time and, if so, to send it again. This way, when the file on the server is updated, the browser requests it again instead of using its cached copy.
Security request headers:
Authorization: the authentication information the client provides to the server
What is MIME?
MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the e-mail standard to support messages in many formats, such as non-ASCII characters and binary attachments. It is defined in RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049, and others. The e-mail standard in RFC 2822, which evolved from RFC 822, does not allow characters outside the 7-bit ASCII character set in mail messages. As a result, non-English text and non-text content such as binary files, images, and sound could not be sent by e-mail. MIME defines a notation for representing a wide variety of data types. The MIME framework is also used by HTTP on the World Wide Web, where the standard has been extended into Internet media types.
MIME types follow the format major/minor (main type/subtype), for example:
image/jpeg
image/gif
text/html
video/quicktime
application/x-httpd-php
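A small sketch of the major/minor split, together with the standard library's `mimetypes.guess_type`, which maps a file name to a MIME type. The `split_mime` helper is made up for this example.

```python
import mimetypes


def split_mime(mime_type):
    """Split 'major/minor' into its main type and subtype."""
    major, _, minor = mime_type.partition("/")
    return major, minor


# guess_type returns a (type, encoding) pair; only the type is used here.
html_type = mimetypes.guess_type("index.html")[0]
```

Here `split_mime(html_type)` yields the main type `text` and the subtype `html`, which is the value a server would put in the Content-Type header for that file.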
4. The server responds to the HTTP request and the browser gets the HTML code
As the capture shows, packet 12 is the HTTP request and packet 32 is the HTTP response.
After the web program on the server receives the HTTP request, it processes it and then returns an HTML file to the browser.
Packet 32 is the HTTP response the server returns to the client (200 OK, with a response MIME type of text/html), which means the HTTP request from the client was answered successfully. 200 is the status code for a successful response; other status codes include:
1xx: informational status codes
100, 101
2xx: success status codes
200: OK
3xx: redirection status codes
301: Moved Permanently; a permanent redirect, the Location response header carries the new URL
302: Found; a temporary redirect, the Location response header carries the new URL
304: Not Modified; for example, when the locally cached resource file is compared with the one on the server and found to be unchanged, the server returns 304 to tell the browser it need not download the resource again and can use the local copy directly.
4xx: client error status codes
404: Not Found; the requested URL does not exist
5xx: server error status codes
500: Internal Server Error; an error inside the server
502: Bad Gateway; returned when the proxy server in front cannot reach the backend server
504: Gateway Timeout; the proxy server can reach the backend server, but the backend does not respond within the allotted time
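The grouping of status codes by their first digit can be sketched with the standard `http.HTTPStatus` enum. The class labels and the function name are this example's own wording.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return 'code phrase (class)' for a standard HTTP status code."""
    return f"{code} {HTTPStatus(code).phrase} ({STATUS_CLASSES[code // 100]})"
```

For instance, `describe_status(304)` gives "304 Not Modified (redirection)" and `describe_status(502)` gives "502 Bad Gateway (server error)".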
Response headers as seen in Chrome:
Connection: the keep-alive feature is in use
Content-Encoding: the resource is compressed with gzip
Content-Type: the MIME type is HTML and the character set is UTF-8
Date: the date and time of the response
Server: the web server software in use
Transfer-Encoding: chunked. Chunked transfer encoding is a data transfer mechanism in HTTP that lets the data sent from a web server to a client application (usually a web browser) be split into multiple parts. It is only available in HTTP/1.1.
Vary: see http://blog.csdn.NET/tenfyguo/article/details/5939000
X-Pingback: see http://blog.sina.com.cn/s/blog_bb80041c0101fmfz.html
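To show what the chunked transfer encoding listed above looks like on the wire, here is a minimal decoder sketch. It assumes a complete, well-formed body and ignores trailers; the function name is made up.

```python
def decode_chunked(data):
    """Reassemble a body that was sent with Transfer-Encoding: chunked."""
    body, pos = b"", 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The chunk size is a hexadecimal number; anything after ';' is an extension.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:                      # a zero-size chunk marks the end of the body
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2             # skip the CRLF that follows the chunk data


decoded = decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
```

Each chunk announces its own length, so the server can start sending before it knows the total size of the response.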
So what happens between the server receiving the HTTP request and the HTML file being generated?
Suppose the server provides its service with an Nginx + PHP (FastCGI) architecture.
① Nginx reads its configuration file
What we type into the browser's address bar is http://www.linux178.com (the http:// can be left out; the browser adds it automatically). Strictly, the complete form is http://www.linux178.com./ with a dot at the end (the dot is the root domain; normally we neither type it nor see it). The trailing / can also be left out because the browser adds it for us (see the URL in the 3rd image), so the URL actually requested is http://www.linux178.com/. When Nginx receives the browser's GET / request, it reads the headers of the HTTP request and matches Host against the server_name of every virtual host in its configuration. If one matches, it reads that virtual host's configuration and finds the following setting:
From this we know that all the site's files live in this directory, and that this directory is the / we see when we visit http://www.linux178.com/. For example, visiting http://www.linux178.com/index.html means there is a file called index.html under /web/echo.
index index.html index.htm index.php;
This tells us which file is the site's home page. When we request http://www.linux178.com/, Nginx automatically appends index.html for us (if the home page were index.php, it would of course still try index.html first; when a file is not found it moves on to the next one, and if none of the 3 files is found a 404 error is returned). Assume here that the URL after the file name is appended is /index.php; the request is then handled according to the following configuration:
location ~ .*\.php(\/.*)*$ {
    root /web/echo;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}
This configuration says that any request URL matching the pattern (regular expression matching is enabled here), that is, ending in .php (optionally followed by path arguments), is handed to the backend FastCGI process.
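The effect of that location pattern can be tried out with Python's `re` module. This is only an approximation: Nginx uses PCRE and applies the pattern to the request path without the query string, and the `handler_for` function and its return labels are invented for the example.

```python
import re

# Same pattern as in the location block above.
PHP_LOCATION = re.compile(r".*\.php(\/.*)*$")


def handler_for(uri):
    """Say which side would handle a request path under this single rule."""
    return "fastcgi" if PHP_LOCATION.search(uri) else "static"
```

Under this rule `/index.php` and `/index.php/foo/bar` go to FastCGI, while `/index.html` and `/style.css` do not match and are served as ordinary files.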
② Hand the PHP file to the FastCGI process
So Nginx hands the URL /index.php to the backend FastCGI process and waits for it to finish (querying the database, filling in the template, and generating the HTML). FastCGI returns an index.html document to Nginx, and Nginx returns this index.html to the browser, so the browser now has the HTML code of the home page. Nginx also writes an entry to its access log.
Note 1: How does Nginx find the index.php file?
When Nginx finds that it needs the file /web/echo/index.php, it makes an I/O system call to the kernel (reading the file means dealing with hardware, which is normally the kernel's job, and the kernel offers these functions through system calls), telling the kernel it needs this file. The kernel starts at /, finds the web directory, then finds the echo directory inside web, and finally finds index.php inside echo. It reads index.php from disk into the kernel's own memory space and then copies the file into the memory space of the Nginx process, so Nginx has the file it wanted.
Note 2: How is the file found at the file system level?
Again, suppose Nginx needs the file /web/echo/index.php.
Each partition (for example one with an ext3 file system, where the block is the smallest unit of file storage and is 4096 bytes by default) has a metadata area and a data area. Every file has an entry in the metadata area (usually 128 bytes), and every entry has a number; we call it the inode (index node). The inode holds the file type, permissions, link count, owner and group IDs, timestamps, and which disk blocks the file occupies, that is, its block numbers (a file can occupy more than one block, the blocks need not be contiguous, and every block is numbered), as shown in the figure:
Another important point: a directory is really just an ordinary file too, and it also occupies disk blocks; a directory is not a container. By default a newly created directory is 4096 bytes, which means it needs only one disk block, but this is not fixed. So finding a directory also means finding its entry in the metadata area; only with the inode can the disk blocks occupied by the directory be found.
So what is stored in a directory, if not files and other directories?
In fact, a directory holds a table (one way to think of it) that lists the names of the directories and files in it together with their inode numbers (call it the mapping table for now), as shown in the figure:
Assume:
/ occupies blocks 1 and 2 in the data area; / is a directory containing 3 directories, among them web and 111
web occupies block 5; it is a directory containing 2 directories: echo and data
echo occupies block 11; it is a directory containing 1 file: index.php
index.php occupies blocks 15 and 16; it is a file
The figure shows how they are laid out in the file system.
So how does the kernel find the file index.php?
The kernel receives Nginx's I/O system call requesting the file /web/echo/index.php, and then:
① The kernel reads the inode of / in the metadata area and learns from it which blocks hold the data of / (blocks 1 and 2). It finds those blocks in the data area and reads the mapping table on block 1 to find the inode number for the name web.
② The kernel reads the inode for web (number 3) and learns that web's block in the data area is block 5. It reads the mapping table from block 5, learns that the inode for echo is number 5, and looks up inode 5 in the metadata area.
③ The kernel reads inode 5 and learns that echo's block in the data area is block 11. It reads the mapping table from block 11 and learns that the inode for index.php is number 9.
④ The kernel reads inode 9 in the metadata area and learns that index.php's data is in blocks 15 and 16. It finds blocks 15 and 16 in the data area and reads them to get the complete contents of index.php.
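Steps ① to ④ can be replayed with two small tables that stand in for the metadata area and the data area. The inode and block numbers are the ones used in the example above; the inode number of / (2 here) and the file contents are assumptions made for the sketch, and a real file system is far more involved.

```python
# Metadata area: inode number -> (kind, block numbers).
INODES = {
    2: ("dir", [1, 2]),       # /   (inode number 2 is an assumption)
    3: ("dir", [5]),          # web
    5: ("dir", [11]),         # echo
    9: ("file", [15, 16]),    # index.php
}
# Data area: directory blocks hold a name -> inode mapping table, file blocks hold bytes.
BLOCKS = {
    1: {"web": 3},
    5: {"echo": 5},
    11: {"index.php": 9},
    15: b"<?php echo 'hello'; ",   # made-up file contents
    16: b"?>",
}
ROOT_INODE = 2


def read_file(path):
    """Walk `path` component by component; return (inodes visited, file bytes)."""
    inode, visited = ROOT_INODE, [ROOT_INODE]
    for name in path.strip("/").split("/"):
        kind, blocks = INODES[inode]
        if kind != "dir":
            raise NotADirectoryError(name)
        table = {}
        for block in blocks:                  # read the directory's mapping table
            table.update(BLOCKS.get(block, {}))
        inode = table[name]                   # KeyError means no such file or directory
        visited.append(inode)
    _, blocks = INODES[inode]
    return visited, b"".join(BLOCKS[block] for block in blocks)
```

Calling `read_file("/web/echo/index.php")` visits the inodes of /, web, echo, and index.php in turn and then concatenates blocks 15 and 16, which mirrors the four kernel steps.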
5. The browser parses the HTML code and requests the resources in it
Once the browser has the index.html file it starts parsing the HTML code. When it encounters static resources such as JS, CSS, and images, it requests them from the server (downloading in multiple threads; the number of threads differs from browser to browser). The keep-alive feature comes into play here: multiple resources can be requested over one established HTTP connection. Resources are requested in the order they appear in the code, but because their sizes differ and the browser requests them in parallel, the order shown in the capture is not necessarily the order in the code.
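How a client might discover the static resources while parsing can be sketched with the standard `html.parser` module. A real browser's parser does much more; the class name, the set of tags considered, and the sample HTML are choices made for this example.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the URLs of sub-resources in the order they appear in the HTML."""

    # Which attribute carries the resource URL for each tag of interest.
    URL_ATTRS = {"script": "src", "img": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


collector = ResourceCollector()
collector.feed(
    '<html><head><link rel="stylesheet" href="/a.css">'
    '<script src="/b.js"></script></head>'
    '<body><img src="/c.png"></body></html>'
)
```

After `feed`, `collector.resources` lists the URLs in document order; each of these would then become a separate HTTP request, possibly over the same keep-alive connection.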
When the browser requests a static resource it already has cached (the cached copy has not expired), it sends the server an HTTP request asking whether the resource has been modified since the last modification time it knows of. If the server returns a 304 status code (telling the browser the resource has not been modified on the server), the browser reads the resource directly from its local cache.
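The server side of this check can be sketched as a comparison of two HTTP dates. The function name and the example dates are invented, and real servers also consider other validators such as ETag.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return the status code for a GET that may carry If-Modified-Since.

    Both arguments are HTTP date strings; if_modified_since may be None.
    """
    if if_modified_since is not None:
        asked = parsedate_to_datetime(if_modified_since)
        changed = parsedate_to_datetime(last_modified)
        if changed <= asked:
            return 304                 # unchanged: the browser may use its cached copy
    return 200                         # send the full resource


status = conditional_status("Tue, 14 Jan 2014 08:00:00 GMT",
                            "Tue, 14 Jan 2014 08:00:00 GMT")
```

If the file on the server carries a later modification time than the one the browser sends, the function falls through to 200 and the resource is downloaded again.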
For details on how browsers work, see: http://kb.cnblogs.com/page/129756/
6. The browser renders the page and presents it to the user
Finally, the browser uses its internal machinery to render the requested static resources together with the HTML code, and presents the result to the user.
At this point a complete HTTP transaction is finished.
This article is from "Renacos blog" blog, reproduced from http://linux5588.blog.51cto.com/65280/1351007
What is the process of a complete HTTP transaction?