Reposted from:
http://blog.chinaunix.net/uid-9112803-id-3212207.html
Summary:
This article analyzes the entire process from entering a URL in the browser to the complete display of the page. Taking the Baidu home page as an example, it examines each step in detail with a Wireshark packet capture, in order to better understand the TCP/IP protocol stack.
I. Packet capture
1.1 Preparatory work
(1) Clear browser cache
Start by clearing the browser cache, so that the Web page is fetched from the network and not from the cache [1]. In Google Chrome: Options > Under the Hood > Clear browsing data.
(2) Clear the DNS cache
Clear the DNS cache on the client, so that the mapping from the Web server's domain name to its IP address is requested from the network. On Windows XP, run ipconfig /flushdns at the command line.
Figure 1 Clearing the DNS cache
(3) Set filter rules
For ease of analysis, set a filter rule before capturing packets. In the Filter toolbar, enter the filter expression; here, ARP (Address Resolution Protocol) packets are filtered out, as follows:
Figure 2 Setting the filter rule
(4) Close other network applications
To make sure the captured packets relate only to the URL being visited, close other network applications (such as QQ).
1.2 Start the Wireshark capture
Choose Capture > Interfaces. In the window that pops up, select the interface and click Start to begin capturing.
Figure 3 Starting the Wireshark capture
1.3 Enter the URL in the browser
Taking Baidu as an example, type http://www.baidu.com in the browser and press Enter.
1.4 Stop the capture
Figure 4 Example of Wireshark capture
II. Overview
The application-layer protocol of the Web is the Hypertext Transfer Protocol (HTTP). HTTP is implemented in two parts, a client program and a server program, which exchange HTTP messages; the protocol defines the format of these messages and how the client and server exchange them. Web servers store Web objects, each of which is addressed by a URL, and the Web client is typically a browser. The browser sends the server HTTP request messages for the objects contained in a Web page, and the server accepts each request and replies with an HTTP response message containing the object. A Web page is made up of objects, which are simply files (for example form files, Java applets, and sound clips) that can be addressed by a URL. A Web page typically consists of a base HTML file and several referenced objects.
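To make the request/response message format concrete, here is a minimal Python sketch that builds a GET request message and parses the status line of a response. The function names and the choice of headers are illustrative assumptions, not taken from the capture; a real browser sends many more headers.

```python
def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request message (illustrative header set)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # Host header is required in HTTP/1.1
        "Connection: close",     # assumption: ask the server to close afterwards
        "",                      # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Return (version, status_code, reason) from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

For example, build_get_request("www.baidu.com") yields the bytes a client would write to the TCP connection, and parse_status_line can be applied to the first bytes read back.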
From typing http://www.baidu.com/ in the browser and pressing Enter to the browser window displaying the Baidu home page, a number of steps take place and many protocols are involved (see Figure 4). The following sections use the captured packets to analyze what happens during this period.
2.1 Domain name resolution (17~18)
The host name in the URL must first be resolved to an IP address (a URL consists of the host name of the server holding the object and the path name of the object), as follows:
(1) The client side of the DNS application runs on the same user host as the browser
(2) The browser extracts the host name www.baidu.com from the URL above and passes it to the client side of the DNS application
(3) The DNS client sends a query containing the host name to a DNS server (the DNS query message)
(4) The DNS client receives a reply (the DNS response message) containing the IP address for the host name, 119.75.218.70
Steps (3) and (4) correspond to the first two DNS messages in the capture, 17 and 18. Understanding the details of these two steps requires an understanding of the DNS protocol; see the blog post:
"DNS protocol: understanding the TCP/IP protocol stack in depth with Wireshark packet captures"
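As a rough illustration of step (3), the sketch below builds the bytes of a DNS query for an A record in Python. The function name and the transaction ID are hypothetical; the layout (12-byte header followed by one question) follows the standard DNS message format, and nothing is sent on the network.

```python
import struct


def build_dns_query(hostname, txid):
    """Build a DNS query message asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    question = qname + struct.pack("!HH", 1, 1)
    return header + question
```

A client would typically send these bytes in a UDP datagram to port 53 of its configured DNS server and match the reply by the same transaction ID.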
2.2 Establishing a TCP connection (19~24)
HTTP uses TCP as its underlying transport protocol, so a connection must be established first: the browser initiates a TCP connection to the HTTP server located at the resolved IP address. For TCP connection establishment and TCP segment analysis, see:
"TCP protocol (TCP segment format + three-way handshake example): understanding the TCP/IP protocol stack in depth with Wireshark packet captures"
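The sequence and acknowledgment numbers exchanged during the three-way handshake can be sketched as follows. This is a simplified model with hypothetical initial sequence numbers, not a reconstruction of segments 19~24; it only shows that SYN and SYN+ACK each consume one sequence number.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dicts (simplified model)."""
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    syn_ack = {
        "flags": "SYN,ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % mod,  # acknowledges the client's SYN
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % mod,
        "ack": (server_isn + 1) % mod,  # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]
```

Comparing these values with the Seq and Ack columns of the handshake segments in a capture is a quick way to check which segment answers which.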
2.3 Extracting content
2.3.1 Overview
The client sends an HTTP request message to the server, and the server returns an HTTP response message. For the HTTP message format and examples, see:
"HTTP protocol: understanding the TCP/IP protocol stack in depth with Wireshark packet captures"
The server returns the page content requested by the client to the browser. The browser interprets the HTML file (a browser is essentially an HTML interpreter) and displays the text. The entity body returned by the server is shown below (view the page source in the browser, or inspect the entity body of the response message):
Figure 5 HTML of the Baidu home page
2.3.2 Capture Packet Analysis (26~32)
As Figure 4 shows, several TCP segments appear between the HTTP request message and the HTTP response message, as follows:
Figure 6 Request-to-response-related message segment
This looks rather messy, so here is my analysis diagram first, followed by the explanation:
Figure 7 Request-to-response-related message segment
First, the client sends the HTTP request message (25), and the server acknowledges it with segment 26 (HTTP's transport-layer protocol is TCP, which provides reliable delivery). Next, the server returns the HTTP response message. Because the response is too large (3835 bytes) to fit in a single segment, it is carried in 4 TCP segments, as shown below (excerpt from the HTTP response message in the Wireshark capture):
Figure 8 Example of data segmentation
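The splitting of a 3835-byte response into several pieces can be sketched in Python as below. The segment size of 1000 bytes is a hypothetical value chosen for illustration; the text does not give the sizes of the four pieces in the capture, and the function names are assumptions as well.

```python
def segment(payload, mss, first_seq=1):
    """Split payload into (sequence_number, chunk) pieces of at most mss bytes."""
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        segments.append((first_seq + offset, chunk))
    return segments


def reassemble(segments):
    """Rebuild the payload by ordering the pieces by sequence number."""
    ordered = sorted(segments, key=lambda item: item[0])
    return b"".join(chunk for _, chunk in ordered)
```

With a 3835-byte payload and mss=1000, segment returns four pieces (three of 1000 bytes and one of 835), and reassemble restores the original bytes even if the pieces arrive out of order.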
So why is there only one acknowledgment for every two TCP segments? Because TCP uses cumulative acknowledgment to improve efficiency: after receiving several segments, the receiver acknowledges all of them with a single ACK.
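A minimal sketch of cumulative acknowledgment, assuming in-order arrival and a receiver that acknowledges every second segment. The segment boundaries in the usage note are hypothetical, and real TCP stacks also use timers and handle loss, which this model ignores.

```python
def cumulative_acks(segments, ack_every=2):
    """segments: list of (seq, length) in arrival order.

    Returns the acknowledgment numbers the receiver would send, where each
    ACK carries the next byte expected and so covers all earlier segments.
    """
    acks = []
    next_expected = segments[0][0] if segments else 0
    pending = 0  # segments received since the last ACK was sent
    for seq, length in segments:
        if seq != next_expected:
            continue  # out-of-order data is ignored in this simplified model
        next_expected = seq + length
        pending += 1
        if pending == ack_every:
            acks.append(next_expected)
            pending = 0
    if pending:
        acks.append(next_expected)  # flush an ACK for any remaining segment
    return acks
```

For four in-order segments [(1, 1000), (1001, 1000), (2001, 1000), (3001, 835)], the function returns two ACKs, 2001 and 3836: one acknowledgment per pair of segments.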
2.3 The browser displays the text content
At this point the browser has received the base HTML page of the Baidu home page and interprets it (does it also call the JS interpreter to interpret the JavaScript?). The result is as follows:
Figure 9 The browser's interpretation of the base HTML page of the Baidu home page
Obviously this differs from the real home page (note the boxes marked with an X and the Baidu button), because these objects have not yet been retrieved.
2.4 The browser retrieves and displays all objects in the file
The browser (acting as the user agent) continues to request the remaining content from the appropriate servers. The Wireshark capture shows the browser requesting images and JavaScript objects, as follows:
Figure 10 Browser requests other objects in the file
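How a client might discover the referenced objects in a base HTML file can be sketched with Python's standard html.parser module. The class and function names are hypothetical, and only the img, script and link tags are handled; a real browser recognizes many more kinds of references.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ObjectCollector(HTMLParser):
    """Collect absolute URLs of objects referenced by img, script and link tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.urls.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            self.urls.append(urljoin(self.base_url, attributes["href"]))


def referenced_objects(html, base_url):
    """Return the referenced object URLs in document order."""
    collector = ObjectCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Each URL returned would then be requested with its own HTTP request, which is what the additional GET messages in the capture correspond to.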
One thing I don't understand: why is the /favicon.ico object requested? I cannot find any related code in the HTML file, and the page does not display the icon. Any advice would be appreciated, thank you :-)
Once all the objects in the Web page have been retrieved, the browser interprets and displays them, and the final result is as follows:
Figure 11 Baidu home page
The browser interprets objects in three ways: with a built-in interpreter (such as the HTML interpreter), with a plug-in, or with a helper application. The server may generate a complete Web page from various scripts; when it returns a page, it also returns some additional information about the page, including its MIME type. Objects of a built-in type are interpreted directly by the built-in interpreter; for other types, the browser consults its MIME type table and invokes the appropriate viewer to handle the object.
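The dispatch on MIME type described above might look like the following sketch. The set of built-in types, the contents of the MIME table, and the "save to disk" fallback are all hypothetical values chosen for illustration; they do not describe any particular browser.

```python
def choose_handler(content_type, builtin_types, mime_table):
    """Pick a handler for a response based on its Content-Type header value."""
    # Drop parameters such as "; charset=..." and normalize case
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in builtin_types:
        return "built-in interpreter"
    # Assumption: unknown types fall back to saving the object to disk
    return mime_table.get(mime, "save to disk")
```

For example, with builtin_types = {"text/html", "image/gif"} and mime_table = {"application/pdf": "pdf-viewer"}, a response with Content-Type "text/html; charset=gb2312" is handled by the built-in interpreter, while "application/pdf" is passed to the viewer named in the table.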
III. Other issues
3.1 Other DNS messages
The Wireshark capture contains many DNS messages. In fact, only the first two DNS messages (17 and 18) are needed; the others are prefetching. As the figure shows, the browser resolves in advance the host names of the URLs referenced in the Baidu home page HTML (making full use of idle time to reduce waiting time).
Figure DNS Message Instance
With so many messages, how can queries and responses be paired quickly? Either match them by the transaction ID field (as shown in the figure), or expand the Domain Name System entry of a message: its "Request In" or "Response In" line gives the number of the matching message.
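Pairing by transaction ID can be sketched as a small Python function. The input format (frame number, transaction ID, response flag) is an assumption for illustration, for instance values copied by hand from the Wireshark packet list.

```python
def pair_dns_messages(messages):
    """messages: list of (frame_number, transaction_id, is_response) in capture order.

    Returns a list of (query_frame, response_frame) pairs.
    """
    pending = {}  # transaction ID -> frame number of the unanswered query
    pairs = []
    for frame, txid, is_response in messages:
        if not is_response:
            pending[txid] = frame
        elif txid in pending:
            pairs.append((pending.pop(txid), frame))
    return pairs
```

Responses whose transaction ID matches no outstanding query are simply skipped, which is usually what one wants when the capture starts in the middle of an exchange.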
3.2 Other TCP message segments
The other TCP segments belong either to connection establishment or to data transfer, and are easy to analyze with the help of the post "TCP protocol (TCP segment format + three-way handshake example): understanding the TCP/IP protocol stack in depth with Wireshark packet captures".
3.3 SSDP Protocol
As Figure 4 shows, there are many SSDP messages before and after the page is accessed. SSDP (Simple Service Discovery Protocol) defines how network services are discovered on a network. Both control points and UPnP devices rely on SSDP. After a device joins the network, it uses SSDP to announce its presence (the announcement also carries the location of the device's description), so that it can make contact with the corresponding control points as soon as possible. A control point uses SSDP to search for the devices it wants to control. Devices and control points that already know about each other can be excluded, so that service is provided only to parties that are new or have not yet made contact [1].
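For reference, an SSDP search request is an HTTP-style message sent over UDP to the multicast address 239.255.255.250, port 1900. The sketch below only builds the message bytes; the function name and the default values for ST and MX are illustrative assumptions.

```python
SSDP_ADDR = "239.255.255.250"  # SSDP multicast address
SSDP_PORT = 1900


def build_msearch(search_target="ssdp:all", mx=2):
    """Build an SSDP M-SEARCH request as bytes (nothing is sent)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",             # maximum seconds a device may wait before replying
        f"ST: {search_target}",  # search target: which devices or services to find
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Messages of this shape, together with the NOTIFY announcements sent by devices, are what appear as SSDP entries in the capture.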
3.4 NBNS
Wireshark also captured an NBNS message. The NetBIOS Name Server (NBNS) protocol is part of the NetBIOS over TCP/IP (NetBT) protocol family; it provides a way of mapping host names to addresses on networks that are accessed by NetBIOS name.
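NetBIOS names look unusual in NBNS packets because they are encoded before transmission: the name is padded to 16 bytes, and each byte is split into two 4-bit halves, each of which is added to the letter 'A'. A minimal sketch follows, with a hypothetical function name and the assumption that the sixteenth (suffix) byte defaults to 0x20.

```python
def encode_netbios_name(name, suffix=0x20):
    """Encode a NetBIOS name into its 32-character first-level form."""
    # Uppercase, truncate or pad with spaces to 15 bytes, then add the suffix byte
    raw = name.upper().encode("ascii")[:15].ljust(15, b" ") + bytes([suffix])
    # Each byte becomes two letters: high nibble + 'A', low nibble + 'A'
    return "".join(
        chr(ord("A") + (byte >> 4)) + chr(ord("A") + (byte & 0x0F))
        for byte in raw
    )
```

For instance, the name FRED encodes to EGFCEFEE followed by twelve repetitions of CA, the encoding of a space.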