1 B/S Network Architecture Overview
When a user enters the URL www.google.com in the browser, the following actions occur:
1. The browser asks DNS to resolve the domain name to the corresponding IP address;
2. The browser locates the corresponding server on the Internet by that IP address, establishes a socket connection, and sends an HTTP GET request to the server;
3. A load balancer distributes the requests of all users evenly across the servers and hands this request to a specific server;
4. The server fetches the requested data, which may be stored in a distributed cache, in a static file, or in a database;
5. Once the data is returned to the browser, the browser issues additional HTTP requests to the CDN for static resources (such as CSS, JS, or images).
The above process:
2 HTTP Protocol Analysis
Common HTTP request headers
Common HTTP Response Headers
Common HTTP status Codes
2.1 Browser caching mechanism
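The request header can carry two additional items: Pragma: no-cache and Cache-Control: no-cache.
1. Cache-Control/Pragma
This HTTP header field specifies directives that every caching mechanism along the entire request/response chain must obey. By declaring whether the page may be cached, it can control not only the browser but also the caches and proxy servers that handle HTTP.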
Optional values for the Cache-Control/Pragma field:
Cache-Control: for the browser to comply with
Pragma: for the server to comply with
2. Expires: cache expiration time
3. Last-Modified/ETag: last modified time and entity tag
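As a rough illustration of how these header fields interact, the sketch below decides whether a cached copy may be reused without contacting the server. The function names `parse_cache_control` and `can_use_cache` are hypothetical, headers are assumed to be plain dicts with lowercase keys, and the logic is deliberately simplified compared with what real browsers do.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def can_use_cache(request_headers, cached_headers, age_seconds, now=None):
    """Return True if the cached response may be served without asking the server.

    Both header arguments are dicts with lowercase keys (an assumption of this sketch).
    """
    now = now or datetime.now(timezone.utc)
    # Pragma: no-cache or Cache-Control: no-cache in the request forces a fresh fetch.
    if request_headers.get("pragma", "").lower() == "no-cache":
        return False
    if "no-cache" in parse_cache_control(request_headers.get("cache-control", "")):
        return False
    cached_cc = parse_cache_control(cached_headers.get("cache-control", ""))
    if "no-cache" in cached_cc or "no-store" in cached_cc:
        return False
    # max-age takes precedence over Expires when both are present.
    if cached_cc.get("max-age") is not None:
        return age_seconds < int(cached_cc["max-age"])
    if "expires" in cached_headers:
        return now < parsedate_to_datetime(cached_headers["expires"])
    return False
```

When neither max-age nor Expires is available, this sketch simply treats the copy as stale; a real client would fall back to validating with Last-Modified or ETag.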
3 Web Workflow
In a normal browsing session, the system actually works like this:
The browser itself is a client. When you enter a URL, the browser first queries a DNS server to obtain the IP address for the domain name, then uses that IP address to locate the server and establishes a TCP connection with it. The browser then sends an HTTP request packet. The server receives the request packet, processes it by calling its own services, and returns an HTTP response packet. The client receives the response and starts rendering the body of the response, and once it has received the entire content it closes the TCP connection to the server.
How a Web request works can be summed up as:
The browser resolves the domain name to the server's IP address through DNS;
The client establishes a TCP connection to the server over TCP/IP;
The client sends an HTTP request packet to the server, requesting a resource document;
The server sends an HTTP response packet to the client; if the requested resource contains dynamic-language content, the server invokes that language's interpreter engine to process the dynamic content and returns the processed data to the client;
The client and the server disconnect. The client interprets the HTML document and renders the result on the client's screen.
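The steps summarized above can be sketched with Python's standard `socket` module. This is a minimal illustration, not how a browser is implemented: the helper names `build_request`, `parse_response`, and `fetch` are made up for this example, the request always asks the server to close the connection, and features such as chunked encoding, redirects, and HTTPS are ignored.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the TCP connection afterwards
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split raw response bytes into (status_code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def fetch(host, path="/", port=80):
    """Resolve the name, connect over TCP, send the request, read until the server closes."""
    ip = socket.gethostbyname(host)                    # DNS resolution
    with socket.create_connection((ip, port), timeout=5) as conn:  # TCP connection
        conn.sendall(build_request(host, path))        # HTTP request packet
        chunks = []
        while True:                                    # HTTP response packet
            data = conn.recv(4096)
            if not data:
                break                                  # the server closed the connection
            chunks.append(data)
    return parse_response(b"".join(chunks))
```

Calling `fetch("example.com")` needs network access; `build_request` and `parse_response` can be tried offline.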
4 DNS Domain Name Resolution
4.1 DNS domain name resolution process
DNS resolution process
1. Browser cache check (local)
2. OS cache check (local) and hosts file lookup (local)
3. Local domain name server (LDNS) resolution
These dedicated domain name resolution servers perform very well and generally cache resolution results. About 80% of domain name resolutions are completed here, so the LDNS carries most of the resolution work.
4. Root name server resolution (root server)
If the LDNS does not find a matching entry, the operator's DNS server starts an iterative DNS resolution on behalf of our browser. It first looks up the address of the root domain's DNS server and sends the request to it.
5. The root name server returns to the local domain name server the address of the top-level domain name server (gTLD server) responsible for the queried domain. gTLD servers are the international top-level domain name servers for domains such as .com, .cn, and .org; there are only about 13 of them worldwide.
6. The local DNS server then sends the request to the gTLD server returned in the previous step.
Now that the operator's DNS server has the address of the .com domain's server, it sends the request to that address ("What is the IP address of www.google.com?"). The .com server replies to the operator's DNS server: "I don't know the IP address of www.google.com, but I know the DNS address of the google.com domain, so go and ask there."
7. The gTLD server that receives the request finds and returns the address of the name server for this domain name. This is usually the name server where you registered the domain, for example that of your domain name service provider, and the resolution task is then completed by that provider's server.
The operator's DNS server therefore sends the request to the DNS address of the google.com domain (generally provided by the domain name registrar, such as Wanwang or Xinnet): "What is the IP address of www.google.com?" This time the DNS server of the google.com domain checks its records, finds the name there, and sends the result to the operator's DNS server. At this point the operator's DNS server has the IP address of www.google.com.
8. The name server looks up its table of domain name to IP mappings. Normally it finds the target IP record for the domain name and returns it, together with a TTL value, to the local DNS server.
9. The IP address and TTL value for the domain name are returned; the local DNS server caches the domain name to IP mapping, and the cache lifetime is controlled by the TTL value.
10. The resolution result is returned to the user and cached in the local system cache according to the TTL value. The domain name resolution process is complete.
Through the above steps we finally obtain the IP address, and the browser then uses that IP address to exchange information with the server when it issues the request. A real DNS resolution may involve more than these 10 steps: the name servers may have multiple levels, or a GTM may be used for load balancing, and either can affect the resolution process. From the process above, the whole of DNS resolution can be divided into a recursive query phase and an iterative query phase.
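The iterative part of this process, together with TTL-based caching at the local DNS server, can be modeled with a small in-memory simulation. Everything here is hypothetical: the server names (`root`, `gtld-com`, `ns-example`), the `LocalDNS` class, and the address (taken from a documentation IP range) are invented for illustration, and no real DNS traffic is involved.

```python
import time

# Hypothetical in-memory "servers": each maps a name suffix to a referral or an answer.
SERVERS = {
    "root": {"com": ("referral", "gtld-com")},
    "gtld-com": {"example.com": ("referral", "ns-example")},
    "ns-example": {"www.example.com": ("answer", "192.0.2.10", 300)},
}


class LocalDNS:
    """A toy local DNS server (LDNS) that resolves iteratively and caches by TTL."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}    # name -> (ip, expiry time)
        self.queries = []  # servers contacted, kept for illustration

    def resolve(self, name):
        cached = self.cache.get(name)
        if cached and cached[1] > self.clock():
            return cached[0]  # still within its TTL
        server = "root"
        while True:
            self.queries.append(server)
            record = self._lookup(SERVERS[server], name)
            if record is None:
                raise LookupError(f"{name} not found")
            if record[0] == "answer":
                _, ip, ttl = record
                self.cache[name] = (ip, self.clock() + ttl)
                return ip
            server = record[1]  # follow the referral to the next server

    @staticmethod
    def _lookup(table, name):
        # Match the longest suffix of the name that this server knows about.
        labels = name.split(".")
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in table:
                return table[suffix]
        return None
```

A first call to `LocalDNS().resolve("www.example.com")` walks root, then the .com server, then the domain's own name server; a repeated call within the TTL is answered from the cache.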
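Several types of domain name resolution records
A record: A stands for Address, and it specifies the IP address of a domain name.
For example, item.taobao.com can be pointed to 115.238.23.241 and switch.taobao.com to 121.14.24.241. A records can resolve several domain names to the same IP address, while a single A record maps a domain name to exactly one IP address.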
MX record: MX stands for Mail Exchange, and it points the mail service of a domain name to its own mail server.
For example, if the A record of the taobao.com domain is 115.238.25.245 and the MX record is set to 115.238.25.246, then mail addressed to that domain is delivered to the server at 115.238.25.246, while ordinary Web requests still resolve to the IP address in the A record.
CNAME record: the full name is Canonical Name (alias resolution). Alias resolution makes it possible to set one or more aliases for a domain name.
For example, taobao.com can resolve to xulingbo.net and srcfan.com can also resolve to xulingbo.net; here taobao.com and srcfan.com are both aliases of xulingbo.net. The line "www.taobao.com. 1542 IN CNAME www.gslb.taobao.com" from the earlier domain name resolution trace is a CNAME record.
NS record: specifies the DNS server responsible for a domain name, that is, the server at the given address performs the resolution for that domain.
The earlier line "google.com. 172800 IN NS ns4.google.com." is an NS record.
TXT record: sets descriptive text for a host name or domain name.
For example, the TXT record for google.com could be set to a description such as "Google | China".
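To make the record types more concrete, here is a small sketch that stores records in a dictionary and resolves names against it. The zone contents, the `example.com` names, and the helper functions are all invented for illustration; a real resolver works on wire-format DNS messages rather than a Python dict.

```python
# A toy zone table; names and addresses are illustrative (documentation IP range).
RECORDS = {
    ("www.example.com", "CNAME"): ["www.gslb.example.com"],
    ("www.gslb.example.com", "A"): ["192.0.2.20"],
    ("example.com", "A"): ["192.0.2.1"],
    ("example.com", "MX"): ["mail.example.com"],
    ("mail.example.com", "A"): ["192.0.2.25"],
    ("example.com", "NS"): ["ns1.example.com"],
    ("example.com", "TXT"): ["example | description"],
}


def lookup(name, rtype):
    """Return all records of the given type for a name (empty list if none)."""
    return RECORDS.get((name, rtype), [])


def resolve_address(name, max_hops=5):
    """Resolve a name to IP addresses, following CNAME aliases."""
    for _ in range(max_hops):
        addresses = lookup(name, "A")
        if addresses:
            return addresses
        aliases = lookup(name, "CNAME")
        if not aliases:
            return []
        name = aliases[0]  # continue with the canonical name
    raise RuntimeError("CNAME chain too long")


def mail_server_address(domain):
    """Mail goes to the MX target, while Web requests use the domain's own A record."""
    exchangers = lookup(domain, "MX")
    return resolve_address(exchangers[0]) if exchangers else []
```

In this table, Web traffic for `example.com` and mail for the same domain end up at different addresses, which is the point of keeping A and MX records separate.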
5 Initiating the TCP Three-Way Handshake
After obtaining the IP address of the domain name, the User-Agent (generally the browser) uses a random port (1024 < port < 65535) to initiate a TCP connection request to port 80 of the Web program on the server (commonly httpd, nginx, and so on). This connection request (the original HTTP request passes down through each layer of the four-layer TCP/IP model) travels through various routing devices (except within a LAN) to reach the server, enters the network card, and then enters the kernel's TCP/IP stack, which identifies the connection request and unpacks the packet layer by layer. It may also pass through the Netfilter firewall (a kernel module) before it finally reaches the Web program and a TCP/IP connection is established.
The client first sends a connection request segment. ACK = 0 indicates that the acknowledgment number is invalid; SYN = 1 indicates that this is a connection request or connection acceptance segment, which cannot carry data; seq = x is the client's own initial sequence number (seq = 0 would mean this is packet No. 0). The client then enters the SYN_SENT state and waits for the server's reply.
When the server receives the connection request segment and agrees to establish the connection, it sends an acknowledgment to the client. Both SYN and ACK in the TCP header are set to 1. ack = x + 1 means that the server expects the first data byte of the next segment to be numbered x + 1, and that all data up to x has been received correctly (ack = 1 is simply 0 + 1, that is, the server expects packet No. 1 from the client). seq = y is the server's own initial sequence number (seq = 0 would be packet No. 0 sent by the server). The server then enters the SYN_RCVD state, meaning it has received the client's connection request and is waiting for the client's confirmation.
After the client receives this acknowledgment, it must send an acknowledgment in return, which can carry data to be sent to the server. ACK = 1 indicates that the acknowledgment number ack = y + 1 is valid (the client expects packet No. 1 from the server), and the client's own sequence number is seq = x + 1 (this is its packet No. 1, following packet No. 0). Once the server receives the client's acknowledgment, the TCP connection enters the ESTABLISHED state and the HTTP request can be sent.
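The sequence and acknowledgment numbers in the three segments can be traced with a small simulation. This is only a bookkeeping sketch: `Segment`, `Endpoint`, and `three_way_handshake` are hypothetical names, no packets are sent, and the state names follow the ones used in the text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False       # SYN flag
    ack_flag: bool = False  # ACK flag
    seq: int = 0            # sequence number
    ack: int = 0            # acknowledgment number (meaningful only if ack_flag is set)


class Endpoint:
    def __init__(self, initial_seq):
        self.isn = initial_seq  # initial sequence number (x for the client, y for the server)
        self.state = "CLOSED"


def three_way_handshake(client, server):
    """Walk both endpoints through the handshake and return the three segments."""
    server.state = "LISTEN"

    # 1. Client -> server: SYN=1, ACK=0, seq=x
    syn = Segment(syn=True, seq=client.isn)
    client.state = "SYN_SENT"

    # 2. Server -> client: SYN=1, ACK=1, seq=y, ack=x+1
    syn_ack = Segment(syn=True, ack_flag=True, seq=server.isn, ack=syn.seq + 1)
    server.state = "SYN_RCVD"

    # 3. Client -> server: ACK=1, seq=x+1, ack=y+1
    ack = Segment(ack_flag=True, seq=syn.seq + 1, ack=syn_ack.seq + 1)
    client.state = "ESTABLISHED"
    server.state = "ESTABLISHED"
    return [syn, syn_ack, ack]
```

With initial sequence numbers x = 100 and y = 500, the segments carry (seq=100), (seq=500, ack=101), and (seq=101, ack=501), matching the x + 1 and y + 1 rules described above.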
Why does TCP need a three-way handshake?
To prevent the error that would occur if a connection request segment that is no longer valid suddenly arrived at the server.
Why is HTTP implemented on top of TCP?
Currently, transmissions on the Internet are made through TCP/IP, and HTTP, as an application-layer protocol in the TCP/IP model, is no exception. TCP is an end-to-end, reliable, connection-oriented protocol, so HTTP builds on TCP at the transport layer and does not need to worry about the various problems of data transmission.
6 Initiating an HTTP Request After Establishing the TCP Connection
After the TCP three-way handshake, the browser initiates the HTTP request (see packet No. ?). It uses the HTTP GET method, the requested URL is /, and the protocol is HTTP/1.0:
03175340_4j8z.png
The following shows the detailed contents of packet No. 12:
03175429_khop.png
The above message is an HTTP request message. So what is the format of the HTTP request message and the response message?
Start line: for example, GET / HTTP/1.0 (the request method, the requested URL, and the protocol version)
Header information: User-Agent, Host, and other name-value pairs
Body
Both the request message and the response message will follow the above format. So what are the request methods in the start line?
GET: request a complete resource (common)
HEAD: request the response headers only
POST: submit a form (common)
PUT: upload a resource
DELETE: delete a resource
OPTIONS: return the methods supported by the requested resource
TRACE: trace the intermediate proxies that a resource request passes through
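The message format and the method list above can be tied together in a short parsing sketch. The function name `parse_request` and the sample request are made up for illustration; the parser assumes a well-formed message and accepts only the seven methods listed in this section.

```python
METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "TRACE"}


def parse_request(raw):
    """Parse raw request bytes into (method, target, version, headers, body)."""
    # A blank line separates the start line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = start_line.split(" ")
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body
```

For example, parsing `b"GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n"` yields the method `GET`, the target `/`, the version `HTTP/1.0`, one header, and an empty body.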
What are URL, URI, and URN?
URI: Uniform Resource Identifier, for example: scheme://[username:password@]host:port/path/to/source
URL: Uniform Resource Locator, for example: http://www.magedu.com/downloads/nginx-1.5.tar.gz
URN: Uniform Resource Name
Both URLs and URNs are URIs; for convenience, URL and URI are treated here as referring to the same thing.
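The parts of a URL can be inspected with `urllib.parse.urlsplit` from the Python standard library. The wrapper `describe` is a hypothetical helper for this example, and the second URL in the usage note, with user information and a port, uses placeholder values.

```python
from urllib.parse import urlsplit


def describe(url):
    """Break a URL into the components named in the generic URI syntax."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,  # None when no user information is present
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL does not state a port
        "path": parts.path,
    }
```

`describe("http://www.magedu.com/downloads/nginx-1.5.tar.gz")` reports the scheme `http`, the host `www.magedu.com`, and the path `/downloads/nginx-1.5.tar.gz`; `describe("scheme://username:password@host:8080/path/to/source")` additionally fills in the user name, password, and port.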
Which protocol versions can a request use? There are the following:
HTTP/0.9: stateless
HTTP/1.0: MIME, keep-alive (keeping the connection open), caching
HTTP/1.1: more request methods, finer cache control, persistent connections; the most commonly used
The following is the header information for the HTTP request message that was initiated by Chrome:
03181252_cie1.png
Accept tells the server which MIME types the client accepts.
Accept-Encoding tells the server which compression formats the client accepts.
Accept-Language tells the server which languages to send.
Connection: keep-alive tells the server that the client supports the keep-alive feature; the TCP connection remains open after the response is sent, so the browser can continue to send requests over the same TCP connection. Keeping the connection open saves the time needed to establish a new connection for each request and also saves network bandwidth.
Cookie: the browser carries cookies on each request so that the server can recognize requests from the same client.
Host identifies the virtual host on the server being requested. For example, nginx can define many virtual hosts, and this header identifies which one to access.
User-Agent identifies the user agent, generally the browser; other kinds include wget, curl, and search engine spiders.
Conditional request header: with If-Modified-Since, the browser asks the server to resend a resource only if it has been modified since the given time. This way, when the resource on the server has been updated, the browser fetches it again rather than using the file in its cache.
Security request header: Authorization carries the authentication information that the client provides to the server.
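The server side of the If-Modified-Since exchange can be sketched as follows. The function `respond` and its arguments are assumptions made for this example (a real server would take the modification time from the file system), and the status codes used are the standard 200 OK and 304 Not Modified.

```python
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, last_modified_ts, body):
    """Return (status, headers, body) for a resource last modified at a Unix timestamp."""
    headers = {"Last-Modified": formatdate(last_modified_ts, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        since_ts = parsedate_to_datetime(since).timestamp()
        # HTTP dates have one-second resolution, so compare whole seconds.
        if int(last_modified_ts) <= int(since_ts):
            return 304, headers, b""  # not modified: the browser keeps its cached copy
    return 200, headers, body         # modified, or no condition given: send the resource
```

A browser that cached the resource would send back the `Last-Modified` value it received as `If-Modified-Since`, and would get a 304 with an empty body until the resource changes.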
What is MIME?
MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the e-mail standard to support mail messages in a variety of formats, such as non-ASCII characters and binary attachments. The standard is defined in RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049, and others. RFC 2822, which evolved from RFC 822, stipulates that e-mail messages may not use characters outside the 7-bit ASCII character set. Because of this, messages in non-English characters and non-text content such as binary files, images, and sounds could not be transmitted by e-mail.
MIME specifies a symbolic way of representing a wide variety of data types.
In addition, the MIME framework is also used by the HTTP protocol of the World Wide Web, where the standard has been extended into Internet media types.
MIME types follow the format major/minor (main type/subtype)
For example:
image/jpeg
image/gif
text/html
video/quicktime
application/x-httpd-php
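The major/minor structure is easy to work with in code. In the sketch below, `split_mime` and `content_type_for` are hypothetical helpers; the second one relies on the standard library's `mimetypes` table, whose contents can vary slightly between systems, and the fallback type `application/octet-stream` is a choice made for this example.

```python
import mimetypes


def split_mime(mime_type):
    """Split 'major/minor' into its two parts, ignoring any parameters."""
    essence = mime_type.split(";", 1)[0].strip().lower()
    major, _, minor = essence.partition("/")
    if not major or not minor:
        raise ValueError(f"not a MIME type: {mime_type!r}")
    return major, minor


def content_type_for(filename, default="application/octet-stream"):
    """Guess a Content-Type from the file extension using the standard library's table."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

For instance, `split_mime("text/html; charset=utf-8")` separates the main type `text` from the subtype `html`, and `content_type_for` falls back to the default when the extension is unknown.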
Web request Process