HTTP Request and Response: The Whole Process

HTTP Statelessness

The HTTP protocol is stateless. That is, when the same client accesses a page on the same server a second time, the server does not know that the client has visited before, and it cannot distinguish between different clients. Statelessness simplifies server design and makes it easier for servers to handle a large number of concurrent HTTP requests.

HTTP Persistent Connection

HTTP/1.0 uses non-persistent connections. Their main drawback is that the client must establish and maintain a new TCP connection for every object it requests, so each object costs two RTTs (round-trip times): one to set up the TCP connection and one to send the request and receive the response. Because a page may contain many objects, non-persistent connections can make a page download very slowly, and the many short-lived connections add load to the network. HTTP/1.1 uses persistent (keep-alive) connections: after sending a response, the server keeps the TCP connection open for a period of time, so multiple requests and responses can travel over the same connection, and the client can request further objects without opening a new one.

HTTP/1.1 persistent connections can be used in two modes:

Non-pipelined: the client sends the next request only after it has received the response to the previous one;

Pipelined: the client can send a new request before it has received the response to the previous one.
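A rough, textbook-style way to compare the three modes is to count round-trip times (RTTs). The sketch below assumes `n` counts the HTML page plus its embedded objects (n >= 1) and ignores transmission time, TCP slow start, and the parallel connections real browsers open; the function names are illustrative.

```python
def non_persistent_rtts(n: int) -> int:
    """HTTP/1.0 style: a new TCP connection for every object.

    Each object costs 1 RTT for the TCP handshake + 1 RTT for request/response.
    """
    return 2 * n


def persistent_rtts(n: int) -> int:
    """HTTP/1.1 keep-alive, non-pipelined.

    1 RTT for the single handshake; then each request waits for the
    previous response, so every object costs 1 more RTT.
    """
    return 1 + n


def pipelined_rtts(n: int) -> int:
    """HTTP/1.1 keep-alive with pipelining.

    1 RTT for the handshake and 1 RTT for the HTML page; the embedded
    objects are then requested back-to-back and arrive within about 1 more RTT.
    """
    return 2 if n == 1 else 3


if __name__ == "__main__":
    n = 10  # 1 HTML page + 9 embedded images
    print(non_persistent_rtts(n), persistent_rtts(n), pipelined_rtts(n))  # 20 11 3
```

Under these assumptions, a ten-object page drops from 20 RTTs to 11 with keep-alive, and to about 3 with pipelining.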

1. First, enter the URL in the browser:

2. The browser resolves the domain name to an IP address:

The browser looks up the IP address of the domain name being visited. The DNS lookup proceeds as follows:
1) Browser cache: The browser caches DNS records for a period of time. However, the operating system does not tell the browser the time-to-live (TTL) of each record, so each browser keeps records for its own fixed period (anywhere from 2 to 30 minutes).
2) System cache: If the domain name is not in the browser cache, the browser makes a system call (for example, gethostbyname) to look it up in the operating system's cache.
3) Router cache: If the system cache does not have the name either, the query is sent to the router, which usually has its own DNS cache.
4) ISP DNS cache: If the name is still not found, the last place to check is the ISP's caching DNS server, where a matching record is usually found.
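The cache chain above can be modelled as a list of tiers checked in order, where a hit in a later tier also refills the earlier ones. This is a simplified sketch: the tier names and fixed TTLs are illustrative assumptions (real caches usually honor each record's own TTL), `upstream` is a hypothetical stand-in for an actual DNS query, and a fake clock keeps the example deterministic. Real programs simply call the operating system's resolver, for example Python's `socket.getaddrinfo`.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class DnsCache:
    """One cache tier that remembers host -> IP for a fixed number of seconds."""

    def __init__(self, name: str, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, host: str) -> Optional[str]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[host]  # stale: drop it and report a miss
            return None
        return ip

    def put(self, host: str, ip: str) -> None:
        self._entries[host] = (ip, self.clock() + self.ttl)


def resolve(host: str, tiers: List[DnsCache],
            upstream: Callable[[str], str]) -> Tuple[str, str]:
    """Try each tier in order (e.g. browser, OS, router, ISP).

    On a hit, copy the answer into the tiers that missed; if every tier
    misses, ask `upstream`. Returns (ip, name_of_the_source).
    """
    missed: List[DnsCache] = []
    ip: Optional[str] = None
    source = "upstream"
    for tier in tiers:
        ip = tier.get(host)
        if ip is not None:
            source = tier.name
            break
        missed.append(tier)
    if ip is None:
        ip = upstream(host)
    for tier in missed:
        tier.put(host, ip)
    return ip, source


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    clock = lambda: now[0]
    tiers = [DnsCache("browser", 120, clock), DnsCache("os", 300, clock),
             DnsCache("router", 600, clock), DnsCache("isp", 3600, clock)]
    upstream = lambda host: "203.0.113.10"  # documentation-range address

    print(resolve("www.example.com", tiers, upstream))  # ('203.0.113.10', 'upstream')
    print(resolve("www.example.com", tiers, upstream))  # ('203.0.113.10', 'browser')
    now[0] = 200.0  # browser entry (120 s) expired, OS entry (300 s) still valid
    print(resolve("www.example.com", tiers, upstream))  # ('203.0.113.10', 'os')
```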

Domain name resolution principles:

1> The mapping between each host name in a domain and its IP address is managed by that domain's DNS server. For example, "www.it.org", "ftp.it.org", and "blog.it.org" are managed by the DNS server for the domain "it.org", not by the DNS server that manages the domain "org".

2> Each domain must register the names of its subdomains, together with the IP addresses of the subdomains' DNS servers, with the DNS server of its immediate parent domain. For example, only after the DNS server of "org" has registered the subdomain "it.org" and the IP address of its DNS server can "it.org" be recognized by the outside world.

3> To manage the top-level domains uniformly, there is a root domain, written as a single dot (.). For example, "www.it.org" can also be written as "www.it.org.", where the final dot represents the root domain. The Internet's root domain is centrally managed (historically by InterNIC, the Internet Network Information Center; today by IANA), while the top-level domains and the domains below them are managed by the organizations, companies, and individuals that own them.
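The delegation chain described above (root, then "org", then "it.org") can be sketched as a toy iterative resolver: the resolver itself asks the root server, follows each referral to the next server down, and stops when a server returns the address. (In a recursive query, by contrast, the client hands the whole job to one DNS server, which performs this walk on its behalf.) The zone contents, server names, and IP addresses below are hypothetical, and real resolvers also handle caching, timeouts, and many more record types.

```python
from typing import Dict, List, Tuple

# Each hypothetical server knows either final answers ("A" records) or
# referrals ("NS") to the server responsible for a child domain.
ZONES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root-server": {"org.": ("NS", "org-server")},
    "org-server": {"it.org.": ("NS", "it-org-server")},
    "it-org-server": {
        "www.it.org.": ("A", "192.0.2.1"),
        "ftp.it.org.": ("A", "192.0.2.2"),
        "blog.it.org.": ("A", "192.0.2.3"),
    },
}


def query(server: str, name: str) -> Tuple[str, str]:
    """Ask one server about `name`: an answer, a referral, or NXDOMAIN."""
    zone = ZONES[server]
    if name in zone:
        return zone[name]
    # Otherwise look for the most specific delegated child zone containing name.
    for suffix in sorted(zone, key=len, reverse=True):
        rtype, target = zone[suffix]
        if rtype == "NS" and name.endswith("." + suffix):
            return rtype, target
    return "NXDOMAIN", ""


def iterative_resolve(name: str) -> Tuple[str, List[str]]:
    """Start at the root and follow referrals; return (ip, servers asked)."""
    if not name.endswith("."):
        name += "."  # fully qualified form: the trailing dot is the root
    server = "root-server"
    asked = [server]
    while True:
        rtype, value = query(server, name)
        if rtype == "A":
            return value, asked
        if rtype != "NS":
            raise LookupError(f"{name} does not exist")
        server = value
        asked.append(server)


if __name__ == "__main__":
    print(iterative_resolve("www.it.org"))
    # ('192.0.2.1', ['root-server', 'org-server', 'it-org-server'])
```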

There are two main ways to resolve a domain name, namely:

One limitation of DNS is that a domain name seems to correspond to only one IP address, which would make a single server a bottleneck. Fortunately, there are several ways to remove this bottleneck:

1> Round-robin DNS: the DNS lookup returns multiple IP addresses for one name. For example, at the time of writing, facebook.com resolved to four IP addresses.

2> A load balancer is a hardware device that listens on a specific IP address and forwards network requests to the servers in a cluster. Some large sites use these expensive, high-performance load balancers.

3> Geographic DNS improves scalability by mapping a domain name to different IP addresses depending on the user's geographic location. Because the different servers cannot easily keep shared state synchronized, this approach works best for static content.

4> Anycast is a routing technique that maps one IP address to multiple physical hosts. Its drawback is that it does not work well with TCP, so it is rarely used for TCP-based services. Many DNS servers, which mostly use UDP, use anycast to provide efficient, low-latency DNS lookups.
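For round-robin DNS, one common behavior is for the server to return all of a name's addresses but rotate their order on each query; since clients usually try the first address, successive clients are spread across the servers. The class below is a hypothetical toy, not any particular DNS server's implementation, and the addresses come from the 198.51.100.0/24 documentation range.

```python
from typing import Dict, List


class RoundRobinDns:
    """Toy authoritative server that rotates its answer list on every query."""

    def __init__(self, records: Dict[str, List[str]]) -> None:
        self.records = records
        self._offset = {host: 0 for host in records}

    def lookup(self, host: str) -> List[str]:
        ips = self.records[host]
        start = self._offset[host]
        self._offset[host] = (start + 1) % len(ips)
        return ips[start:] + ips[:start]  # same addresses, rotated order


if __name__ == "__main__":
    dns = RoundRobinDns({"www.example.com": ["198.51.100.1", "198.51.100.2",
                                             "198.51.100.3", "198.51.100.4"]})
    # Clients typically connect to the first address in the answer.
    for _ in range(5):
        print(dns.lookup("www.example.com")[0])
    # prints 198.51.100.1, .2, .3, .4, then .1 again (one per line)
```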

3. The browser establishes a TCP connection with the Web server

4. The browser sends an HTTP request to the Web server:

An HTTP request message consists of four parts: a request line, request headers, a blank line, and the request data (body).

1) Request line: consists of three fields, the request method, the URL, and the HTTP protocol version, separated by spaces. For example: GET /index.html HTTP/1.1. HTTP request methods include GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT. The most common are:

1> GET: used when the client wants to read a document from the server, for example when the user clicks a link on a web page or enters a URL in the browser's address bar. GET asks the server to put the resource identified by the URL in the data portion of the response message and send it back to the client. With GET, the request parameters and their values are appended to the URL: a question mark ("?") marks the end of the path and the start of the parameters, and each name=value pair is separated by "&", for example /index.jsp?id=100&op=bind. Because the data is carried in the address itself, GET requests generally have no request data (body) section. The parameters are plainly visible in the address bar, so GET is clearly not a good way to transfer private data. Also, browsers and servers impose different limits on URL length, so GET is unsuitable for transferring large amounts of data. When the parameters are encoded, English letters and digits are sent as is, a space becomes "+", and Chinese and other characters are percent-encoded: each byte of the character's UTF-8 encoding is written as %XX, where XX is the byte's value in hexadecimal. For example, "你好" becomes %E4%BD%A0%E5%A5%BD.

2> POST: allows the client to send more information to the server. POST places the request parameters, as name/value pairs, in the request data (body) of the message rather than in the URL. HTTP itself places no limit on the size of a POST body (servers may impose their own), so POST can carry large amounts of data, and the data does not appear in the URL. For a form submission, the body holds the name=value pairs separated by "&". POST is mostly used for submitting forms. Because POST can also do what GET does, many people use POST for every form, but this is a misconception: GET has its own characteristics and advantages, and you should choose between GET and POST according to the situation.

3> HEAD: just like GET, except that the server returns only the response headers and not the response content. When we only need to check the status of a page, HEAD is very efficient, because the page content is not transmitted.

2) Request headers: consist of name/value pairs, one pair per line, with the name and value separated by a colon (":"). Request headers give the server information about the client and the request. Typical request headers are:

User-Agent: the type of browser (client) that made the request.

Accept: the list of content types the client can handle. The asterisk "*" matches a range of types: "*/*" means all types are acceptable, and "type/*" means all subtypes of type are acceptable.

Host: the host name being requested; it lets multiple domain names share the same IP address as virtual hosts.

Accept-Language: the natural languages the client accepts.

Accept-Encoding: the content encodings (compression formats) the client accepts.

Accept-Charset: the character sets the client accepts in the response.

Connection: the connection mode (close or keep-alive).

Cookie: sends back to the server the cookies the client has stored for that domain.

3) Blank line: the last request header is followed by a blank line (a carriage return and line feed, CRLF), which tells the server that there are no more request headers.

4) Request data: not used with GET, but used with POST. POST suits cases where the user fills out a form. The request headers most commonly associated with the request data are Content-Type and Content-Length.
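The four parts can be assembled by hand to see exactly what goes over the wire. This sketch uses the standard library's `urllib.parse.urlencode`, which applies the encoding rules described above (space to "+", other characters as %XX of their UTF-8 bytes); `build_request` and the host `www.example.com` are illustrative, and the function omits things a real client handles, such as chunked bodies and non-form content types.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

CRLF = "\r\n"


def build_request(method: str, path: str, host: str,
                  params: Optional[Dict[str, str]] = None,
                  extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble request line, headers, blank line, and body as raw bytes.

    GET/HEAD: parameters go into the query string after "?".
    Other methods (e.g. POST): parameters go into the body as an
    application/x-www-form-urlencoded form.
    """
    encoded = urlencode(params or {})  # "a b" -> "a+b", non-ASCII -> %XX per UTF-8 byte
    headers = {"Host": host}
    body = b""
    if method in ("GET", "HEAD"):
        target = f"{path}?{encoded}" if encoded else path
    else:
        target = path
        body = encoded.encode("ascii")
        headers["Content-Type"] = "application/x-www-form-urlencoded"
        headers["Content-Length"] = str(len(body))
    headers.update(extra_headers or {})
    head = f"{method} {target} HTTP/1.1{CRLF}"                      # 1) request line
    head += "".join(f"{k}: {v}{CRLF}" for k, v in headers.items())  # 2) headers
    head += CRLF                                                    # 3) blank line
    return head.encode("ascii") + body                              # 4) request data


if __name__ == "__main__":
    print(build_request("GET", "/index.jsp", "www.example.com", {"id": "100", "op": "bind"}))
    # b'GET /index.jsp?id=100&op=bind HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
    print(build_request("POST", "/search", "www.example.com", {"q": "你好 world"}))
```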

Sample request message:

POST /search HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-silverlight, application/x-shockwave-flash, */*
Referer: http://www.google.cn/
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; TheWorld)
Host: www.google.cn
Connection: keep-alive
Cookie: pref=id=80a06da87be9ae3c:u=f7167333e2c3b714:nw=1:tm=1261551909:lm=1261551917:s=ybycq2wpfefs4v9g; nid=31=ojj8d-iygaetsxlgajmqsjvhcspkvijrb6omjamnrsm8lzhky_ymfo2m4qmrkch1g0iqv9u-2hfbw7bufwvh7pgarub0rnhcju37y-Fxlrugatx63jlv7cwmd6ub_o_r

hl=zh-cn&source=hp&q=domety
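Going the other way, a server splits the raw bytes at the first blank line and then reads the request line and headers. The minimal parser below is a sketch under the assumptions stated in its docstring, applied to a shortened version of the sample above; `parse_request` is a hypothetical name, while `urllib.parse.parse_qs` is the standard-library function for decoding a form body.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs


def parse_request(raw: bytes) -> Tuple[str, str, str, Dict[str, str], bytes]:
    """Split a raw request into (method, target, version, headers, body).

    Minimal sketch: assumes CRLF line endings and no repeated header names,
    and treats everything after the blank line as the body (a real server
    reads exactly Content-Length bytes or decodes chunked encoding).
    """
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line separates head and body
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # the request line
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, target, version, headers, body


if __name__ == "__main__":
    raw = (b"POST /search HTTP/1.1\r\n"
           b"Host: www.google.cn\r\n"
           b"Accept-Language: zh-cn\r\n"
           b"Connection: keep-alive\r\n"
           b"\r\n"
           b"hl=zh-cn&source=hp&q=domety")
    method, target, version, headers, body = parse_request(raw)
    print(method, target, headers["host"])  # POST /search www.google.cn
    print(parse_qs(body.decode("ascii")))   # {'hl': ['zh-cn'], 'source': ['hp'], 'q': ['domety']}
```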

5. The server responds with a permanent redirect:

The server responds to the browser with a 301 permanent redirect, so that the browser visits "http://www.facebook.com/" instead of "http://facebook.com/". Why does the server redirect instead of immediately sending the web content the user wants to see? One reason has to do with search engine rankings. If a page has two addresses, such as http://www.igoro.com/ and http://igoro.com/, search engines may treat them as two different sites, each with fewer inbound links and therefore a lower ranking. Search engines understand what a 301 permanent redirect means, so they credit both the www and the non-www address to the same site's ranking. Having several addresses also makes caching less efficient: when a page has several names, it may be stored in caches several times.
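On the server side, the canonical-name redirect can be as small as a check on the Host header. In this sketch the canonical name and alias (`www.example.com`, `example.com`) are hypothetical, the Host value is assumed to carry no port, and the function only decides on the status and headers; it does not send anything.

```python
from typing import Dict, Optional, Tuple

CANONICAL_HOST = "www.example.com"  # hypothetical: the one name we want indexed
ALIASES = {"example.com"}           # hypothetical: names that should redirect to it


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (301, headers) if the request arrived under an alias name.

    `host` is the request's Host header value, `path` the request target.
    Returns None when the page should be served normally.
    """
    if host.lower() in ALIASES:
        return 301, {
            "Location": f"http://{CANONICAL_HOST}{path}",  # keep the same path
            "Content-Length": "0",                         # a redirect needs no body
        }
    return None


if __name__ == "__main__":
    print(canonical_redirect("example.com", "/about?x=1"))
    # (301, {'Location': 'http://www.example.com/about?x=1', 'Content-Length': '0'})
```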

An HTTP response message consists of four parts: a status line, response headers, a blank line, and the response data (body).

1) Status line: consists of the HTTP protocol version, the status code returned by the server, and a text description (reason phrase) of the status code.

The status code has three digits. The first digit defines the class of the response and has five possible values:

1xx: Informational. The server has received the request, and the client can continue sending it.

100 Continue

101 Switching Protocols

2xx: Success. The server has successfully received and processed the request.

200 OK: the client's request succeeded.

204 No Content: the request succeeded, but the response has no body.

206 Partial Content: a range request was completed successfully.

3xx: Redirection. The client must be redirected to complete the request.

301 Moved Permanently: permanent redirect; the Location header of the response gives the resource's new URL.

302 Found: temporary redirect; the Location header of the response gives the URL where the resource can temporarily be found.

303 See Other: the requested resource is available at another URI, and the client should use GET to retrieve it.

304 Not Modified: when the client sends a conditional request (one containing a header such as If-Modified-Since), the server may return 304 if the resource has not changed; the response contains no message body.

307 Temporary Redirect: a temporary redirect with the same meaning as 302 Found. The standard forbids changing a POST into a GET when following a 302, but browsers do not always comply; 307 states explicitly that the method must not change, and more browsers respect this, although the behavior still depends on the browser implementation.

4xx: Client error. The client's request contains an error.

400 Bad Request: the request has a syntax error and the server cannot understand it.

401 Unauthorized: the request requires authorization; this status code must be sent together with a WWW-Authenticate header.

403 Forbidden: the server received the request but refuses to fulfill it; the reason is usually given in the response body.

404 Not Found: the requested resource does not exist, for example because the wrong URL was entered.

5xx: Server error. An unexpected error occurred and the server could not process the client's request properly.

500 Internal Server Error: the server encountered an unexpected error that prevented it from completing the request.

503 Service Unavailable: the server currently cannot handle the request, but may return to normal after some time.

2) Response headers: consist of name/value pairs, one pair per line, with the name and value separated by a colon (":"). Typical response headers are:

Location: redirects the recipient to a new location. For example, if the requested page no longer exists at its original location, the server can return a Location header pointing to the page's new location, so that the client requests the resource from the new address.

Server: information about the software (and its version) the server used to handle the request. It is the counterpart of the User-Agent request header: Server describes the server-side software, while User-Agent describes the client software (browser) and operating system.

Vary: lists the request headers that a cache must take into account when deciding whether a stored response can be reused for a later request.

Connection: the connection mode.

In a request: close tells the web server or proxy to close the connection after responding to this request, with no further requests on it; keep-alive tells the web server or proxy to keep the connection open after this response and wait for further requests on it.

In a response: close means the connection is closed; keep-alive means the connection is kept open, waiting for further requests. Keep-Alive: if the browser asks to keep the connection open, this header indicates how long (in seconds) the web server should keep it open, for example Keep-Alive: 300.

WWW-Authenticate: must be included in a 401 (Unauthorized) response. It works together with the Authorization request header: when the client receives a 401 response, it decides whether to ask the server to authenticate it, and if so, it sends a new request containing an Authorization header.

3) Blank line: the last response header is followed by a blank line (CRLF), which tells the browser that there are no more response headers.

4) Response data: the content the server returns to the client, such as the HTML text of a page.
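The status-line rules above map directly to code: the class is the first digit, and the reason phrase can be looked up from the code. This sketch uses Python's real `http.HTTPStatus` enum for the phrases; `status_class` and `build_response` are illustrative helpers, and `build_response` always computes Content-Length itself, so callers should not pass that header.

```python
from http import HTTPStatus
from typing import Dict

STATUS_CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
                  4: "Client Error", 5: "Server Error"}


def status_class(code: int) -> str:
    """The first digit of the status code selects one of five classes."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def build_response(code: int, headers: Dict[str, str], body: bytes = b"") -> bytes:
    """Status line + headers + blank line + body, with CRLF line endings."""
    reason = HTTPStatus(code).phrase  # e.g. 301 -> "Moved Permanently"
    lines = [f"HTTP/1.1 {code} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the extra CRLF is the blank line
    return head.encode("iso-8859-1") + body


if __name__ == "__main__":
    print(status_class(404))  # Client Error
    print(build_response(301, {"Location": "http://www.facebook.com/"}).decode())
```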

Sample response message:

HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, Jan 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; HttpOnly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 05:09:51 GMT
Content-Length: 0

6. The browser follows the redirect:

The browser now knows that "http://www.facebook.com/" is the correct address, so it sends another HTTP request.
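Following redirects is a loop: fetch, and if the status is a redirect, resolve the Location header against the current URL and fetch again, with a hop limit so that a redirect loop cannot run forever. In this sketch `fetch` is a hypothetical callback (a dictionary stands in for the network), only GET is modelled, and the method-changing rules that distinguish 302/303 from 307/308 are ignored.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import urljoin

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch: Callable[[str], Response],
                     max_hops: int = 10) -> Tuple[Response, List[str]]:
    """Fetch `url`, following Location headers until a non-redirect arrives.

    Returns the final response and the list of URLs visited. Raises
    RuntimeError after `max_hops` redirects so a redirect loop cannot hang us.
    """
    visited = [url]
    while True:
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES:
            return (status, headers, body), visited
        if len(visited) > max_hops:
            raise RuntimeError(f"too many redirects (more than {max_hops})")
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["Location"])
        visited.append(url)


if __name__ == "__main__":
    fake_web = {
        "http://facebook.com/": (301, {"Location": "http://www.facebook.com/"}, ""),
        "http://www.facebook.com/": (200, {"Content-Type": "text/html"}, "<html></html>"),
    }
    response, hops = follow_redirects("http://facebook.com/", fake_web.__getitem__)
    print(response[0], hops)  # 200 ['http://facebook.com/', 'http://www.facebook.com/']
```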

7. The server processes the request:

The server receives the GET request, processes it, and returns a response. This looks like a simple task, but a lot happens in between, even for a simple site like the author's blog, let alone a heavily visited site like Facebook. Web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should process it. A request handler is a program (written with ASP.NET, PHP, Ruby, and so on) that reads the request and generates the HTML response.
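In Python, the role of the "request handler" is commonly played by a WSGI application (PEP 3333): the web server calls it with the parsed request, and it returns the response body. The sketch below is a hypothetical minimal handler, not a framework; it reads the User-Agent header and generates HTML, and `call_app` invokes it directly with `wsgiref.util.setup_testing_defaults` so that no network is needed.

```python
from html import escape
from typing import Callable, Dict, Iterable, List, Tuple
from wsgiref.util import setup_testing_defaults


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """A minimal request handler: read the request, generate an HTML response."""
    path = environ.get("PATH_INFO", "/")
    if path != "/":
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not Found"]
    # Read something from the request (here: the User-Agent header) ...
    agent = escape(environ.get("HTTP_USER_AGENT", "unknown"))
    # ... and generate HTML from it.
    body = f"<html><body><p>Your browser: {agent}</p></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(path: str, user_agent: str = "Mozilla/5.0") -> Tuple[str, bytes]:
    """Invoke `app` directly, without a network server, and capture its output."""
    environ: dict = {"PATH_INFO": path, "HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured: Dict[str, object] = {}

    def start_response(status: str, headers: List[Tuple[str, str]]) -> None:
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return str(captured["status"]), body


if __name__ == "__main__":
    print(call_app("/"))
    # ('200 OK', b'<html><body><p>Your browser: Mozilla/5.0</p></body></html>')
```

To serve it over HTTP for local experiments, the standard library's `wsgiref.simple_server.make_server(host, port, app)` can wrap it.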

8. The server sends back an HTML response

9. Release the TCP connection

If the connection mode is close, the server actively closes the TCP connection and the client closes it passively, releasing the connection. If the connection mode is keep-alive, the connection is kept open for a period of time, during which it can continue to receive requests.
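The close/keep-alive decision can be sketched as a small function. It follows the defaults of HTTP/1.1 (persistent unless `close` is sent) and HTTP/1.0 (closed unless `keep-alive` is sent); the function name is illustrative, and real servers also close idle connections after a timeout, which this sketch does not model.

```python
def keep_connection_open(http_version: str, connection: str = "") -> bool:
    """Decide whether the TCP connection stays open after this exchange.

    `http_version` is e.g. "HTTP/1.1"; `connection` is the raw Connection
    header value, which may hold several comma-separated tokens.
    """
    tokens = {t.strip().lower() for t in connection.split(",") if t.strip()}
    if "close" in tokens:
        return False                   # the peer asked to close
    if http_version == "HTTP/1.1":
        return True                    # HTTP/1.1: persistent by default
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens  # HTTP/1.0: must opt in explicitly
    return False                       # unknown version: be conservative


if __name__ == "__main__":
    print(keep_connection_open("HTTP/1.1"))                # True
    print(keep_connection_open("HTTP/1.0", "Keep-Alive"))  # True
    print(keep_connection_open("HTTP/1.1", "close"))       # False
```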

10. The browser parses the HTML content

The browser parses the HTML text in the server's response and displays it.

11. The browser fetches the objects embedded in the HTML

As the browser renders the HTML, it notices tags that reference content at other addresses. It then sends GET requests to retrieve these files. Each of these addresses goes through a process similar to fetching the HTML itself: the browser looks up the domain name in DNS, sends the request, follows redirects, and so on.
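Finding the embedded objects amounts to scanning the parsed HTML for tags that reference other resources. This sketch uses the standard library's `html.parser.HTMLParser` and covers only a simplified subset of tags (listed in the docstring); the page and host names are hypothetical. Each distinct host it finds would go through its own DNS lookup and connection, as described above.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect URLs of objects a browser would fetch to render the page.

    Simplified subset: <img src>, <script src>, <iframe src>, and
    <link rel="stylesheet" href>. Ordinary links (<a href>) are not fetched.
    """

    SOURCE_ATTR = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower().split():
            return  # e.g. rel="canonical" is not an embedded object
        wanted = self.SOURCE_ATTR.get(tag)
        value = attr_map.get(wanted) if wanted else None
        if value:
            # Relative URLs are resolved against the page's own address.
            self.urls.append(urljoin(self.base_url, value))


PAGE = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://static.example.net/app.js"></script>
</head><body>
<img src="logo.png"> <a href="/about">About</a>
</body></html>"""

if __name__ == "__main__":
    collector = ResourceCollector("http://www.example.com/index.html")
    collector.feed(PAGE)
    collector.close()
    print(collector.urls)
    # ['http://www.example.com/css/site.css', 'https://static.example.net/app.js', 'http://www.example.com/logo.png']
    print(sorted({urlparse(u).hostname for u in collector.urls}))
    # ['static.example.net', 'www.example.com']
```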

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.