HTTP protocol understanding and application summary

Last Update:2014-12-13 Source: Internet

Author: User



Request & Response

Request format
<request-line> For example: GET /api/index.json HTTP/1.1
<headers> For example: Host: www.example.com
<blank line>
[<request-body>] For example: id=1&timestamp=xxxxxx

Response format
<status-line> For example: HTTP/1.1 200 OK
<headers> For example: Content-Type: application/json
<blank line>
[<response-body>] For example: {"id": 1, "username": "TestUser"}
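
The two layouts can be sketched in Python as follows. This is a minimal illustration rather than a complete HTTP implementation; the function names build_request and parse_response, the Host value, and the Content-Type values are assumptions made for the example.

```python
CRLF = "\r\n"

def build_request(method, path, headers, body=""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    if body:
        # Tell the server how many bytes of body follow the blank line.
        headers = {**headers, "Content-Length": str(len(body.encode("utf-8")))}
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body

def parse_response(raw):
    """Split a raw response into (status code, reason phrase, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

request = build_request(
    "POST", "/api/index.json",
    {"Host": "www.example.com",
     "Content-Type": "application/x-www-form-urlencoded"},
    "id=1&timestamp=xxxxxx",
)
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"id": 1, "username": "TestUser"}'
)
code, reason, headers, body = parse_response(raw_response)
```

In practice a library such as http.client does this work; the sketch only makes the line-by-line structure visible.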

Status Code

HTTP defines roughly 60 status codes. Here I mainly record the ones that signal common abnormal situations; most applications run into them sooner or later, and knowing them helps us understand and locate problems.
206 - Partial Content. Used for resumable (breakpoint) downloads: the client requests part of the content and the server successfully returns just that part.
301 - Moved Permanently. The original address is no longer valid and the URL now points to another address. This mainly matters to search engines, because it affects how crawlers index the site.
302 - Found (temporary redirect). The server returns a new URL to the client, and the client then requests that URL to obtain the content.
304 - Not Modified. The resource has not changed, so the client can use its locally cached copy. Common when serving static content.
413 - Request Entity Too Large. A common scenario is uploading a large file that exceeds the limit of the server (such as Nginx). Oversized request headers or bodies can likewise exceed the limits of a backend server such as Tomcat (for example, too many cookies under the current domain pushing the request header over its limit), although depending on the server that case may be reported as 400 or 431 instead.
416 - Requested Range Not Satisfiable. Related to resumable downloads: the range requested by the client lies beyond the size of the file on the server.
500 - Internal Server Error. The server cannot return a normal result. The most common example is an application throwing an unhandled null pointer exception.
502 - Bad Gateway. A common scenario is that the backend server behind the reverse proxy (such as Resin or Tomcat) is not running.
503 - Service Unavailable. For example, the server load is too high or the server has stopped serving.
504 - Gateway Timeout. For example, the backend takes longer to respond than the timeout configured on the gateway or reverse proxy.
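
As a rough sketch of how a client might react to these codes, the snippet below looks up the standard reason phrase with Python's http.HTTPStatus and attaches a suggested action. The function name suggest_action and the action texts are illustrative assumptions, not part of the HTTP standard.

```python
from http import HTTPStatus

def suggest_action(code):
    """Return (reason phrase, suggested client action) for a status code."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if code == 304:
        action = "use the locally cached copy"
    elif code in (301, 302):
        action = "request the URL given in the Location header"
    elif code in (502, 503, 504):
        action = "retry later; the backend may be down, overloaded, or slow"
    elif 400 <= code < 500:
        action = "fix the request before sending it again"
    elif code >= 500:
        action = "report a server-side error"
    else:
        action = "process the response normally"
    return phrase, action

for status_code in (206, 302, 304, 502):
    print(status_code, *suggest_action(status_code))
```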

Headers

HTTP headers fall into two categories: request headers and response headers. Here are some of the headers we use most often.

1. Cache control

Caching is almost ubiquitous in Internet sites. In an HTTP-based service we can also cache infrequently changing content on the client, so that the cached copy is reused across multiple visits, which speeds up access and improves the user experience. The HTTP protocol specifies several message headers for cache control:

Cache-Control (HTTP/1.1) / Pragma (HTTP/1.0): Indicates whether the response may be cached and for how long. The private directive, which some server frameworks send by default, restricts caching to the user's private cache (such as the browser), so shared caches such as proxies must not store the response.

For example: Cache-Control: max-age=86400, must-revalidate tells the client that the resource may be cached for one day (max-age is in seconds and is relative to the time of the response) and must be revalidated with the server once it expires.

Expires: Specifies an absolute time until which the client (unless forced to refresh) can read its cache directly without sending a request to the server.

Attention:

Priority: Cache-Control > Expires (when both are present, max-age takes precedence);
Detailed parameter description: http://condor.depaul.edu/dmumaugh/readings/handouts/SE435/HTTP/node24.html
Browsers may implement these rules differently for different user actions (refresh, back, pressing Enter in the address bar, and so on).
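
The priority rule can be sketched as a freshness check. This is a simplified illustration, assuming the cache records when it stored the response; the names parse_cache_control and is_fresh and the example dates are hypothetical, and a real cache handles many more directives.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def parse_cache_control(value):
    """Turn 'max-age=86400, must-revalidate' into a dict of directives."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives

def is_fresh(headers, stored_at, now):
    """True if a response stored at `stored_at` may be served from cache at `now`."""
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in directives or "no-cache" in directives:
        return False
    if directives.get("max-age") is not None:
        # max-age is relative: seconds counted from when the response was stored.
        return now - stored_at < timedelta(seconds=int(directives["max-age"]))
    if "Expires" in headers:
        # Expires is an absolute time, consulted only when max-age is absent.
        return now < parsedate_to_datetime(headers["Expires"])
    return False

stored = datetime(2014, 12, 13, 10, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=86400, must-revalidate",
        "Expires": "Sat, 13 Dec 2014 11:00:00 GMT"}
# Twelve hours later Expires has passed, but max-age (one day) still applies.
print(is_fresh(both, stored, stored + timedelta(hours=12)))
```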

Last-Modified / If-Modified-Since: Last-Modified is the timestamp of the resource's last modification, returned to the client by the server. On the next request (for example on a refresh), the client sends that timestamp back in the If-Modified-Since header so the server can check whether the resource has changed. If it has not, the server returns a 304 status code and the client uses its locally cached copy. In this case only the request and a small header-only response cross the network; the resource body is not transferred again. Note: the timestamp must be in Greenwich Mean Time (GMT), for example: Last-Modified: Sat, ... Oct ... 09:20:15 GMT
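
A server-side sketch of this exchange, using only the Python standard library. The function name conditional_get and the example date are assumptions made for illustration; a real server would take the modification time from the file system or a database.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(request_headers, last_modified, body):
    """Return (status, headers, body) for a resource changed at `last_modified` (UTC)."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    response_headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return 304, response_headers, b""  # client copy is still current
        except (TypeError, ValueError):
            pass  # unusable date: fall through and send the full response
    return 200, response_headers, body

modified = datetime(2014, 12, 13, 9, 20, 15, tzinfo=timezone.utc)
status, headers, _ = conditional_get({}, modified, b"logo bytes")
# The client echoes the Last-Modified value back on its next request.
revalidated = conditional_get(
    {"If-Modified-Since": headers["Last-Modified"]}, modified, b"logo bytes")
print(status, revalidated[0])
```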

ETag / If-None-Match: The ETag is a resource identifier that the server generates with an algorithm, typically from the file's attributes, and it is likewise used to determine whether the resource requested by the client has changed. If the server returns an ETag to the client, the client sends it back in the If-None-Match header on the next request; if the resource has not changed, the server returns a 304 status code. (The effect is essentially the same as Last-Modified.)

Attention:

The ETag has to be computed, which is a burden on servers whose computing resources are tight, so some websites do not use ETags at all;
If the server is behind a load balancer, requests for the same resource may be distributed to different backend machines. Because the ETag calculation relies on file attributes, the same file may produce different ETags on different machines, so the ETag check can fail for a file that has not actually changed. There are two solutions: make the ETag calculation independent of the local machine, for example by using the MD5 of the file content directly; or have the load balancer route requests for the same URL to the same backend machine.
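
The first solution can be sketched as follows: the ETag is derived from the file content alone, so every backend machine produces the same value for the same bytes. The function names content_etag and respond are hypothetical, and hashing the full content on every request is itself a cost that a real server would cache.

```python
import hashlib

def content_etag(content):
    """Build an ETag from the content only, independent of the local machine."""
    return '"' + hashlib.md5(content).hexdigest() + '"'

def respond(request_headers, content):
    """Return (status, headers, body), answering 304 when If-None-Match matches."""
    etag = content_etag(content)
    candidates = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):  # weak validators compare equal here
            tag = tag[2:]
        candidates.append(tag)
    if etag in candidates or "*" in candidates:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content

first_status, first_headers, _ = respond({}, b"logo bytes")
second_status, _, _ = respond({"If-None-Match": first_headers["ETag"]}, b"logo bytes")
print(first_status, second_status)
```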

In real business scenarios HTTP caching is extremely useful, for example:

Making use of client-side resources. Static files that clients access frequently, such as logos and ad images, can be served straight from the client's local cache. This reduces network requests, speeds up rendering on the client, and relieves request pressure on the server.

Some of our static content, such as news and blog posts, is fetched by search engine crawlers. By controlling the cache parameters we can reduce how often crawlers re-fetch it and avoid wasting resources.

If our static resources are served through a CDN, setting the HTTP cache headers lets the CDN node keep a copy of each file, which reduces the number of fetches back to the origin, network latency, and load on the origin server.

2. Range Requests (Breakpoint Resume)

Accept-Ranges: A response header the server returns when it supports resumable (breakpoint) downloads. Once the client sees it, it knows it can send range requests.

Content-Length: The length of the response body, which tells the client how much data the current request returns. Note that a request sent with the HEAD method returns no body, yet Content-Length still reports the size of the full data.

Range / Content-Range: The client sends the Range request header to tell the server which part of the data it wants. For example, Range: bytes=0-1023 requests bytes 0 through 1023. The server then returns those 1024 bytes with a Content-Range response header, that is, Content-Range: bytes 0-1023/4096, where 4096 is the total size of the file. The client's next request can start at byte 1024: Range: bytes=1024-xxxx
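
The range exchange can be sketched on the server side as below. This is a minimal illustration assuming a single bytes=start-end range; the function name serve_range is hypothetical, and suffix ranges and multi-part ranges are deliberately ignored.

```python
import re

def serve_range(range_header, data):
    """Return (status, headers, body) for an optional single 'bytes=start-end' range."""
    total = len(data)

    def full_response():
        return 200, {"Accept-Ranges": "bytes", "Content-Length": str(total)}, data

    if range_header is None:
        return full_response()
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match is None:
        return full_response()  # unsupported form: ignore the header
    start = int(match.group(1))
    if match.group(2) and int(match.group(2)) < start:
        return full_response()  # invalid range: ignore the header
    if start >= total:
        # The requested range lies beyond the end of the file.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    end = min(int(match.group(2)), total - 1) if match.group(2) else total - 1
    chunk = data[start:end + 1]
    headers = {"Content-Range": f"bytes {start}-{end}/{total}",
               "Content-Length": str(len(chunk))}
    return 206, headers, chunk

data = bytes(4096)
status, headers, chunk = serve_range("bytes=0-1023", data)
print(status, headers["Content-Range"], len(chunk))
```

A client resuming a download would issue the next request with Range: bytes=1024- and append the returned chunk to what it already has.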

3. Encoding

Accept-Encoding / Content-Encoding: The former is a request header listing the content encodings the client can accept. The default is identity (no encoding); other values include gzip, compress, and so on. The latter is the response header naming the encoding the server applied to the body, most commonly a compression format. The benefit of compression is obvious: it greatly reduces the amount of data sent over the network, and that saving usually outweighs the CPU cost of compressing on the server. Common forms: Accept-Encoding: gzip, deflate in the request and Content-Encoding: gzip in the response. We usually compress text responses such as HTML, JS, CSS, XML, and JSON.
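
A minimal sketch of this negotiation with Python's gzip module. The function names accepts_gzip and encode_body are assumptions made for the example; only gzip is handled, and q-values are interpreted just far enough to respect an explicit q=0.

```python
import gzip

def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip without refusing it via q=0."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() != "gzip":
            continue
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False

def encode_body(accept_encoding, body):
    """Return (response headers, payload), compressing when the client allows it."""
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(body)
        return {"Content-Encoding": "gzip",
                "Vary": "Accept-Encoding",
                "Content-Length": str(len(payload))}, payload
    return {"Content-Length": str(len(body))}, body

body = b'{"id": 1, "username": "TestUser"}' * 100
headers, payload = encode_body("gzip, deflate", body)
print(len(body), len(payload), headers["Content-Encoding"])
```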

4. Other

X-Forwarded-For: request header. Used to identify the user's real IP address, especially when the server is reached through a proxy (forward or reverse) or sits behind a load balancer. Format: X-Forwarded-For: client, proxy1, proxy2, ... The leftmost entry is the original client's IP; each proxy that forwards the request appends the address it received the request from.
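
One hedged way to read this header on the server is sketched below. Because a client can send a forged X-Forwarded-For value, the sketch only consults the header when the directly connected peer is a known proxy, and it walks the list from the right, skipping known proxies. The function name client_ip and the addresses are illustrative assumptions.

```python
def client_ip(headers, remote_addr, trusted_proxies):
    """Pick the client address, trusting the header only behind known proxies."""
    if remote_addr not in trusted_proxies:
        return remote_addr  # direct connection: ignore the header entirely
    hops = [hop.strip()
            for hop in headers.get("X-Forwarded-For", "").split(",")
            if hop.strip()]
    # Walk from the nearest hop outwards; the first non-proxy address is the client.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return hops[0] if hops else remote_addr

proxies = {"10.0.0.1", "10.0.0.2"}
print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.2", proxies))
```

When no entry is forged, the address found this way is the same as the leftmost entry.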

User-Agent: request header. The server uses it to identify basic information about the client. It is generally useful for recognizing search engine crawlers, and in some scenarios it can also feed client statistics.

Referer: request header. When the client accesses the server, Referer indicates where the request came from, for example which website's link led to it; we often use this in statistics. Another important use is hotlink protection, where requests from unauthorized sources are filtered out (although the Referer can be faked by the client).
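
A hotlink check based on Referer might look like the sketch below. The host names in ALLOWED_HOSTS and the function name allow_resource are hypothetical, and since the header can be faked or omitted, this is a deterrent rather than a security control.

```python
from urllib.parse import urlsplit

# Hypothetical sites that are allowed to embed our images.
ALLOWED_HOSTS = {"www.example.com", "example.com"}

def allow_resource(headers, allow_empty=True):
    """Decide whether to serve a static resource based on the Referer header."""
    referer = headers.get("Referer", "")
    if not referer:
        # Direct visits and privacy tools send no Referer at all.
        return allow_empty
    return urlsplit(referer).hostname in ALLOWED_HOSTS

print(allow_resource({"Referer": "http://www.example.com/news/1.html"}))
print(allow_resource({"Referer": "http://other-site.test/page"}))
```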

Location: response header. A response with a 301 or 302 status code carries a Location header that tells the client the new address at which to access the resource.

Connection: request/response header. In HTTP/1.1 both the client and the server keep the connection open by default, which corresponds to Connection: keep-alive. If either side does not want to keep the connection open, it can set this value to close. With a persistent connection the client can send multiple HTTP requests over the same connection, which avoids the cost of frequently creating connections. The server may have further related settings, such as the keep-alive timeout and some kernel network (TCP) parameters.

Session and Cookie

HTTP is a stateless protocol, but Internet applications often need to track user state to complete interactive operations: user authentication records the login status, a shopping cart remembers the goods the user has chosen, an advertising application records the user's browsing history, and so on. This is where sessions and cookies come in.

Session: Refers to the state of the client-server interaction across HTTP requests and responses. The information is stored on the server side, for example in memory or in a database. Each session has a unique identifier generated by the server. The identifier is also saved on the client so that the client can send it with the next request, allowing the server to determine the client's state.

How the client carries the session ID:

The session ID is saved in a cookie and sent to the server with each request.

The session ID is passed to the server as a URL parameter.

The session ID is passed to the server in a hidden form field.
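
The cookie-based variant can be sketched with an in-memory store. The cookie name SESSIONID, the SESSIONS dictionary, and the function name handle_request are assumptions made for the example; a production server would add expiry and store the data somewhere more durable.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> state; stands in for memory or a database

def handle_request(cookie_header):
    """Return (session state, Set-Cookie value or None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    session_id = cookie["SESSIONID"].value if "SESSIONID" in cookie else None
    if session_id in SESSIONS:
        return SESSIONS[session_id], None  # known client: reuse its state
    # Unknown or missing ID: the server generates a new unique identifier.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {}
    return SESSIONS[session_id], f"SESSIONID={session_id}; Path=/; HttpOnly"

state, set_cookie = handle_request(None)        # first visit
state["cart"] = ["book"]
echoed = set_cookie.split(";")[0]               # what the browser sends back
state_again, set_cookie_again = handle_request(echoed)
print(state_again, set_cookie_again)
```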

Session sharing issues:

In distributed applications, our HTTP servers typically sit behind a reverse proxy or a load balancer, which raises a session sharing problem: requests from the same user may be distributed to different machines, and if each machine keeps sessions in its own local memory, the user's session cannot be shared among them. There are generally two ways to solve this:

Store the session in a distributed cache (e.g. memcached) or in centralized storage (e.g. a database).

Have the reverse proxy or load balancer route all requests from the same user to the same machine (with the drawback that requests must be redistributed when that machine goes down).
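
The second approach can be sketched as hash-based routing. The backend names and the function name pick_backend are hypothetical; a stable hash (here SHA-256) is used because Python's built-in hash() differs between processes for strings.

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical backend machines

def pick_backend(session_id, backends):
    """Map a session ID to one backend so the same user always lands on it."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

chosen = pick_backend("user-42-session", BACKENDS)
# If that machine goes down, the request is remapped to a surviving machine,
# which does not hold the session unless sessions are stored centrally.
survivors = [name for name in BACKENDS if name != chosen]
fallback = pick_backend("user-42-session", survivors)
print(chosen, fallback)
```

With plain modulo routing, removing one machine can remap sessions that lived on healthy machines too; consistent hashing is a common refinement, not shown here.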

Cookie: State information maintained on the client. Each cookie belongs to a specific domain and path, and for security reasons cookies cannot be shared across different domains or paths.

Session cookie: No expiration time is specified; it is kept in memory and expires when the browser is closed.

Persistent cookie: An expiration time is specified, and the cookie is saved locally by the browser.
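
The difference between the two kinds shows up in the Set-Cookie value the server sends. The sketch below builds both with Python's http.cookies module; the function name make_cookie, the cookie name sid, and the domain are assumptions made for illustration.

```python
from http.cookies import SimpleCookie

def make_cookie(name, value, domain, path="/", max_age=None):
    """Build a Set-Cookie value; without max_age it is a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain   # the cookie is only sent to this domain
    cookie[name]["path"] = path       # ... and only for URLs under this path
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # lifetime in seconds: persistent cookie
    return cookie[name].OutputString()

session_cookie = make_cookie("sid", "abc123", "example.com")
persistent_cookie = make_cookie("sid", "abc123", "example.com", max_age=3600)
print("Set-Cookie:", session_cookie)
print("Set-Cookie:", persistent_cookie)
```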

For more information, refer to: http://en.wikipedia.org/wiki/HTTP_cookie

It is important to note that there are some security issues with cookies.

Here I have only summarized what I have learned about the HTTP protocol in my own work. There is much more in HTTP worth digging into, and I need to keep exploring it; a solid understanding of the protocol makes developing and operating applications much easier.

Finally, I recommend two very powerful HTTP debugging tools: Fiddler (Windows) and Charles (Mac). Both can act as HTTP proxies, so for HTTP applications that do not run in a browser (such as mobile apps) you can use them to monitor HTTP requests.
