HTTP Headers parsing

Source: Internet
Author: User

What are HTTP headers, and what do they contain? Using the requests.get() function to request Douban Books (book.douban.com), the returned r.headers is as follows:

>>> import requests
>>> r = requests.get('https://book.douban.com/')
>>> r.headers
{'x-powered-by-ads': 'chn-shads-4-12',
 'x-xss-protection': '1; mode=block',
 'X-dae-app': ' book',
 'x-content-type-options': 'nosniff',
 'content-encoding': 'gzip',
 'transfer-encoding': 'chunked',
 'Set-cookie': 'bid=zt-mrsxmmx0; expires=mon, 23-oct-17 03:11:40 GMT; domain=.douban.com; path=/, __ads_session=2yy49z4ezghqoiuo9ga=; domain=.douban.com; path=/',
 'Expires': 'Sun, 1 Jan 2006 01:00:00 GMT',
 'Vary': 'accept-encoding',
 'X-dae-node': 'nain3',
 'Server': 'adsserver/44619',
 'X-douban-mobileapp': '0',
 'Connection': 'keep-alive',
 'Pragma': 'no-cache',
 'Cache-control': 'must-revalidate, no-cache, private',
 'Date': 'Sun, Oct 03:11:40 GMT',
 'strict-transport-security': 'max-age=15552000;',
 'X-douban-newbid': 'zt-mrsxmmx0',
 'Content-type': 'text/html; charset=utf-8'}
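
The r.headers object returned by requests behaves like a dictionary whose keys are case-insensitive, which matches how HTTP treats header names. The same behaviour can be sketched offline with the standard library alone; the raw header block below is a made-up sample built from three of the headers above, and parse_raw_headers is a hypothetical helper name:

```python
import http.client
import io

# A made-up raw header block (assumption: three headers copied from the
# response above, terminated by the blank line that ends an HTTP header section).
RAW_HEADERS = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)


def parse_raw_headers(raw):
    """Parse a raw header block into a case-insensitive message object."""
    return http.client.parse_headers(io.BytesIO(raw))


headers = parse_raw_headers(RAW_HEADERS)
# Lookups succeed regardless of the capitalisation of the header name.
print(headers["content-type"])
print(headers["CONTENT-ENCODING"])
```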

'Content-type': 'text/html; charset=utf-8'

This is the MIME type of the document. The browser decides how to parse the document based on this value. For example, for an HTML page the value is:

'text/html; charset=utf-8'

The 'text' before the '/' indicates the document type, and the 'html' after the '/' indicates the subtype of the document.

If it is a picture, the value is 'image/jpeg', which indicates that the target is an image, specifically a JPEG image.

>>> p = requests.get('http://pic33.nipic.com/20130916/3420027_192919547000_2.jpg')
>>> p.headers['content-type']
'image/jpeg'

If it is a PDF document, the value is 'application/pdf':

>>> import requests
>>> s = requests.get('http://www.em-consulte.com/showarticlefile/738819/Main.pdf')
>>> s.headers['content-type']
'application/pdf'

These three are only examples; there are many other MIME types, which are listed in the IANA media types registry.
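
Splitting a Content-Type value into its type, subtype, and charset can be sketched with the standard library's email.message.Message, which understands this header syntax. The helper name parse_content_type is hypothetical, and this is one possible approach rather than what a browser or requests does internally:

```python
from email.message import Message


def parse_content_type(value):
    """Return (type, subtype, charset) for a Content-Type header value.

    charset is None when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = value
    return (
        msg.get_content_maintype(),
        msg.get_content_subtype(),
        msg.get_param("charset"),
    )


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type("image/jpeg"))
print(parse_content_type("application/pdf"))
```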

'Cache-control': 'must-revalidate, no-cache, private'

The Cache-Control field specifies the directives that every caching mechanism along the request/response chain must obey. Common values are private, no-cache, max-age, and must-revalidate.

Directive        Description
public           All content may be cached (by both clients and proxy servers).
private          Content may be stored only in a private cache (the client may cache it; proxy servers may not).
no-cache         A cache must check with the server whether the response has changed before reusing it for a later request to the same URL.
must-revalidate  Once the cached content has expired, the cache must revalidate it with the server before using it again.
max-age=N        Cached content expires after N seconds.
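
To work with these directives in code, the header value can be split into a small dictionary. The following is a minimal sketch, assuming a simple comma-separated value like the ones shown here; parse_cache_control is a hypothetical helper, and it does not handle quoted arguments that themselves contain commas:

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        name, sep, arg = part.partition("=")
        # Directives without "=" (e.g. no-cache) are stored as True;
        # directives with an argument (e.g. max-age=60) keep it as a string.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


print(parse_cache_control("must-revalidate, no-cache, private"))
print(parse_cache_control("public, max-age=3600"))
```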

Caching reduces server load and network congestion. The basic idea is to exploit the temporal locality of client access: a copy of the content the client has accessed is kept in the cache, so that if the content is accessed again, no new request needs to be sent and the page is taken directly from the cache. The mechanism works well, but it can also cause problems: 1) the content the user gets on a repeat request may be outdated; 2) on a cache miss, the client's access latency is higher than it would be for a direct request.
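
The idea of keeping a copy and reusing it can be illustrated with a toy in-memory cache. This is only a sketch: TinyCache, its fetch callback, and its injectable clock are all hypothetical names introduced for illustration, and real HTTP caches are considerably more involved.

```python
class TinyCache:
    """Toy cache that keeps one copy per URL for max_age seconds."""

    def __init__(self, fetch, clock):
        self._fetch = fetch    # function that really requests the URL
        self._clock = clock    # function returning the current time in seconds
        self._store = {}       # url -> (expires_at, body)

    def get(self, url, max_age):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1], "hit"      # served from the stored copy
        body = self._fetch(url)         # miss or expired: go to the server
        self._store[url] = (now + max_age, body)
        return body, "miss"


# Usage with a fake clock and a fake fetch function, so no network is needed.
fake_time = [0]
requests_sent = []


def fake_fetch(url):
    requests_sent.append(url)
    return "page body"


cache = TinyCache(fetch=fake_fetch, clock=lambda: fake_time[0])
first = cache.get("https://example.com/", max_age=10)
second = cache.get("https://example.com/", max_age=10)
fake_time[0] = 11                       # the stored copy has now expired
third = cache.get("https://example.com/", max_age=10)
```

Only the first and third calls reach fake_fetch; the second is answered from the stored copy, which is also why a repeat request can return outdated content.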

'Expires': 'Sun, 1 Jan 2006 01:00:00 GMT'

Expires gives a date and time after which the response is considered stale. Before that time, the data can be taken from the cache without requesting it again.

Its drawback is that the value is an absolute time taken from the server's clock. If the client's clock differs from the server's, especially by a large amount, the error can be significant. Cache-Control's max-age, which is a relative duration, replaces Expires for this purpose.

Cache-Control takes priority over Expires. Both indicate how long the current resource remains valid, but Cache-Control allows finer-grained settings.
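
The priority rule can be sketched as a small freshness check: use max-age (counted from the Date header) when it is present, and fall back to Expires otherwise. This is a simplified illustration, not a full implementation of HTTP freshness rules; it ignores the Age header and directives such as no-cache. The function names and the sample Date value are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def max_age_seconds(cache_control):
    """Return the max-age value in seconds, or None if it is absent."""
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name.strip().lower() == "max-age" and arg.strip().isdigit():
            return int(arg.strip())
    return None


def is_fresh(headers, now):
    """Decide freshness from lowercase-keyed headers; max-age wins over Expires."""
    max_age = max_age_seconds(headers.get("cache-control", ""))
    if max_age is not None and "date" in headers:
        served_at = parsedate_to_datetime(headers["date"])
        return now < served_at + timedelta(seconds=max_age)
    if "expires" in headers:
        return now < parsedate_to_datetime(headers["expires"])
    return False  # no explicit lifetime: treated as stale in this sketch


now = datetime(2016, 10, 23, 3, 12, 0, tzinfo=timezone.utc)

# Hypothetical response carrying both headers: max-age=60 applies, Expires is ignored.
both = {
    "date": "Sun, 23 Oct 2016 03:11:40 GMT",
    "cache-control": "max-age=60",
    "expires": "Sun, 1 Jan 2006 01:00:00 GMT",
}
# Response with only an Expires date that lies in the past.
expires_only = {"expires": "Sun, 1 Jan 2006 01:00:00 GMT"}

print(is_fresh(both, now))          # True
print(is_fresh(expires_only, now))  # False
```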

[Figure: the process of the first request]

[Figure: the process of the second request]

'Content-encoding': 'gzip'

1. The client sends an HTTP request to the server with Accept-Encoding: gzip, deflate in the request (this tells the server that the browser supports gzip compression).

2. After the server receives the request, it generates the original response, which has the original Content-Type and Content-Length.

3. The server encodes the response with gzip. The headers of the encoded response carry Content-Type, Content-Length, and Content-Encoding: gzip. The server then sends this response to the client.

4. After the client receives the response, it decodes the body according to Content-Encoding: gzip to obtain the uncompressed response.
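
The four steps above can be sketched without a network using the standard gzip module. Here encode_response plays the server and decode_response plays the client; both are hypothetical helper names, and the sketch assumes a simple Accept-Encoding value (quality values such as q=0 are ignored).

```python
import gzip


def encode_response(body, accept_encoding):
    """Server side: compress the body if the client advertised gzip support."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # Assumption: each token is reduced to its coding name; q-values are ignored.
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length describes the bytes actually sent, i.e. the encoded body.
    headers["Content-Length"] = str(len(body))
    return headers, body


def decode_response(headers, body):
    """Client side: undo the encoding named in Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


original = b"<html><body>hello</body></html>" * 50
headers, wire_body = encode_response(original, "gzip, deflate")
decoded = decode_response(headers, wire_body)
```

For repetitive content like this sample, wire_body is much smaller than original, which is the point of the encoding.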

'Set-cookie': 'bid=zt-mrsxmmx0; expires=mon, 23-oct-17 03:11:40 GMT; domain=.douban.com; path=/, __ads_session=2yy49z4ezghqoiuo9ga=; domain=.douban.com; path=/'

A cookie is a piece of ASCII text that the web server sends to the client. Once a cookie is received, the browser stores its information as name/value pairs. After that, whenever a new document is requested from the same web server, the browser sends the previously stored cookie back to the server. The original purpose of cookies was to let the web server track a client across multiple HTTP requests.
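
The name/value pair and its attributes can be pulled apart with the standard library's http.cookies.SimpleCookie. The value shown above appears to be two separate Set-Cookie headers joined with ", ", so this sketch parses only the first cookie, copied as it appears above; splitting the joined string reliably is left out.

```python
from http.cookies import SimpleCookie

# The first cookie from the header above, copied as it appears there.
FIRST_COOKIE = "bid=zt-mrsxmmx0; expires=mon, 23-oct-17 03:11:40 GMT; domain=.douban.com; path=/"

cookie = SimpleCookie()
cookie.load(FIRST_COOKIE)

bid = cookie["bid"]      # a Morsel: the value plus its attributes
print(bid.value)         # the cookie value
print(bid["domain"])     # hosts the cookie is sent back to
print(bid["path"])       # URL paths the cookie applies to
print(bid["expires"])    # when the browser should discard it
```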

