How to determine the end of an HTTP message: an explanation with Python source code

HTTP/1.1 uses persistent connections by default, so the receiver cannot rely on the connection being closed to detect the end of an HTTP message.

There are several ways to determine where an HTTP message ends:

 

1. The HTTP specification stipulates that a response with a 1xx, 204, or 304 status code must not contain a message body; any entity content is ignored.

[Applies to response messages]

HTTP message = HTTP header

2. If the request method is HEAD, the response carries no message body, so any body is ignored. [Applies to responses to HEAD requests]

HTTP message = HTTP header

3. If the HTTP message header contains "Transfer-Encoding: chunked", the length is determined from the chunk sizes.

4. If the HTTP message header contains Content-Length and no Transfer-Encoding, the message body length is given by Content-Length. (If both Content-Length and Transfer-Encoding are present, Content-Length is ignored.)

5. If a short-lived connection is used (the HTTP message header contains "Connection: close"), the length of the message is determined by the server closing the connection.

[Applies to response messages. The length of an HTTP request message cannot be determined this way.]

6. A receive timeout can also be used to decide that a message has ended, but this is unreliable. The HTTP proxy server implemented in Python Proxy uses a timeout mechanism. For its source code, which is only a little over 100 lines, see reference [7].

RFC 2616, section 4.4 (Message Length), describes these rules in detail (https://tools.ietf.org/html/rfc2616#section-4.4).
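The rules above can be condensed into one small decision function. The following is only a sketch written for Python 3. The function name body_framing, its return values, and the use of a plain dict for the headers are assumptions made for illustration, not part of httplib. It covers responses only and leaves the timeout approach from point 6 out.

```python
def body_framing(method, status, headers):
    """Decide how the body of an HTTP response is delimited.

    method  -- method of the request that triggered the response
    status  -- numeric status code of the response
    headers -- dict of response header names to values
    Returns a (kind, length) tuple; length is None when it is not known up front.
    """
    names = {name.lower(): value for name, value in headers.items()}

    # Points 1 and 2: 1xx, 204, 304 and responses to HEAD never carry a body.
    if method.upper() == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return ("none", 0)

    # Point 3: chunked transfer encoding takes priority over Content-Length.
    if names.get("transfer-encoding", "").strip().lower() == "chunked":
        return ("chunked", None)

    # Point 4: a valid, non-negative Content-Length gives the body length.
    raw_length = names.get("content-length")
    if raw_length is not None:
        try:
            length = int(raw_length)
        except ValueError:
            length = -1
        if length >= 0:
            return ("content-length", length)

    # Point 5: otherwise the body runs until the server closes the connection.
    return ("until-close", None)
```

For example, body_framing("GET", 200, {"Content-Length": "10"}) returns ("content-length", 10), while the same headers on a 204 response give ("none", 0).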

An example: the source code of httplib.py in the Python standard library (an HTTP client implementation)

The simplest way to use httplib is as follows:

import httplib

conn = httplib.HTTPConnection("google.com")
conn.request('GET', '/')
print conn.getresponse().read()
conn.close()

In practice, however, httplib is rarely used directly; the higher-level wrappers urllib and urllib2 are used instead.

conn = httplib.HTTPConnection("google.com") creates an HTTPConnection object and specifies the web server to send requests to.

conn.request('GET', '/') sends an HTTP request to google.com, using the GET method.

conn.getresponse() creates an HTTPResponse object, then receives and parses the header of the HTTP response message; read() reads the response message body.

Function call relationship:

getresponse() -> [create the HTTPResponse object response] -> response.begin() -> response.read()
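To see this call sequence work on a persistent connection without depending on an external site, here is a sketch for Python 3, where httplib is named http.client. The handler class PathEchoHandler and the function fetch_on_one_connection are hypothetical names made up for this example. It starts a throwaway server on the loopback interface and sends several requests over one connection. Each read() returns as soon as Content-Length bytes have arrived, even though the server keeps the connection open.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PathEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with a short body."""

    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = ("you asked for " + self.path).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_on_one_connection(paths):
    """Send one GET per path over a single persistent connection."""
    server = HTTPServer(("127.0.0.1", 0), PathEchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()   # reads the status line and headers
            bodies.append(response.read())  # reads exactly Content-Length bytes
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies


if __name__ == "__main__":
    print(fetch_on_one_connection(["/first", "/second"]))
```

If the handler did not send Content-Length (and did not use chunked encoding), the client would have no way to find the end of the first body other than waiting for the connection to close.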

The functions to focus on are begin() and read(). begin() does four things:

(1) It creates an HTTPMessage object and parses the header of the HTTP response message.

(2) It checks whether the header contains "Transfer-Encoding: chunked".

(3) It determines whether the TCP connection will be closed after the response (by calling _check_close()).

(4) If the header contains "Content-Length" and not "Transfer-Encoding: chunked", it obtains the length of the message body.
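The outcome of these four steps can be inspected directly. The sketch below is for Python 3 (http.client) and feeds canned bytes to HTTPResponse through a hypothetical FakeSocket class, so no network is needed. Note that begin() and the attributes chunked, length and will_close are implementation details of the standard library rather than documented API, so treat this as an experiment, not as something to rely on in production code.

```python
import io
import http.client


class FakeSocket:
    """Hypothetical stand-in for a socket: hands canned bytes to HTTPResponse."""

    def __init__(self, raw):
        self._raw = raw

    def makefile(self, *args, **kwargs):
        return io.BytesIO(self._raw)


def inspect_response(raw, method="GET"):
    """Parse raw response bytes; return ((chunked, length, will_close), body)."""
    response = http.client.HTTPResponse(FakeSocket(raw), method=method)
    response.begin()  # parses the status line and headers
    framing = (bool(response.chunked), response.length, bool(response.will_close))
    return framing, response.read()


if __name__ == "__main__":
    samples = [
        b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n5\r\nhello\r\n0\r\n\r\n",
        b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello",
        b"HTTP/1.1 200 OK\r\n\r\nhello",  # neither header: read until close
    ]
    for raw in samples:
        print(inspect_response(raw))
```

In the third sample neither chunked encoding nor Content-Length is present, so begin() sets will_close and read() simply consumes everything until the end of the stream.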

_check_close(): if the HTTP response message header contains "Connection: close", the TCP connection is closed after the response has been received. The function also contains some code for backward compatibility with HTTP/1.0. In HTTP/1.1 the default is "Connection: keep-alive", even when the header is absent.

read(): reads the HTTP response message body, using either Content-Length or chunked mode. The number of bytes to read per call can be specified. In chunked mode it calls _read_chunked() to read the data.

_read_chunked(): reads chunks according to their chunk size. When the last chunk (the one with chunk size = 0) has been read, the HTTP response message has been fully received. For the relevant parts of the HTTP specification, see RFC 2616 sections 3.6.1 and 19.4.6.

RFC 2616, section 19.4.6, gives pseudocode for parsing an HTTP message in chunked mode:

length := 0
read chunk-size, chunk-extension (if any) and CRLF
while (chunk-size > 0) {
    read chunk-data and CRLF
    append chunk-data to entity-body
    length := length + chunk-size
    read chunk-size and CRLF
}
read entity-header
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length
Remove "chunked" from Transfer-Encoding
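The pseudocode translates almost line by line into Python. The following is a minimal sketch for Python 3 that works on any binary file-like object. The function name decode_chunked is an assumption for this example, and the error handling is deliberately simplistic compared with httplib.

```python
import io


def decode_chunked(stream):
    """Decode a chunked body from a binary stream.

    Returns (body, trailers). len(body) is the value the pseudocode
    would store in Content-Length.
    """
    body = []
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the last chunk")
        # Drop any chunk-extension after ';' and parse the hexadecimal size.
        size_text = size_line.split(b";", 1)[0].strip().decode("ascii")
        chunk_size = int(size_text, 16)
        if chunk_size == 0:
            break  # last-chunk: the body is complete
        data = stream.read(chunk_size)
        if len(data) != chunk_size:
            raise ValueError("chunk-data is shorter than chunk-size")
        body.append(data)
        stream.read(2)  # discard the CRLF that follows chunk-data

    # Read the optional trailer (entity-headers) up to the empty line.
    trailers = []
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        trailers.append(line.rstrip(b"\r\n").decode("latin-1"))
    return b"".join(body), trailers


if __name__ == "__main__":
    raw = b"5\r\nhello\r\n6;ext=1\r\n world\r\n0\r\nX-Checksum: abc\r\n\r\nNEXT"
    stream = io.BytesIO(raw)
    print(decode_chunked(stream))
    print(stream.read())  # bytes that belong to whatever follows the message
```

After the function returns, the stream is positioned exactly at the first byte after the message, which is what makes chunked encoding usable on a persistent connection.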

Let's take a look at the main code of begin(), _check_close(), read(), and _read_chunked():

(1) begin():

def begin(self):
    ......
    self.msg = HTTPMessage(self.fp, 0)
    # don't let the msg keep an fp
    self.msg.fp = None

    # are we using the chunked-style of transfer encoding?
    tr_enc = self.msg.getheader('transfer-encoding')
    if tr_enc and tr_enc.lower() == "chunked":
        self.chunked = 1
        self.chunk_left = None
    else:
        self.chunked = 0

    # will the connection close at the end of the response?
    self.will_close = self._check_close()

    # do we have a Content-Length?
    # NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked"
    length = self.msg.getheader('content-length')
    if length and not self.chunked:
        try:
            self.length = int(length)
        except ValueError:
            self.length = None
        else:
            if self.length < 0:  # ignore nonsensical negative lengths
                self.length = None
    else:
        self.length = None

    # does the body have a fixed length? (of zero)
    # NO_CONTENT = 204, NOT_MODIFIED = 304
    # determines the end of the HTTP response message; see points 1 and 2
    if (status == NO_CONTENT or status == NOT_MODIFIED or
        100 <= status < 200 or      # 1xx codes
        self._method == 'HEAD'):
        self.length = 0

    # if the connection remains open, and we aren't using chunked, and
    # a content-length was not provided, then assume that the connection
    # WILL close.
    # determines the end of the HTTP response message: with neither chunked
    # nor Content-Length, the end is marked by closing the connection
    if not self.will_close and \
       not self.chunked and \
       self.length is None:
        self.will_close = 1

(2) _check_close():

def _check_close(self):
    # determines the end of the HTTP response message; see point 5
    conn = self.msg.getheader('connection')
    if self.version == 11:
        # An HTTP/1.1 proxy is assumed to stay open unless
        # explicitly closed.
        conn = self.msg.getheader('connection')
        if conn and "close" in conn.lower():
            return True
        return False

    # Some HTTP/1.0 implementations have support for persistent
    # connections, using rules different than HTTP/1.1.

    # For older HTTP, Keep-Alive indicates persistent connection.
    if self.msg.getheader('keep-alive'):
        return False

    # At least Akamai returns a "Connection: Keep-Alive" header,
    # which was supposed to be sent by the client.
    if conn and "keep-alive" in conn.lower():
        return False

    # Proxy-Connection is a Netscape hack.
    pconn = self.msg.getheader('proxy-connection')
    if pconn and "keep-alive" in pconn.lower():
        return False

    # otherwise, assume it will close
    return True

(3) read():

def read(self, amt=None):
    if self.fp is None:
        return ''

    if self._method == 'HEAD':
        self.close()
        return ''

    if self.chunked:
        return self._read_chunked(amt)

    if amt is None:
        # unbounded read
        if self.length is None:
            s = self.fp.read()
        else:
            try:
                s = self._safe_read(self.length)
            except IncompleteRead:
                self.close()
                raise
            self.length = 0
        self.close()        # we read everything
        return s

    if self.length is not None:
        if amt > self.length:
            # clip the read to the "end of response"
            amt = self.length

    # we do not use _safe_read() here because this may be a .will_close
    # connection, and the user is reading more bytes than will be provided
    # (for example, reading in 1k chunks)
    s = self.fp.read(amt)
    if not s:
        # Ideally, we would raise IncompleteRead if the content-length
        # wasn't satisfied, but it might break compatibility.
        self.close()
    if self.length is not None:
        # calculate the remaining length for the next read
        self.length -= len(s)
        if not self.length:
            self.close()

    return s
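In the Content-Length case, read() depends on reading exactly the announced number of bytes, and on treating an early end of stream as an error rather than as the end of the message. Here is a minimal sketch of that idea for Python 3. The function name read_exact is an assumption for this example (httplib has its own private helper, _safe_read(), for this purpose), and IncompleteRead is the exception class exported by http.client.

```python
import io
from http.client import IncompleteRead


def read_exact(stream, length):
    """Read exactly `length` bytes from a binary stream.

    A single read() on a socket file may return fewer bytes than requested,
    so keep reading until the count is satisfied. If the stream ends first,
    raise IncompleteRead with the bytes received so far.
    """
    pieces = []
    remaining = length
    while remaining > 0:
        piece = stream.read(remaining)
        if not piece:
            raise IncompleteRead(b"".join(pieces), remaining)
        pieces.append(piece)
        remaining -= len(piece)
    return b"".join(pieces)


if __name__ == "__main__":
    stream = io.BytesIO(b"hello world")
    print(read_exact(stream, 5))  # the body announced by Content-Length: 5
    print(stream.read())          # what is left belongs to the next message
```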

(4) _read_chunked():

def _read_chunked(self, amt):
    assert self.chunked != _UNKNOWN
    # self.chunk_left is None when reading a chunk for the first time (see self.begin())
    # chunk_left: bytes left in the current chunk
    # chunk_left = None means that reading hasn't been started.
    chunk_left = self.chunk_left
    value = []
    while True:
        if chunk_left is None:
            # read a new chunk
            line = self.fp.readline(_MAXLINE + 1)
            if len(line) > _MAXLINE:
                raise LineTooLong("chunk size")
            i = line.find(';')
            if i >= 0:
                line = line[:i] # strip chunk-extensions
            try:
                chunk_left = int(line, 16)
            except ValueError:
                # close the connection as protocol synchronisation is
                # probably lost
                self.close()
                raise IncompleteRead(''.join(value))
            if chunk_left == 0:
                # RFC 2616 3.6.1: last-chunk, chunk_left = 0
                break
        if amt is None:
            value.append(self._safe_read(chunk_left))
        elif amt < chunk_left:
            value.append(self._safe_read(amt))
            self.chunk_left = chunk_left - amt
            return ''.join(value)
        elif amt == chunk_left:
            value.append(self._safe_read(amt))
            self._safe_read(2)  # toss the CRLF at the end of the chunk
            self.chunk_left = None
            return ''.join(value)
        else:
            value.append(self._safe_read(chunk_left))
            amt -= chunk_left

        # we read the whole chunk, get another
        self._safe_read(2)      # toss the CRLF at the end of the chunk
        chunk_left = None

    ......

    # we read everything; close the "file"
    self.close()

    return ''.join(value)

Another piece of real-world source code, from Python Proxy, stops receiving once a timeout has been reached. _read_write() reads from and writes to the open sockets.

def _read_write(self):
    time_out_max = self.timeout/3
    socs = [self.client, self.target]
    count = 0
    while 1:
        count += 1
        # time_out = 3
        (recv, _, error) = select.select(socs, [], socs, 3)
        if error:
            break
        if recv:
            for in_ in recv:
                data = in_.recv(BUFLEN)
                if in_ is self.client:
                    out = self.target
                else:
                    out = self.client
                if data:
                    out.send(data)
                    count = 0
        # stop receiving and sending once no data has arrived for
        # time_out_max consecutive rounds [timeout]
        if count == time_out_max:
            break
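The same timeout idea can be reduced to a few lines. The sketch below is for Python 3. The function name recv_until_idle and its default values are assumptions for illustration. It treats a quiet period on the socket as the end of the message, which is exactly why the approach is unreliable: a slow sender is indistinguishable from a finished one.

```python
import select
import socket


def recv_until_idle(sock, idle_timeout=0.2, bufsize=4096):
    """Collect data from sock until it stays silent for idle_timeout seconds."""
    received = []
    while True:
        readable, _, _ = select.select([sock], [], [], idle_timeout)
        if not readable:
            break  # nothing arrived in time: assume the message is complete
        data = sock.recv(bufsize)
        if not data:
            break  # the peer closed the connection
        received.append(data)
    return b"".join(received)


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    sender.sendall(b"HTTP/1.1 200 OK\r\n\r\nhello")
    print(recv_until_idle(receiver))  # returns after the idle period
    sender.close()
    receiver.close()
```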

With the analysis and source code above, the following question can now be answered:

When HTTP uses keep-alive mode and the server responds to a request from the client, how does the client determine that it has received the complete HTTP response message?

Finally, for answers on Stack Overflow about how to detect the end of an HTTP message, see references [2] and [3].

References

[1] Hypertext Transfer Protocol -- HTTP/1.1
https://tools.ietf.org/html/rfc2616

[2] Detect end of HTTP request body
http://stackoverflow.com/questions/4824451/detect-end-of-http-request-body

[3] Detect the end of a HTTP packet
http://stackoverflow.com/questions/3718158/detect-the-end-of-a-http-packet

[4] Determine the end of an HTTP request in keep-alive mode
http://blog.quanhz.com/archives/141

[5] I was sentenced to death!
http://www.cnblogs.com/skynet/archive/2010/12/11/1903347.html

[6] Talking about nginx and HTTP protocols
http://blog.xiuwz.com/tag/content-length/

[7] Python Proxy - a fast HTTP proxy
https://code.google.com/p/python-proxy/

[8] Python programming based on HTTP: httplib, urllib, and urllib2
http://www.cnblogs.com/chenzehe/archive/2010/08/30/1812995.html

If you repost this article, please credit the author and the source, [Gary's influence] http://garyelephant.me, and do not use it for any commercial purpose.

Author: Gary Gao (garygaowork [at] gmail.com), who focuses on the internet, distributed systems, high performance, NoSQL, automation, and software teams.

