HTTP/1.1 uses persistent connections by default, so the receiver cannot rely on the connection being closed to detect where an HTTP message ends.
There are several ways to determine the end of an HTTP message:
1. The HTTP protocol specifies that responses with a 1xx, 204, or 304 status code must not contain a message body, so any entity content is ignored.
[Applicable to response messages]
HTTP message = HTTP header
2. If the request method is HEAD, the response carries no message body, and any body content is ignored. [Applicable to response messages, i.e. responses to a HEAD request]
HTTP message = HTTP header
3. If the HTTP header contains "Transfer-Encoding: chunked", the length is determined by the chunk sizes; the body ends with a chunk of size 0.
4. If the HTTP header has Content-Length and no Transfer-Encoding (if both are present, Content-Length is ignored), the length of the message body is given by Content-Length.
5. If short connections are used (header "Connection: close"), the end of the message is signaled by the server closing the connection.
[Applicable to response messages only. The length of an HTTP request message cannot be determined this way, because the server would then have no way to send back a response.]
6. The end can also be guessed with a receive timeout, but this is unreliable. Python Proxy, an HTTP proxy server written in Python, uses such a timeout mechanism; its source code (just over 100 lines) is listed in reference [7].
Section 4.4 "Message Length" of the HTTP specification RFC 2616 describes these rules in detail (https://tools.ietf.org/html/rfc2616#section-4.4).
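Methods 1 to 5 can be condensed into a single decision function. This is a minimal Python 3 sketch, not httplib's code: the name `response_body_framing`, its return values, and the assumption that header names are already lower-cased are all illustrative. It covers responses only, leaves out the timeout heuristic (method 6), and, like httplib, treats an unparsable or negative Content-Length as absent.

```python
from typing import Mapping, Optional, Tuple


def response_body_framing(status: int, request_method: str,
                          headers: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Classify how a response body is delimited (header names lower-cased).

    Returns ("none", 0), ("chunked", None), ("length", n) or ("close", None).
    """
    # Methods 1 and 2: 1xx, 204, 304 and any response to HEAD carry no body.
    if 100 <= status < 200 or status in (204, 304) or request_method.upper() == "HEAD":
        return ("none", 0)
    # Method 3: chunked transfer coding wins; Content-Length is then ignored.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return ("chunked", None)
    # Method 4: a valid, non-negative Content-Length gives the body length.
    raw_length = headers.get("content-length")
    if raw_length is not None:
        try:
            length = int(raw_length)
        except ValueError:
            length = -1  # unparsable: treat as absent, as httplib does
        if length >= 0:
            return ("length", length)
    # Method 5: otherwise the body ends when the server closes the connection.
    return ("close", None)
```

The order of the checks matters: the no-body cases are tested first because they override any Content-Length or Transfer-Encoding header the server may have sent.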
Example: the source code of httplib.py in the Python standard library (an HTTP client implementation)
The simplest way to use httplib is as follows:
    import httplib

    conn = httplib.HTTPConnection("google.com")
    conn.request('GET', '/')
    print conn.getresponse().read()
    conn.close()
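In Python 3, httplib was renamed http.client. The sketch below repeats the same steps (connect, request, getresponse, read, close) against a throwaway local server instead of google.com, so it runs offline; the handler class `HelloHandler`, its fixed body, and the `fetch_root` helper are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # method 4 framing
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def fetch_root() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/")
        data = conn.getresponse().read()
        conn.close()
        return data
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_root())
```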
In practice, httplib is rarely used directly; the higher-level wrappers urllib and urllib2 are used instead.
conn = httplib.HTTPConnection("google.com") creates an HTTPConnection object and specifies the web server to send requests to.
conn.request('GET', '/') sends an HTTP GET request for / to google.com.
conn.getresponse() creates an HTTPResponse object, which receives and parses the header of the HTTP response message; read() then reads the response message body.
Function call sequence:
getresponse() -> [create HTTPResponse object response] -> response.begin() -> response.read()
The key methods are begin() and read(). begin() does four things:
(1) Creates an HTTPMessage object and parses the header of the HTTP response message.
(2) Checks whether the header contains "Transfer-Encoding: chunked".
(3) Checks whether the TCP connection will be closed after the response (by calling _check_close()).
(4) If the header contains "Content-Length" but not "Transfer-Encoding: chunked", obtains the length of the message body.
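To watch begin() make these decisions before any body is read, one can feed raw response bytes to Python 3's http.client.HTTPResponse. This sketch relies on implementation details rather than documented API: HTTPResponse only calls `makefile("rb")` on the socket it is given, and the undocumented attributes `chunked`, `length`, and `will_close` are the Python 3 counterparts of the fields set in the httplib code. `FakeSocket` and `parse_response` are hypothetical helpers.

```python
import io
from http.client import HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._raw)


def parse_response(raw: bytes, method: str = "GET"):
    resp = HTTPResponse(FakeSocket(raw), method=method)
    resp.begin()  # parse status line and headers, decide how the body is framed
    framing = (resp.chunked, resp.length, resp.will_close)
    body = resp.read()  # read exactly the body that begin() delimited
    return framing, body


if __name__ == "__main__":
    # Bytes after the 5-byte body would belong to the next response.
    print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhelloEXTRA"))
```

With Content-Length the body stops after exactly five bytes even though more data is buffered; with no framing header on HTTP/1.0, `will_close` is true and the body runs to the end of the stream.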
_check_close(): if the response header contains "Connection: close", the TCP connection is closed after the response has been received; the method also contains some code for backward compatibility with HTTP/1.0. In HTTP/1.1 the default is "Connection: keep-alive", even when the header is absent.
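The decision made by _check_close() fits in a few lines. This is a Python 3 sketch that mirrors the httplib logic; the function name `will_close`, the integer version encoding (11 for HTTP/1.1, 10 for HTTP/1.0), and the lower-cased header keys are assumptions of the sketch.

```python
from typing import Mapping


def will_close(version: int, headers: Mapping[str, str]) -> bool:
    """Return True if the connection closes after this response.

    version is 11 for HTTP/1.1 and 10 for HTTP/1.0; header names are lower-cased.
    """
    conn = headers.get("connection", "").lower()
    if version == 11:
        # HTTP/1.1 is persistent unless the server says "Connection: close".
        return "close" in conn
    # HTTP/1.0: persistent only if some keep-alive hint is present.
    if headers.get("keep-alive"):
        return False
    if "keep-alive" in conn:
        return False
    if "keep-alive" in headers.get("proxy-connection", "").lower():
        return False
    return True
```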
read() reads the HTTP response body, delimited either by Content-Length or by chunked encoding. The caller can specify how many bytes to read per call (amt). In chunked mode, it calls _read_chunked() to read the data.
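For a Content-Length body, read(amt) clips each read to the bytes still expected and decrements the remaining length, so it never consumes bytes of the next message on a keep-alive connection. A minimal sketch of that bookkeeping as a generator; `iter_body` is a hypothetical name, and unlike httplib it raises EOFError when the stream ends before Content-Length bytes arrive (httplib avoids that for compatibility).

```python
import io
from typing import BinaryIO, Iterator


def iter_body(fp: BinaryIO, length: int, amt: int = 1024) -> Iterator[bytes]:
    """Yield a Content-Length body in pieces of at most amt bytes."""
    remaining = length
    while remaining > 0:
        # Clip each read to the end of the body so the next message is untouched.
        piece = fp.read(min(amt, remaining))
        if not piece:
            raise EOFError(f"connection ended with {remaining} body bytes missing")
        remaining -= len(piece)
        yield piece


if __name__ == "__main__":
    stream = io.BytesIO(b"helloHTTP/1.1 200 OK\r\n")  # body, then the next response
    print(list(iter_body(stream, 5, amt=2)))  # [b'he', b'll', b'o']
    print(stream.read())                      # b'HTTP/1.1 200 OK\r\n'
```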
_read_chunked() reads the body chunk by chunk according to each chunk-size. When the last chunk (chunk-size = 0) has been read, the HTTP response message has been received completely. For the relevant parts of the HTTP specification, see RFC 2616 sections 3.6.1 and 19.4.6.
RFC 2616 section 19.4.6 gives pseudocode for decoding a message in chunked mode:
    length := 0
    read chunk-size, chunk-extension (if any) and CRLF
    while (chunk-size > 0) {
        read chunk-data and CRLF
        append chunk-data to entity-body
        length := length + chunk-size
        read chunk-size and CRLF
    }
    read entity-header
    while (entity-header not empty) {
        append entity-header to existing header fields
        read entity-header
    }
    Content-Length := length
    Remove "chunked" from Transfer-Encoding
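A direct Python 3 translation of the RFC 2616 19.4.6 pseudocode, as a sketch: `decode_chunked` is a hypothetical name, the stream is assumed to be positioned just after the header block, and trailer lines are returned raw instead of being merged into the existing headers. Reading stops right after the terminating empty line, so anything that follows (for example the next response on a keep-alive connection) stays in the stream.

```python
import io
from typing import BinaryIO, List, Tuple


def _read_chunk_size(fp: BinaryIO) -> int:
    # "read chunk-size, chunk-extension (if any) and CRLF"
    line = fp.readline()
    return int(line.split(b";", 1)[0].strip(), 16)  # drop chunk-extensions


def decode_chunked(fp: BinaryIO) -> Tuple[bytes, int, List[bytes]]:
    """Return (entity_body, length, trailer_lines) for a chunked body."""
    body = bytearray()
    length = 0
    chunk_size = _read_chunk_size(fp)
    while chunk_size > 0:
        data = fp.read(chunk_size)  # read chunk-data and CRLF
        if len(data) != chunk_size or fp.read(2) != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += data                # append chunk-data to entity-body
        length += chunk_size        # length := length + chunk-size
        chunk_size = _read_chunk_size(fp)
    # read entity-header lines (the trailer) until an empty line
    trailers = []
    line = fp.readline()
    while line not in (b"\r\n", b"\n", b""):
        trailers.append(line.rstrip(b"\r\n"))
        line = fp.readline()
    # length is what the RFC would store as Content-Length
    return bytes(body), length, trailers


if __name__ == "__main__":
    raw = b"5;ext=1\r\nhello\r\n6\r\n world\r\n0\r\nX-Trailer: yes\r\n\r\n"
    print(decode_chunked(io.BytesIO(raw)))
```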
Let's look at the main code of begin(), _check_close(), read(), and _read_chunked():
(1) begin():

    def begin(self):
        ......
        self.msg = HTTPMessage(self.fp, 0)

        # don't let the msg keep an fp
        self.msg.fp = None

        # are we using the chunked-style of transfer encoding?
        tr_enc = self.msg.getheader('transfer-encoding')
        if tr_enc and tr_enc.lower() == "chunked":
            self.chunked = 1
            self.chunk_left = None
        else:
            self.chunked = 0

        # will the connection close at the end of the response?
        self.will_close = self._check_close()

        # do we have a Content-Length?
        # NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked"
        length = self.msg.getheader('content-length')
        if length and not self.chunked:
            try:
                self.length = int(length)
            except ValueError:
                self.length = None
            else:
                if self.length < 0:  # ignore nonsensical negative lengths
                    self.length = None
        else:
            self.length = None

        # does the body have a fixed length? (of zero)
        # NO_CONTENT = 204, NOT_MODIFIED = 304
        # (author's note) end of the response message: see methods 1 and 2 above
        if (status == NO_CONTENT or status == NOT_MODIFIED or
            100 <= status < 200 or      # 1xx codes
            self._method == 'HEAD'):
            self.length = 0

        # if the connection remains open, and we aren't using chunked, and
        # a content-length was not provided, then assume that the connection
        # WILL close.
        # (author's note) end of the response message: with neither chunked
        # nor Content-Length, the body ends when the connection closes
        if not self.will_close and \
           not self.chunked and \
           self.length is None:
            self.will_close = 1
(2) _check_close():

    def _check_close(self):
        # (author's note) end of the response message: see method 5 above
        conn = self.msg.getheader('connection')
        if self.version == 11:
            # An HTTP/1.1 proxy is assumed to stay open unless
            # explicitly closed.
            conn = self.msg.getheader('connection')
            if conn and "close" in conn.lower():
                return True
            return False

        # Some HTTP/1.0 implementations have support for persistent
        # connections, using rules different than HTTP/1.1.

        # For older HTTP, Keep-Alive indicates persistent connection.
        if self.msg.getheader('keep-alive'):
            return False

        # At least Akamai returns a "Connection: Keep-Alive" header,
        # which was supposed to be sent by the client.
        if conn and "keep-alive" in conn.lower():
            return False

        # Proxy-Connection is a netscape hack.
        pconn = self.msg.getheader('proxy-connection')
        if pconn and "keep-alive" in pconn.lower():
            return False

        # otherwise, assume it will close
        return True
(3) read():

    def read(self, amt=None):
        if self.fp is None:
            return ''

        if self._method == 'HEAD':
            self.close()
            return ''

        if self.chunked:
            return self._read_chunked(amt)

        if amt is None:
            # unbounded read
            if self.length is None:
                s = self.fp.read()
            else:
                try:
                    s = self._safe_read(self.length)
                except IncompleteRead:
                    self.close()
                    raise
                self.length = 0
            self.close()        # we read everything
            return s

        if self.length is not None:
            if amt > self.length:
                # clip the read to the "end of response"
                amt = self.length

        # we do not use _safe_read() here because this may be a .will_close
        # connection, and the user is reading more bytes than will be provided
        # (for example, reading in 1k chunks)
        s = self.fp.read(amt)
        if not s:
            # Ideally, we would raise IncompleteRead if the content-length
            # wasn't satisfied, but it might break compatibility.
            self.close()
        if self.length is not None:
            # (author's note) remaining length for the next read
            self.length -= len(s)
            if not self.length:
                self.close()

        return s
(4) _read_chunked():

    def _read_chunked(self, amt):
        assert self.chunked != _UNKNOWN
        # (author's note) self.chunk_left is None when the first chunk is read
        # (see self.begin()); chunk_left is the number of bytes left in the
        # current chunk, and None means reading of a new chunk hasn't started
        chunk_left = self.chunk_left
        value = []
        while True:
            if chunk_left is None:
                # read a new chunk-size line
                line = self.fp.readline(_MAXLINE + 1)
                if len(line) > _MAXLINE:
                    raise LineTooLong("chunk size")
                i = line.find(';')
                if i >= 0:
                    line = line[:i]  # strip chunk-extensions
                try:
                    chunk_left = int(line, 16)
                except ValueError:
                    # close the connection as protocol synchronisation is
                    # probably lost
                    self.close()
                    raise IncompleteRead(''.join(value))
                if chunk_left == 0:
                    # last-chunk, see RFC 2616 3.6.1
                    break
            if amt is None:
                value.append(self._safe_read(chunk_left))
            elif amt < chunk_left:
                value.append(self._safe_read(amt))
                self.chunk_left = chunk_left - amt
                return ''.join(value)
            elif amt == chunk_left:
                value.append(self._safe_read(amt))
                self._safe_read(2)  # toss the CRLF at the end of the chunk
                self.chunk_left = None
                return ''.join(value)
            else:
                value.append(self._safe_read(chunk_left))
                amt -= chunk_left

            # we read the whole chunk, get another
            self._safe_read(2)  # toss the CRLF at the end of the chunk
            chunk_left = None

        ......

        # we read everything; close the "file"
        self.close()

        return ''.join(value)
Another real-world example: Python Proxy stops receiving once a timeout is reached. _read_write() reads from and writes to the two open sockets, relaying data between the client and the target server.
    def _read_write(self):
        time_out_max = self.timeout/3
        socs = [self.client, self.target]
        count = 0
        while 1:
            count += 1
            # wait at most 3 seconds per round
            (recv, _, error) = select.select(socs, [], socs, 3)
            if error:
                break
            if recv:
                for in_ in recv:
                    data = in_.recv(BUFLEN)
                    if in_ is self.client:
                        out = self.target
                    else:
                        out = self.client
                    if data:
                        out.send(data)
                        count = 0
            # (author's note) stop relaying after time_out_max consecutive
            # rounds without any data, i.e. on timeout
            if count == time_out_max:
                break
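The same timeout idea can be sketched for a single socket with select(): keep reading until the peer closes the connection or nothing arrives for a while. This is an illustrative helper, not Python Proxy's code; `recv_until_idle` and its 0.2-second default are assumptions. It also shows why the method is unreliable: a sender that pauses longer than the idle timeout is cut off mid-message.

```python
import select
import socket


def recv_until_idle(sock: socket.socket, idle_timeout: float = 0.2,
                    bufsize: int = 4096) -> bytes:
    """Collect data until the peer closes or stays silent for idle_timeout seconds."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], idle_timeout)
        if not readable:
            break  # timed out: guess that the message is complete
        data = sock.recv(bufsize)
        if not data:
            break  # orderly close by the peer
        chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"HTTP/1.1 200 OK\r\n\r\nhi")
    print(recv_until_idle(b, 0.1))
    a.close()
    b.close()
```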
With the analysis and source code above, the following question should be easy to answer:
When HTTP uses keep-alive and the server responds to a client request, how does the client determine that the HTTP response message has been received completely?
Finally, here is an answer from Stack Overflow on how to detect the end of an HTTP message:
References
[1] Hypertext Transfer Protocol -- HTTP/1.1
https://tools.ietf.org/html/rfc2616
[2] Detect end of HTTP request body
http://stackoverflow.com/questions/4824451/detect-end-of-http-request-body
[3] Detect the end of a HTTP packet
http://stackoverflow.com/questions/3718158/detect-the-end-of-a-http-packet
[4] Determine the end of an HTTP request in keep-alive mode
http://blog.quanhz.com/archives/141
[5] I was sentenced to death!
http://www.cnblogs.com/skynet/archive/2010/12/11/1903347.html
[6] Talking about nginx and the HTTP protocol
http://blog.xiuwz.com/tag/content-length/
[7] Python Proxy - a fast HTTP proxy
https://code.google.com/p/python-proxy/
[8] Python HTTP programming: httplib, urllib, and urllib2
http://www.cnblogs.com/chenzehe/archive/2010/08/30/1812995.html
When reposting this article, please credit the author and the source [Gary's influence] http://garyelephant.me. Not for commercial use.
Author: Gary Gao (garygaowork [at] gmail.com), focusing on the Internet, distributed systems, high performance, NoSQL, automation, and software teams.