How to Determine the End of an HTTP Message: A Reading of the Python Source Code


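HTTP/1.1 uses persistent connections by default, so the end of an HTTP message cannot be detected simply by waiting for the TCP connection to close.

The following are several ways to determine where an HTTP message ends: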

 

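1.      The HTTP protocol specifies that responses with status code 1xx, 204, or 304 must not include a message body, so any entity content is simply ignored.

        [Applies to response messages]

        HTTP message = HTTP header

2.      If the request method is HEAD, the response to it carries no message body, so any body is simply ignored. [Applies to responses to HEAD requests]

         HTTP message = HTTP header

3.      If the message headers contain "Transfer-Encoding: chunked", the length is determined from the chunk sizes.

4.      If the message headers contain Content-Length and no Transfer-Encoding (when both Content-Length and Transfer-Encoding are present, Content-Length is ignored),

         the length of the message body is given by Content-Length.

5.      If a short-lived connection is used (the message headers contain "Connection: close"), the transfer length is determined by the server closing the connection.

         [Applies to response messages; the length of an HTTP request message cannot be determined this way]

6.      The end can also be inferred from a receive timeout, but this is unreliable. The HTTP proxy server implemented by Python Proxy uses a timeout mechanism; the source is only a little over 100 lines long, see References [7].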

Section 4.4 (Message Length) of the HTTP specification, RFC 2616, describes these rules in more detail (https://tools.ietf.org/html/rfc2616#section-4.4).
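The rules above can be sketched as a single decision function. This is a minimal illustration written for Python 3, not code taken from httplib: the name `body_framing` and its return values are hypothetical, and it only covers the cases listed in this article.

```python
def body_framing(status, request_method, headers):
    """Decide how the end of a response body is found.

    Hypothetical helper for illustration. Returns (kind, length) where kind is
    "none", "chunked", "content-length" or "until-close".
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value.strip() for name, value in headers.items()}

    # Rules 1 and 2: these responses never carry a body.
    if 100 <= status < 200 or status in (204, 304) or request_method.upper() == "HEAD":
        return ("none", 0)

    # Rule 3: chunked encoding wins over Content-Length.
    if headers.get("transfer-encoding", "").lower() == "chunked":
        return ("chunked", None)

    # Rule 4: a valid, non-negative Content-Length gives the exact size.
    try:
        length = int(headers.get("content-length", ""))
    except ValueError:
        length = -1
    if length >= 0:
        return ("content-length", length)

    # Rule 5: otherwise the body runs until the server closes the connection.
    return ("until-close", None)
```

Rule 6 (timeouts) is deliberately left out, since it is a fallback heuristic rather than part of the protocol.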

                              

An example: reading the source of httplib.py in the Python standard library (an HTTP client implementation)

The simplest way to use httplib:

    import httplib

    conn = httplib.HTTPConnection("google.com")
    conn.request('GET', '/')
    print conn.getresponse().read()
    conn.close()

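In practice, however, httplib is rarely used directly; the higher-level wrappers urllib and urllib2 are used instead.

conn = httplib.HTTPConnection("google.com") creates an HTTPConnection object and specifies the web server to send requests to.

conn.request('GET', '/') sends an HTTP request to google.com using the GET method.

conn.getresponse() creates an HTTPResponse object, then receives and parses the headers of the HTTP response; read() reads the response body.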
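For readers on Python 3, where httplib was renamed to http.client, the same calls can be tried without touching the network. The sketch below assumes a throwaway local server built with http.server; `Handler` and `fetch_twice` are hypothetical names. Because every response carries Content-Length, the client knows where each body ends and can send a second request over the same connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent connections

    def do_GET(self):
        body = ("you asked for " + self.path).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_twice():
    """Send two GET requests over a single persistent connection."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        try:
            bodies = []
            for path in ("/a", "/b"):
                conn.request("GET", path)
                bodies.append(conn.getresponse().read())
            return bodies
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_twice())
```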

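Function call chain:

       getresponse() -> [creates the HTTPResponse object response] -> response.begin() -> response.read()

The key functions are begin() and read(). begin() does four things:

       (1) Creates an HTTPMessage object and parses the headers of the HTTP response.

       (2) Checks whether the headers contain "Transfer-Encoding: chunked".

       (3) Checks whether the TCP connection will be closed once the response has been received (by calling _check_close()).

       (4) If the headers contain "Content-Length" and do not contain "Transfer-Encoding: chunked", obtains the length of the message body.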

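          _check_close() decides that the TCP connection will be closed after the response has been received if the response headers contain "Connection: close"; it also contains some code for backward compatibility with HTTP/1.0. HTTP/1.1 defaults to "Connection: Keep-Alive" even when the header is absent.

          read() reads the HTTP response body according to Content-Length or chunked encoding; it can read everything at once or a specified number of bytes. For chunked encoding it calls _read_chunked().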
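The effect of begin() can be observed directly in Python 3, whose http.client.HTTPResponse descends from the httplib class discussed here. The sketch below feeds canned bytes through a fake socket; `FakeSocket` and `parse` are hypothetical helpers, and `chunked`, `length`, and `will_close` are implementation details of CPython rather than documented API, so treat this as an experiment rather than a supported usage.

```python
import http.client
import io


class FakeSocket:
    """Just enough of a socket for HTTPResponse: it only calls makefile()."""

    def __init__(self, raw):
        self._raw = raw

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._raw)


def parse(raw, method="GET"):
    """Run begin() and read() on raw response bytes.

    Returns ((chunked, length, will_close), body), with the state captured
    after begin() and before read().
    """
    resp = http.client.HTTPResponse(FakeSocket(raw), method=method)
    resp.begin()
    state = (bool(resp.chunked), resp.length, bool(resp.will_close))
    return state, resp.read()


if __name__ == "__main__":
    print(parse(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"))
    print(parse(b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
                b"5\r\nhello\r\n0\r\n\r\n"))
    print(parse(b"HTTP/1.1 200 OK\r\n\r\nuntil close"))
```

The third case has neither Content-Length nor chunked encoding, so begin() sets will_close and read() consumes everything up to end of stream.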

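       _read_chunked() reads chunks according to their chunk size; once the last chunk (the one whose chunk size is 0) has been read, the HTTP response has been received in full. For the relevant parts of the HTTP specification, see RFC 2616 sections 3.6.1 and 19.4.6.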

RFC 2616 section 19.4.6 gives pseudocode for decoding an HTTP message sent with chunked encoding:

    length := 0
    read chunk-size, chunk-extension (if any) and CRLF
    while (chunk-size > 0) {
        read chunk-data and CRLF
        append chunk-data to entity-body
        length := length + chunk-size
        read chunk-size and CRLF
    }
    read entity-header
    while (entity-header not empty) {
        append entity-header to existing header fields
        read entity-header
    }
    Content-Length := length
    Remove "chunked" from Transfer-Encoding
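The pseudocode translates almost line for line into Python 3. The following is a minimal sketch, not httplib's implementation: `decode_chunked` is a hypothetical name, the input is assumed to be a binary file-like object positioned at the first chunk-size line, and error handling is reduced to a ValueError.

```python
import io


def decode_chunked(stream):
    """Decode a chunked body; return (body, trailer_lines)."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the last chunk")
        # Drop any chunk-extension after ';' and parse the hexadecimal size.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            break  # last-chunk: the body is complete
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        body += data
        stream.read(2)  # discard the CRLF that follows chunk-data

    # Read the optional trailer headers up to the terminating empty line.
    trailers = []
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        trailers.append(line.rstrip(b"\r\n"))
    return bytes(body), trailers


if __name__ == "__main__":
    raw = b"5\r\nhello\r\n6;ext=1\r\n world\r\n0\r\nX-Trailer: done\r\n\r\n"
    print(decode_chunked(io.BytesIO(raw)))
```

The length of the returned body is what the pseudocode assigns to Content-Length once decoding is finished.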

Here is the main code of begin(), _check_close(), read(), and _read_chunked():

(1) begin():

    def begin(self):
        ......
        self.msg = HTTPMessage(self.fp, 0)
        # don't let the msg keep an fp
        self.msg.fp = None

        # are we using the chunked-style of transfer encoding?
        tr_enc = self.msg.getheader('transfer-encoding')
        if tr_enc and tr_enc.lower() == "chunked":
            self.chunked = 1
            self.chunk_left = None
        else:
            self.chunked = 0

        # will the connection close at the end of the response?
        self.will_close = self._check_close()

        # do we have a Content-Length?
        # NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked"
        length = self.msg.getheader('content-length')
        if length and not self.chunked:
            try:
                self.length = int(length)
            except ValueError:
                self.length = None
            else:
                if self.length < 0:  # ignore nonsensical negative lengths
                    self.length = None
        else:
            self.length = None

        # does the body have a fixed length? (of zero)
        # NO_CONTENT = 204, NOT_MODIFIED = 304
        # End of the HTTP response message: see points 1 and 2 in the list
        # at the top of this article
        if (status == NO_CONTENT or status == NOT_MODIFIED or
            100 <= status < 200 or      # 1xx codes
            self._method == 'HEAD'):
            self.length = 0

        # if the connection remains open, and we aren't using chunked, and
        # a content-length was not provided, then assume that the connection
        # WILL close.
        # End of the HTTP response message: if neither chunked encoding nor
        # Content-Length is used, the connection will be closed
        if not self.will_close and \
           not self.chunked and \
           self.length is None:
            self.will_close = 1

(2) _check_close():

    def _check_close(self):
        # End of the HTTP response message: see point 5 in the list at the
        # top of this article
        conn = self.msg.getheader('connection')
        if self.version == 11:
            # An HTTP/1.1 proxy is assumed to stay open unless
            # explicitly closed.
            conn = self.msg.getheader('connection')
            if conn and "close" in conn.lower():
                return True
            return False

        # Some HTTP/1.0 implementations have support for persistent
        # connections, using rules different than HTTP/1.1.

        # For older HTTP, Keep-Alive indicates persistent connection.
        if self.msg.getheader('keep-alive'):
            return False

        # At least Akamai returns a "Connection: Keep-Alive" header,
        # which was supposed to be sent by the client.
        if conn and "keep-alive" in conn.lower():
            return False

        # Proxy-Connection is a netscape hack.
        pconn = self.msg.getheader('proxy-connection')
        if pconn and "keep-alive" in pconn.lower():
            return False

        # otherwise, assume it will close
        return True

(3) read():

    def read(self, amt=None):
        if self.fp is None:
            return ''

        if self._method == 'HEAD':
            self.close()
            return ''

        if self.chunked:
            return self._read_chunked(amt)

        if amt is None:
            # unbounded read
            if self.length is None:
                s = self.fp.read()
            else:
                try:
                    s = self._safe_read(self.length)
                except IncompleteRead:
                    self.close()
                    raise
                self.length = 0
            self.close()        # we read everything
            return s

        if self.length is not None:
            if amt > self.length:
                # clip the read to the "end of response"
                amt = self.length

        # we do not use _safe_read() here because this may be a .will_close
        # connection, and the user is reading more bytes than will be provided
        # (for example, reading in 1k chunks)
        s = self.fp.read(amt)
        if not s:
            # Ideally, we would raise IncompleteRead if the content-length
            # wasn't satisfied, but it might break compatibility.
            self.close()
        if self.length is not None:
            # compute the remaining length for the next read
            self.length -= len(s)
            if not self.length:
                self.close()
        return s

(4) _read_chunked():

    def _read_chunked(self, amt):
        assert self.chunked != _UNKNOWN
        # self.chunk_left is None when reading chunk for the first time
        # (see self.begin())
        # chunk_left: bytes left in the current chunk
        # chunk_left = None means that reading hasn't been started.
        chunk_left = self.chunk_left
        value = []
        while True:
            if chunk_left is None:
                # read a new chunk
                line = self.fp.readline(_MAXLINE + 1)
                if len(line) > _MAXLINE:
                    raise LineTooLong("chunk size")
                i = line.find(';')
                if i >= 0:
                    line = line[:i] # strip chunk-extensions
                try:
                    chunk_left = int(line, 16)
                except ValueError:
                    # close the connection as protocol synchronisation is
                    # probably lost
                    self.close()
                    raise IncompleteRead(''.join(value))
                if chunk_left == 0:
                    # RFC 2616 3.6.1: last-chunk, chunk_left = 0
                    break
            if amt is None:
                value.append(self._safe_read(chunk_left))
            elif amt < chunk_left:
                value.append(self._safe_read(amt))
                self.chunk_left = chunk_left - amt
                return ''.join(value)
            elif amt == chunk_left:
                value.append(self._safe_read(amt))
                self._safe_read(2)  # toss the CRLF at the end of the chunk
                self.chunk_left = None
                return ''.join(value)
            else:
                value.append(self._safe_read(chunk_left))
                amt -= chunk_left

            # we read the whole chunk, get another
            self._safe_read(2)      # toss the CRLF at the end of the chunk
            chunk_left = None

        ......

        # we read everything; close the "file"
        self.close()

        return ''.join(value)

Another piece of real-world source code: in PythonProxy, message reception stops once the timeout is reached. _read_write() reads from and writes to the sockets that are already open.

    def _read_write(self):
        time_out_max = self.timeout/3
        socs = [self.client, self.target]
        count = 0
        while 1:
            count += 1
            # time_out = 3
            (recv, _, error) = select.select(socs, [], socs, 3)
            if error:
                break
            if recv:
                for in_ in recv:
                    data = in_.recv(BUFLEN)
                    if in_ is self.client:
                        out = self.target
                    else:
                        out = self.client
                    if data:
                        out.send(data)
                        count = 0
            # after time_out_max consecutive rounds without receiving data,
            # stop receiving and sending [timed out]
            if count == time_out_max:
                break
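The same timeout idea can be reduced to a few lines for a single socket. This is a Python 3 sketch rather than PythonProxy's code: `recv_until_idle` and its default values are hypothetical, and, as noted at the top of the article, treating silence as the end of the message is unreliable because a slow sender looks exactly like a finished one.

```python
import select
import socket


def recv_until_idle(sock, idle_timeout=0.2, bufsize=4096):
    """Read from sock until it stays silent for idle_timeout seconds."""
    chunks = []
    while True:
        readable, _, errored = select.select([sock], [], [sock], idle_timeout)
        if errored or not readable:
            break  # socket error, or nothing arrived in time: assume the end
        data = sock.recv(bufsize)
        if not data:
            break  # the peer closed the connection
        chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    sender.sendall(b"HTTP/1.1 200 OK\r\n\r\nno length given")
    print(recv_until_idle(receiver))
    sender.close()
    receiver.close()
```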

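With the analysis and source code above, the following question should be easy to answer:

        When HTTP uses keep-alive mode and the server has responded to the client's request, how does the client determine that the HTTP response message has been received in full?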

Finally, Stack Overflow also has answers on how to determine the end of an HTTP message; see References [2] and [3].

References

[1] Hypertext Transfer Protocol -- HTTP/1.1
       https://tools.ietf.org/html/rfc2616

[2] Detect end of HTTP request body
       http://stackoverflow.com/questions/4824451/detect-end-of-http-request-body

[3] Detect the end of a HTTP packet
       http://stackoverflow.com/questions/3718158/detect-the-end-of-a-http-packet

[4] Determining the end of an HTTP request in Keep-Alive mode
       http://blog.quanhz.com/archives/141

[5] Sentenced to death just like that!
       http://www.cnblogs.com/skynet/archive/2010/12/11/1903347.html

[6] Notes on Nginx and the HTTP protocol
       http://blog.xiuwz.com/tag/content-length/

[7] Python Proxy - A Fast HTTP proxy
       https://code.google.com/p/python-proxy/

[8] HTTP programming in Python: httplib, urllib and urllib2
       http://www.cnblogs.com/chenzehe/archive/2010/08/30/1812995.html

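When reposting this article, please credit the author and the source, [Gary的影響力] http://garyelephant.me. Please do not use it for any commercial purpose!

Author: Gary Gao (garygaowork[at]gmail.com). Interests: the Internet, distributed systems, high performance, NoSQL, automation, software teams.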

