How PHP uses cURL to read HTTP chunked data

For HTTP chunked data returned by a web server, we may want a callback each time a chunk arrives, rather than a single callback after the whole response has been received. This is the case, for example, when the server is icomet.

The PHP code using cURL is as follows:

    <?php
    $url = "http://127.0.0.1:8100/stream";
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'myfunc');
    $result = curl_exec($ch);
    curl_close($ch);

    function myfunc($ch, $data) {
        $bytes = strlen($data);
        // handle data
        return $bytes;
    }

However, there is a problem here. For a single chunk, the callback function may be called multiple times, each time with approximately 16 KB of data. This is obviously not what we want. Because each icomet chunk is terminated with "\n", the callback function can buffer the data until a complete chunk has arrived.

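    function myfunc($ch, $data) {
        $bytes = strlen($data);
        static $buf = '';
        $buf .= $data;
        while (1) {
            $pos = strpos($buf, "\n");
            if ($pos === false) {
                break;
            }
            // $data now holds exactly one complete chunk
            $data = substr($buf, 0, $pos + 1);
            $buf = substr($buf, $pos + 1);
            // process data
        }
        // cURL aborts the transfer unless the callback returns the number of bytes it received
        return $bytes;
    }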
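To see the buffering idea in isolation, here is a minimal sketch that runs without cURL or a server. It assumes, as above, that every chunk ends with "\n". The names makeLineBuffer and $onLine are hypothetical and exist only for this illustration. Instead of a static variable it keeps the buffer in a closure, so each transfer can have its own buffer.

```php
<?php
// Hypothetical helper (not part of cURL): returns a write callback that
// buffers incoming fragments and calls $onLine once per complete line.
function makeLineBuffer(callable $onLine): callable
{
    $buf = '';
    return function ($ch, string $data) use (&$buf, $onLine): int {
        $buf .= $data;
        // Emit every complete "\n"-terminated line currently in the buffer.
        while (($pos = strpos($buf, "\n")) !== false) {
            $onLine(substr($buf, 0, $pos + 1));
            $buf = (string) substr($buf, $pos + 1);
        }
        // A cURL write callback must return the number of bytes it was given.
        return strlen($data);
    };
}

// Simulate cURL delivering two lines split across three fragments.
$lines = [];
$callback = makeLineBuffer(function (string $line) use (&$lines) {
    $lines[] = $line;
});
foreach (["first ", "chunk\nsecond", " chunk\n"] as $fragment) {
    $callback(null, $fragment);
}
print_r($lines);
```

With a real transfer, the returned closure could be passed to curl_setopt($ch, CURLOPT_WRITEFUNCTION, ...) in place of the function name, since that option accepts any callable.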

Next, let's look at reading chunked data (Transfer-Encoding: chunked) in PHP using fsockopen.

I ran into a strange problem when reading data with fsockopen. The situation was as follows:

URL being read: http://blog.maxthon.cn/?feed=rss2

The code used to read it:

    <?php
    // fsockopen() takes the port as its second argument; 80 is the standard HTTP port
    $fp = fsockopen("blog.maxthon.cn", 80, $errno, $errstr);
    if (!$fp) {
        echo "$errstr ($errno)\n";
    } else {
        $out = "GET /?feed=rss2 HTTP/1.1\r\n";
        $out .= "Host: blog.maxthon.cn\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        while (!feof($fp)) {
            echo fgets($fp, 128);
        }
        fclose($fp);
    }
    ?>

The HTTP response returned:

    Date: Mon, Mar 10:16:13 GMT
    Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8b PHP/5.2.6
    X-Powered-By: PHP/5.2.6
    X-Pingback: http://blog.maxthon.cn/xmlrpc.php
    Last-Modified: Wed, 03:13:41 GMT
    ETag: "8f16b619f32188bde3bc008a60c2cc11"
    Keep-Alive: timeout=15, max=120
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/xml; charset=UTF-8

    22de
    <?xml version="1.0" encoding="UTF-8"?>
    <![CDATA[2009 December 31 1711
    .......
    1fe8
    ]]>
    <![CDATA[<p>December 31, 2009<br/>1711</p>]]>

Note the four-character strings in the output above (22de and 1fe8, marked in red in the original post). They show up at intervals in the data, yet they do not appear when the same URL is fetched by other means such as curl or file_get_contents. Only a few sites show this behaviour. After a lot of fruitless searching, I happened to notice this declaration in the response headers above: Transfer-Encoding: chunked, with the usual Content-Length field missing. Roughly speaking, it means that the body is transmitted in a chunked encoding.

Searching for this keyword on Google turns up an explanation on Wikipedia:

"Chunked transfer encoding is a mechanism that allows HTTP messages to be split in several parts. This can be applied to both HTTP requests (from client to server) and HTTP responses (from server to client)."

"For example, let us consider the way in which an HTTP server may transmit data to a client application (usually a web browser). Normally, data delivered in HTTP responses is sent in one piece, whose length is indicated by the Content-Length header field. The length of the data is important, because the client needs to know where the response ends and any following response starts. With chunked encoding, however, the data is broken up into a series of blocks of data and transmitted in one or more "chunks", so that a server may start sending data before it knows the final size of the content that it is sending. Often, the size of these blocks is the same, but this is not always the case."

With the general idea in place, let's look at the format. A chunked body is a concatenation of several chunks, terminated by a final chunk of length 0. Each chunk consists of a head and a body. The head gives the size of the body in bytes as a hexadecimal number, optionally followed by chunk extensions (usually omitted). The body is the actual content of the specified length. The head and the body are separated by a carriage return plus line feed (CRLF), and the body is followed by another CRLF. After the final chunk of length 0 comes the footer (trailer), which holds some additional header fields and can usually be ignored. A concrete example of the encoding follows.

The encoded response:

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Transfer-Encoding: chunked

    25
    This is the data in the first chunk

    1A
    and this is the second one
    0

The first size, 0x25 = 37, covers the 35 bytes of text plus a CRLF that belongs to the data itself, which is why an empty line follows the text. The second size, 0x1A = 26, covers its text exactly.

The decoded data:

    This is the data in the first chunk
    and this is the second one

The situation is now clear. So how do we decode the encoded data?

In the comments below the fsockopen function in the official PHP manual, several people have proposed workarounds.

Method 1.

    <?php
    function unchunk($result) {
        return preg_replace_callback(
            '/(?:(?:\r\n|\n)|^)([0-9A-F]+)(?:\r\n|\n){1,2}(.*?)' .
            '((?:\r\n|\n)(?:[0-9A-F]+(?:\r\n|\n))|$)/si',
            // An anonymous function is used here because create_function() was removed in PHP 8.0
            function ($matches) {
                return hexdec($matches[1]) == strlen($matches[2]) ? $matches[2] : $matches[0];
            },
            $result
        );
    }

Method 2.

    function unchunkHttp11($data) {
        $fp = 0;          // current offset into $data
        $outData = "";
        while ($fp < strlen($data)) {
            // read the chunk-size line, including its trailing CRLF
            $rawnum = substr($data, $fp, strpos(substr($data, $fp), "\r\n") + 2);
            $num = hexdec(trim($rawnum));
            $fp += strlen($rawnum);
            $chunk = substr($data, $fp, $num);
            $outData .= $chunk;
            $fp += strlen($chunk);
        }
        return $outData;
    }

Note: both functions are described as taking the raw HTTP response data (including headers). Method 2, however, interprets the very start of its input as a chunk-size line, so the headers should be stripped before it is called.
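As a complement to the two manual-comment functions, the following is a minimal sketch of a stricter decoder that follows the chunk format described above step by step. The function names splitHttpResponse and decodeChunkedBody are hypothetical, and the sample response is constructed for illustration so that the sizes 25 and 1A match the actual byte counts. It assumes the whole response is already in memory as a string, and it ignores chunk extensions and trailer fields.

```php
<?php
// Hypothetical helper: split a raw HTTP response into header block and body.
function splitHttpResponse(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        throw new InvalidArgumentException('No header/body separator found');
    }
    return [substr($raw, 0, $pos), (string) substr($raw, $pos + 4)];
}

// Hypothetical helper: decode a chunked body (headers already removed).
function decodeChunkedBody(string $body): string
{
    $out = '';
    $offset = 0;
    while (true) {
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            throw new UnexpectedValueException('Missing chunk-size line');
        }
        // The size line may carry extensions after ";", which are ignored here.
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $hex = trim(explode(';', $sizeLine, 2)[0]);
        if ($hex === '' || !ctype_xdigit($hex)) {
            throw new UnexpectedValueException('Invalid chunk size');
        }
        $size = (int) hexdec($hex);
        $offset = $lineEnd + 2;
        if ($size === 0) {
            break; // last chunk; any trailer fields after it are ignored
        }
        if ($offset + $size > strlen($body)) {
            throw new UnexpectedValueException('Truncated chunk');
        }
        $out .= substr($body, $offset, $size);
        $offset += $size + 2; // skip the chunk data and the CRLF that follows it
    }
    return $out;
}

// Sample response built for this illustration (not captured from a real server).
$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/plain\r\n"
     . "Transfer-Encoding: chunked\r\n"
     . "\r\n"
     . "25\r\nThis is the data in the first chunk\r\n\r\n"
     . "1A\r\nand this is the second one\r\n"
     . "0\r\n\r\n";

[$headers, $body] = splitHttpResponse($raw);
if (stripos($headers, 'Transfer-Encoding: chunked') !== false) {
    $body = decodeChunkedBody($body);
}
echo $body, "\n";
```

Separating the header split from the decoding keeps the decoder from ever mistaking a header line for a chunk size, and the explicit length checks make a truncated response fail loudly instead of producing silently damaged output.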
  
 