How PHP uses curl to read HTTP chunked data


Last Update: 2017-01-19 Source: Internet

Author: User



When a web server returns HTTP chunked data, we may want a callback each time a chunk arrives, rather than only after the whole response has been received. This is the case, for example, when the server is icomet.

The PHP code using curl looks like this:

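<?php
$url = "http://127.0.0.1:8100/stream";
$ch = curl_init($url);
// curl calls MyFunc each time a block of response data arrives
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'MyFunc');
$result = curl_exec($ch);
curl_close($ch);

function MyFunc($ch, $data) {
    $bytes = strlen($data);
    // Process $data here
    // Return the number of bytes handled; any other value makes curl abort the transfer
    return $bytes;
}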

However, there is a problem here: the callback may be called more than once for a single chunk, each time with up to about 16 KB of data, and one call can also carry the end of one chunk together with the start of the next. That is clearly not what we want. Since every icomet chunk ends with "\n", the callback can buffer the incoming data and cut it at each "\n":

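function MyFunc($ch, $data) {
    $bytes = strlen($data);
    static $buf = '';                      // persists between calls
    $buf .= $data;
    while (true) {
        $pos = strpos($buf, "\n");
        if ($pos === false) {              // no complete chunk in the buffer yet
            break;
        }
        $data = substr($buf, 0, $pos + 1); // one complete chunk, including "\n"
        $buf = substr($buf, $pos + 1);     // keep the rest for the next call
        // Process $data here
    }
    return $bytes;                         // curl requires the number of bytes handled
}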
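The same buffering can also be written as a closure, so that each transfer gets its own buffer instead of a function-level static shared by every call in the script, and so that the splitting logic can be tried without a server by feeding it pieces by hand. This is a sketch under those assumptions: makeLineWriter and the $onMessage handler are hypothetical names, and CURLOPT_WRITEFUNCTION accepts any PHP callable, including a closure.

```php
<?php
// Hypothetical helper: returns a closure usable as CURLOPT_WRITEFUNCTION.
// It buffers incoming data and calls $onMessage once per complete
// "\n"-terminated message, however the data was split across calls.
function makeLineWriter(callable $onMessage): callable
{
    $buf = '';
    // $buf is captured by reference so it survives between calls
    return function ($ch, string $data) use (&$buf, $onMessage): int {
        $buf .= $data;
        // Emit every complete line currently in the buffer
        while (($pos = strpos($buf, "\n")) !== false) {
            $onMessage(substr($buf, 0, $pos + 1));
            $buf = (string) substr($buf, $pos + 1);
        }
        // Report all bytes as consumed; any other value aborts the transfer
        return strlen($data);
    };
}

// Usage with curl (not executed here):
// $ch = curl_init("http://127.0.0.1:8100/stream");
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeLineWriter(function ($msg) { echo $msg; }));
// curl_exec($ch);
// curl_close($ch);
```

An incomplete trailing line stays in the buffer until the rest of it arrives, which is exactly what the static-variable version does as well.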

Next, let's look at reading chunked data (Transfer-Encoding: chunked) in PHP with fsockopen.

While reading data with fsockopen, I ran into a strange problem:

URL being read: http://blog.maxthon.cn/?feed=rss2

Code:

<?php
$fp = fsockopen("blog.maxthon.cn", 80, $errno, $errstr);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out = "GET /?feed=rss2 HTTP/1.1\r\n";
    $out .= "Host: blog.maxthon.cn\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // Print the raw response line by line (fgets returns at most 127 bytes per call)
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>

The returned HTTP content:

Date: Mon, 10:16:13 GMT
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8b PHP/5.2.6
X-Powered-By: PHP/5.2.6
X-Pingback: http://blog.maxthon.cn/xmlrpc.php
Last-Modified: Wed, 03:13:41 GMT
ETag: "8f16b619f32188bde3bc008a60c2cc11"
Keep-Alive: timeout=15, max=120
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/xml; charset=UTF-8

22de
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
<description><![CDATA[December 31, 2009
1711
..... 1fe8
]]></description>
<content:encoded><![CDATA[<p>December 31, 2009 <br />
1711</p>

Note the four-character strings such as 22de and 1fe8 in the output above: they keep appearing at intervals in the body, yet data fetched by other methods such as curl or file_get_contents does not contain them. I tried crawling other sites, and only a few of them showed this behavior. After searching widely without finding a solution, I happened to notice that the response headers contain Transfer-Encoding: chunked, while the usual Content-Length field is absent. Roughly, this header means that the body is transmitted in chunks (segments).
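To detect this situation in code rather than by eye, one option is to collect the complete fsockopen response into a string (for example by appending each fgets() result instead of echoing it) and then inspect the headers. The sketch below assumes the whole response is already in memory; splitHttpResponse is a hypothetical helper, header names are lower-cased because HTTP field names are case-insensitive, and a repeated field simply overwrites the earlier one.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line,
// headers and body, and report whether the body uses chunked encoding.
function splitHttpResponse(string $raw): array
{
    // The header block ends at the first empty line (CRLF CRLF)
    $end = strpos($raw, "\r\n\r\n");
    if ($end === false) {
        throw new InvalidArgumentException('No end of headers found');
    }
    $headerLines = explode("\r\n", substr($raw, 0, $end));
    $statusLine = array_shift($headerLines);
    $headers = [];
    foreach ($headerLines as $line) {
        // "Name: value" -> ['name' => 'value']; names normalised to lower case
        [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
        $headers[strtolower(trim($name))] = trim($value);
    }
    $chunked = isset($headers['transfer-encoding'])
        && stripos($headers['transfer-encoding'], 'chunked') !== false;
    return [
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => (string) substr($raw, $end + 4), // still chunk-encoded if 'chunked' is true
        'chunked' => $chunked,
    ];
}
```

When 'chunked' is true the body still contains the size lines and has to be decoded before use.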

Searching Google for this header, I found the following explanation on Wikipedia:

Chunked Transfer Encoding is a mechanism that allows HTTP messages to be split in several parts. This can be applied to both HTTP requests (from client to server) and HTTP responses (from server to client).


For example, let us consider the way in which an HTTP server may transmit data to a client application (usually a web browser). Normally, data delivered in HTTP responses is sent in one piece, whose length is indicated by the Content-Length header field. The length of the data is important, because the client needs to know where the response ends and any following response starts. With chunked encoding, however, the data is broken up into a series of blocks of data and transmitted in one or more "chunks" so that a server may start sending data before it knows the final size of the content that it's sending. Often, the size of these blocks is the same, but this is not always the case.


With the general idea understood, let's look at the details:

Chunked encoding concatenates a series of chunks and ends with a chunk whose length is 0. Each chunk has two parts, a head and a body. The head gives the size of the following body in bytes as a hexadecimal number, optionally followed by chunk extensions (usually omitted); the body is the actual content of that length. The head and the body are separated by a carriage return and line feed (CRLF), and each body is also followed by a CRLF. After the final zero-length chunk comes the footer (trailer), which may carry additional header fields and can usually be ignored. The encoding looks like this:

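Encoded response (every line ends with CRLF):

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1A
and this is the second one
0

Here 25 (hex) = 37 bytes: the 35 characters of the first line plus a CRLF that belongs to the data itself, which is why a blank line follows it. 1A (hex) = 26 bytes is the second line, and 0 marks the end.

Decoded data:

This is the data in the first chunk
and this is the second one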
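To make the format concrete, here is a small encoder sketch (encodeChunked is a hypothetical name, not a PHP built-in). Given the two pieces "This is the data in the first chunk\r\n" and "and this is the second one", it writes 25 and 1A as their sizes and appends the zero-size last chunk.

```php
<?php
// Hypothetical helper: encode a list of strings as an HTTP chunked body.
// Each piece becomes <size in hex> CRLF <data> CRLF; a zero-size chunk ends the body.
function encodeChunked(array $pieces): string
{
    $out = '';
    foreach ($pieces as $piece) {
        if ($piece === '') {
            continue; // an empty chunk would be read as the terminator, so skip it
        }
        // Hex digits are case-insensitive; upper case matches the "1A" style
        $out .= strtoupper(dechex(strlen($piece))) . "\r\n" . $piece . "\r\n";
    }
    // Last chunk (size 0) followed by an empty trailer section
    return $out . "0\r\n\r\n";
}
```

Skipping empty pieces is a design choice of this sketch: a chunk of size 0 can only appear once, at the very end.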

Now that the format is clear, how do we decode the encoded data?

In the user comments on the fsockopen page of the official PHP manual, several people have posted solutions:

Method 1.

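<?php
function unchunk($result) {
    return preg_replace_callback(
        // Size line, then (lazily) the chunk data, up to the next size line or the end
        '/(?:(?:\r\n|\n)|^)([0-9a-f]+)(?:\r\n|\n){1,2}(.*?)'
        . '((?:\r\n|\n)(?:[0-9a-f]+(?:\r\n|\n))|$)/si',
        // Keep only the data if its length matches the hex size; otherwise leave the match as-is.
        // An anonymous function replaces create_function(), which was removed in PHP 8.
        function ($matches) {
            return hexdec($matches[1]) == strlen($matches[2]) ? $matches[2] : $matches[0];
        },
        $result
    );
}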

Method 2.

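function unchunkHttp11($data) {
    $fp = 0;            // read position in $data
    $outData = "";
    while ($fp < strlen($data)) {
        // Chunk-size line, e.g. "1A\r\n"
        $eol = strpos($data, "\r\n", $fp);
        if ($eol === false) {
            break;      // incomplete size line
        }
        $rawnum = substr($data, $fp, $eol - $fp + 2);
        $num = hexdec(trim($rawnum));
        $fp += strlen($rawnum);
        if ($num == 0) {
            break;      // the zero-size chunk marks the end of the body
        }
        // Copy exactly $num bytes of chunk data
        $chunk = substr($data, $fp, $num);
        $outData .= $chunk;
        $fp += strlen($chunk) + 2;   // also skip the CRLF that follows the chunk data
    }
    return $outData;
}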

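Note: both functions work on the raw data returned by the server. Method 1 can be given the whole response including headers, which it mostly passes through unchanged; Method 2 parses from the very first byte, so give it only the body (everything after the blank line that ends the headers), otherwise the status line is misread as a chunk size.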
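Both methods are lenient. Method 1 relies on a regular expression that can be fooled by chunk data that happens to look like a size line, in which case that part of the input is left undecoded, and Method 2 hands each size line to hexdec(), which ignores non-hexadecimal characters instead of rejecting them. A stricter sketch follows; decodeChunkedBody is a hypothetical name. Like Method 2, it expects only the body. It accepts chunk extensions after ';', requires the CRLF after every chunk's data, stops at the zero-size chunk (ignoring any trailer), and throws UnexpectedValueException on malformed input instead of returning partially decoded data.

```php
<?php
// Hypothetical helper: strictly decode a chunked HTTP body (headers already removed).
function decodeChunkedBody(string $body): string
{
    $pos = 0;
    $out = '';
    while (true) {
        // Chunk-size line: hex digits, optionally followed by ";extension"
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            throw new UnexpectedValueException("Missing chunk-size line at offset $pos");
        }
        $sizeField = trim(explode(';', substr($body, $pos, $eol - $pos), 2)[0]);
        // At most 8 hex digits: an arbitrary sanity limit for this sketch
        if (!preg_match('/^[0-9a-fA-F]{1,8}$/', $sizeField)) {
            throw new UnexpectedValueException("Invalid chunk size '$sizeField'");
        }
        $size = hexdec($sizeField);
        $pos = $eol + 2;
        if ($size == 0) {
            break; // last chunk; any trailer fields after it are ignored
        }
        // The chunk data must be followed immediately by CRLF
        if (substr($body, $pos + $size, 2) !== "\r\n") {
            throw new UnexpectedValueException("Chunk at offset $pos has the wrong length");
        }
        $out .= substr($body, $pos, $size);
        $pos += $size + 2;
    }
    return $out;
}
```

Throwing on bad input makes truncated or corrupted responses visible, at the cost of being less forgiving than the two methods above.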

This article is an English version of an article originally published in Chinese on aliyun.com.
