With PHP's curl (libcurl), can I specify how many bytes to fetch when retrieving a page's HTML?

Source: Internet
Author: User
As the title says.
I originally used fopen + fread($fp, number of bytes to read) to fetch the data. SAE does not support that, so I want to switch to curl.
I only need to match the title value, so the first 800 bytes of the file are enough. curl has many options and I don't know which one to set.
Retrieving the entire HTML file takes a lot of time, so fetching only the first 800 bytes should save some. In my tests with microtime the difference was not large, but it was noticeable.

Reply content:

CURL has a range option, measured in bytes, which can be set as follows:

curl_setopt($ch, CURLOPT_RANGE, '0-799');

However, this does not necessarily work. The option only adds a Range header to the request; how to respond is up to the server. If the server supports range requests, it returns just the requested bytes (206 Partial Content). Otherwise it ignores the header and returns the complete document. A stream context can also be used to send the Range header, and the result should be the same:

$context = stream_context_create(array('http' => array('header' => 'Range: bytes=0-799')));
$data = file_get_contents("http://example.com/file.html", FALSE, $context);

RFC 2616 section on the Range header: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
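
To make the caveat above concrete, here is a minimal sketch that sends the Range request with curl and then checks the status code. 206 means the server honored the range; 200 means it ignored the header and sent the whole document, so the body is trimmed locally. The names trimToLimit and fetchFirstBytes are made up for this example, and the URL is the placeholder already used above.

```php
<?php
// Hypothetical helper: keep at most $limit bytes of a response body.
function trimToLimit(string $body, int $limit): string
{
    return substr($body, 0, $limit);
}

// Hypothetical helper: ask for the first $limit bytes of $url.
// Returns null if the transfer fails, otherwise an array with the data
// and a flag saying whether the server actually honored the range.
function fetchFirstBytes(string $url, int $limit): ?array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_RANGE, '0-' . ($limit - 1));
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false) {
        return null;
    }
    return array(
        'partial' => ($status === 206),          // 206 Partial Content
        'data'    => trimToLimit($body, $limit), // trim in case of a full 200 reply
    );
}

// Usage (performs a real HTTP request, so it is left commented out):
// $result = fetchFirstBytes('http://example.com/file.html', 800);
```

When 'partial' is false the full document was already downloaded before trimming, so in that case nothing is saved on transfer time.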

This can be done. However, curl delivers data in chunks, so the total read may exceed the value you specified. Just check the length yourself after each chunk.


$url = 'http://example.com/file.html'; // placeholder; the original URL was lost from this post
$data = '';
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => $url,
    CURLOPT_WRITEFUNCTION => 'shortepartial',
));
curl_exec($ch);
curl_close($ch);

function shortepartial($ch, $chunk) {
    global $data;
    $data .= $chunk;
    $len = strlen($chunk);
    echo 'received ', $len, ' bytes', PHP_EOL;
    // Check after each read: once the total reaches 1000 bytes, stop reading.
    if (strlen($data) >= 1000) {
        return -1;
    }
    // The return value tells curl how many bytes were handled.
    // Returning the chunk length lets the transfer continue.
    return $len;
}

echo $data;
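
The same idea can be written without a global variable. This is only a sketch: makeLimitedWriter is a hypothetical name, the data is collected into a variable passed by reference, and the buffer is cut to exactly the limit before the transfer is aborted.

```php
<?php
// Hypothetical factory: returns a CURLOPT_WRITEFUNCTION callback that
// collects at most $limit bytes into $buffer and then aborts the transfer.
function makeLimitedWriter(string &$buffer, int $limit): callable
{
    return function ($ch, string $chunk) use (&$buffer, $limit): int {
        $buffer .= $chunk;
        if (strlen($buffer) >= $limit) {
            // Drop whatever the last chunk added beyond the limit.
            $buffer = substr($buffer, 0, $limit);
            return -1; // not the chunk length, so curl stops the transfer
        }
        return strlen($chunk); // chunk fully handled, keep reading
    };
}

// Usage (performs a real HTTP request, so it is left commented out):
// $data = '';
// $ch = curl_init('http://example.com/file.html');
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeLimitedWriter($data, 800));
// curl_exec($ch); // returns false once the callback aborts; $data is still filled
```

Because the callback aborts on purpose, curl_exec reporting failure is expected here and should not be treated as a network error on its own.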

Given that you only need the page title, wouldn't file_get_contents be a more suitable function?

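// Offset 0 reads from the start; negative offsets (PHP 7.1+) count from the end and cannot be used with remote files.
$content = file_get_contents('http://www.baidu.com', false, null, 0, 800);
if (mb_detect_encoding($content) == 'GB2312')
    $content = iconv('GB2312', 'UTF-8', $content);
preg_match("/<title>.*<\/title>/", $content, $title);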
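
For the title match itself, a small helper along these lines might be enough. extractTitle is a hypothetical name; the sketch assumes the whole title element falls within the bytes that were fetched and returns null otherwise.

```php
<?php
// Hypothetical helper: pull the <title> text out of a (possibly truncated)
// HTML fragment. Returns null when no complete <title> element is present.
function extractTitle(string $html): ?string
{
    // "i" ignores case, "s" lets "." span line breaks inside the title.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// prints "Example Page"
echo extractTitle('<html><head><title> Example Page </title></head>'), PHP_EOL;
```

If the function returns null for a page, the title may simply start later than the bytes that were read, so a larger limit is worth trying before giving up.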
