http - PHP: how to use curl's HEAD request to get information such as the size of a resource

Last Update:2016-06-06 Source: Internet

Author: User


My program lets users enter URLs from other sites so it can fetch those resources, but before fetching I need to know how large each resource is; otherwise an oversized resource takes too long and wastes bandwidth. I found that HTTP has a HEAD method, which returns only the HTTP headers for a resource. How can I make curl fetch only the HTTP headers without downloading the whole body?

Also, is Content-Length guaranteed to be present in the HTTP headers? It is the only way I know to get the size of a resource. If it is missing, my fallback would be to set a maximum download size in curl, so that the connection is dropped with an error once the limit is exceeded. Does curl have an option that does this?

Finally, how does a server support the HEAD method?

Reply content:


Actually, curl has supported the HEAD method for a long time.

Just add this line to your code and the HEAD method is selected automatically: curl_setopt($ch, CURLOPT_NOBODY, true);

If you want to read Content-Length, call this after curl_exec to get the value from the response headers: $size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
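Put together, the two calls above might look like the following sketch. The function names `probe_content_length` and `normalize_length`, the redirect setting, and the 10-second timeout are assumptions for illustration, not part of the original answer. curl reports -1 when the size is unknown, which the helper maps to null.

```php
<?php
// Map curl's "unknown size" marker (-1) to null; otherwise return bytes as int.
function normalize_length($reported): ?int
{
    return $reported < 0 ? null : (int) $reported;
}

// Send a HEAD request and return the declared size, or null if unavailable.
function probe_content_length(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumed: measure the final resource
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // assumed limit, in seconds
    $ok = curl_exec($ch);
    $reported = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    return $ok === false ? null : normalize_length($reported);
}
```

A null result means either that the request failed or that the server sent no Content-Length, so the caller still needs a fallback.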

Note that although most servers support HEAD, not all do; some disable it in their configuration to deter crawlers. Content-Length is also not a required field. So: if the value is present and exceeds your maximum, return an error; if it is absent, or does not exceed the maximum, you still have to judge by the size of the content actually downloaded.
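That rule can be written down as a small decision function. This is only a sketch: the function name and the two return labels are made up for illustration, and null stands for "the server sent no Content-Length".

```php
<?php
// Decide what to do with a declared size (null means the header was absent).
// Returns 'reject' when the declared size is over the limit; otherwise
// 'download_and_count', because the value is missing or cannot be trusted.
function decide_by_declared_size(?int $declared, int $maxSize): string
{
    if ($declared !== null && $declared > $maxSize) {
        return 'reject';
    }
    return 'download_and_count';
}
```

Note that a declared size within the limit is never treated as proof; it only avoids rejecting the request early.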

As for a maximum download size, I have not seen such a setting, but there is a better solution: use the two callbacks CURLOPT_HEADERFUNCTION and CURLOPT_WRITEFUNCTION. Then a single request is enough to make every check, and the transfer can be aborted at any point.

$size = 0;
$max_size = 123456;

curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $str) use ($max_size) {
    // The first parameter is the curl handle; the second is one raw header line.
    $parts = array_map('trim', explode(':', $str, 2));
    // Check the declared size (lines without a colon, such as the status line, are skipped).
    if (count($parts) === 2 && 'content-length' === strtolower($parts[0]) && (int) $parts[1] > $max_size) {
        return 0; // aborts the transfer
    }
    return strlen($str); // the callback must return the number of bytes it handled
});

// For responses without Content-Length, count the bytes while reading.
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $str) use (&$size, $max_size) {
    $len = strlen($str);
    $size += $len;
    if ($size > $max_size) {
        return 0; // aborts the transfer
    }
    return $len;
});
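For reuse, the two callbacks could be wrapped as in the sketch below. The names `make_limit_callbacks` and `fetch_limited`, and the layout of the `$state` array, are assumptions for illustration. Building the callbacks in a separate function makes it possible to exercise them without a network connection.

```php
<?php
// Build the two callbacks for a given limit. $state is shared by reference so
// the caller can see how many bytes arrived and whether the limit was hit.
function make_limit_callbacks(int $maxSize, array &$state): array
{
    $state = ['received' => 0, 'too_large' => false, 'body' => ''];

    $onHeader = function ($ch, string $line) use ($maxSize, &$state): int {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2
            && strtolower(trim($parts[0])) === 'content-length'
            && (int) trim($parts[1]) > $maxSize) {
            $state['too_large'] = true;
            return 0; // a value other than the line length aborts the transfer
        }
        return strlen($line);
    };

    $onBody = function ($ch, string $chunk) use ($maxSize, &$state): int {
        $len = strlen($chunk);
        $state['received'] += $len;
        if ($state['received'] > $maxSize) {
            $state['too_large'] = true;
            return 0; // abort: the actual size exceeded the limit
        }
        $state['body'] .= $chunk;
        return $len;
    };

    return [$onHeader, $onBody];
}

// Download $url, returning the body, or null if it failed or was too large.
function fetch_limited(string $url, int $maxSize): ?string
{
    $state = [];
    [$onHeader, $onBody] = make_limit_callbacks($maxSize, $state);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, $onHeader);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, $onBody);
    $ok = curl_exec($ch);
    return ($ok === false || $state['too_large']) ? null : $state['body'];
}
```

This sketch buffers the body in memory, which is acceptable only because the limit bounds its size; writing to a file instead would be a straightforward variation.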

Why use curl at all? Just open a connection with fsockopen, send a HEAD request, and read the response.

However, a HEAD response does not necessarily include the size of the resource; that does not seem to be guaranteed.
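A minimal sketch of the fsockopen approach, assuming plain HTTP on port 80. The function names and the 5-second timeouts are assumptions for illustration.

```php
<?php
// Build the raw text of a HEAD request for the given host and path.
function build_head_request(string $host, string $path = '/'): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Send the request over plain HTTP and return the raw response headers,
// or null if the connection could not be opened or read.
function head_via_socket(string $host, string $path = '/', int $port = 80): ?string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, 5); // assumed 5 s connect timeout
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, 5); // assumed 5 s read timeout
    fwrite($fp, build_head_request($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response === false ? null : $response;
}
```

The `Connection: close` header makes the server close the socket after responding, so reading until end of stream terminates. The sketch does not handle HTTPS or redirects.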

curl_setopt($curl, CURLOPT_HEADER, true);

With this option set, the result returned by curl_exec also includes the HTTP response headers, from which the Content-Length value can be extracted.

HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 26395
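Extracting the value from such a header block could be sketched as follows. The function name `extract_content_length` is an assumption for illustration. It expects the raw header text as a string, which curl_exec returns only when CURLOPT_RETURNTRANSFER is also enabled.

```php
<?php
// Extract Content-Length from a raw header block; null if it is absent.
// Header names are case-insensitive, hence the "i" flag; "m" makes ^ and $
// match at each line.
function extract_content_length(string $rawHeaders): ?int
{
    if (preg_match('/^Content-Length:\s*(\d+)\s*$/mi', $rawHeaders, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

Since the curl_exec result contains headers and body together, the header part can be cut off first using the length reported by curl_getinfo($curl, CURLINFO_HEADER_SIZE).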

This length value is unreliable; a server-side script can set it to anything.

Setting a maximum fetch size works. The remote server cannot be trusted, and the Content-Length it gives is not necessarily the true size. To prevent abuse, you still have to enforce a size limit yourself.

You can also add a further check: if a domain often returns a Content-Length that does not match the actual content, give it a lower reputation. When a user submits a fetch request for a low-reputation domain, it can be deferred or handled at low priority.

Adding a limit on the maximum execution time also helps; curl supports timeouts.
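One way the reputation idea might be sketched is shown below. The class name, the in-memory counters, and the 0.5 threshold are all assumptions for illustration; a real system would persist the counts.

```php
<?php
// Hypothetical in-memory reputation tracker keyed by domain name.
class DomainReputation
{
    private array $checked = [];
    private array $mismatched = [];

    // Record one fetch: the declared Content-Length versus the bytes received.
    public function record(string $domain, int $declared, int $actual): void
    {
        $this->checked[$domain] = ($this->checked[$domain] ?? 0) + 1;
        if ($declared !== $actual) {
            $this->mismatched[$domain] = ($this->mismatched[$domain] ?? 0) + 1;
        }
    }

    // A domain is low-reputation once its mismatch ratio exceeds the threshold.
    public function isLowReputation(string $domain, float $threshold = 0.5): bool
    {
        $checked = $this->checked[$domain] ?? 0;
        if ($checked === 0) {
            return false; // no history yet
        }
        return ($this->mismatched[$domain] ?? 0) / $checked > $threshold;
    }
}
```

A scheduler could consult isLowReputation before queuing a fetch and lower the job's priority when it returns true.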
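As a sketch of the timeout control, the two standard curl options can be collected in a helper. The function name and the default values of 5 and 30 seconds are assumptions for illustration.

```php
<?php
// Return curl options that bound how long a fetch may take (values are assumed).
function timeout_options(int $connectSeconds = 5, int $totalSeconds = 30): array
{
    return [
        CURLOPT_CONNECTTIMEOUT => $connectSeconds, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $totalSeconds,   // limit for the whole transfer
    ];
}
```

The result can be applied in one call with curl_setopt_array($ch, timeout_options());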



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
