HTTP/PHP: How to use a HEAD request in cURL to get information such as the size of a resource

Source: Internet
Author: User
My program lets users enter URLs from other sites and then fetches those resources. Before fetching, I need to know each resource's size; otherwise a resource that is too large takes too long to download and wastes bandwidth. I found that HTTP has a HEAD method, which retrieves only a resource's HTTP headers. How do I make cURL fetch only the HTTP headers without downloading the whole body?

Is Content-Length always present in the HTTP headers? It is the only way I know to get a resource's size. If it can be missing, my fallback would be to cap how much cURL downloads and raise an error once the limit is exceeded. Does cURL have an option for that?

Finally, how does a server support the HEAD method?

Reply content:

Actually, curl has supported HEAD requests for a long time.

Just add this line to your code and curl will automatically send a HEAD request: curl_setopt($ch, CURLOPT_NOBODY, true);

If you want to read Content-Length, you only need to do so after curl_exec. curl_getinfo returns the value from the response header, or -1 if the server did not send one: $size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
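A minimal sketch of the HEAD approach wrapped in a reusable PHP function. The names `fetchDeclaredSize` and `normalizeContentLength`, the 10-second timeout, following redirects, and treating an HTTP status of 400 or above as "no usable size" are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: send a HEAD request and return the declared size in
// bytes, or null when no usable Content-Length is available.
function fetchDeclaredSize(string $url, int $timeout = 10): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the (empty) result
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // assumption: follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    if (curl_exec($ch) === false) {
        return null; // network error, timeout, etc.
    }
    // Assumption: a 4xx/5xx reply (e.g. 405 when HEAD is disabled) carries no useful size.
    if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) >= 400) {
        return null;
    }
    return normalizeContentLength(curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD));
}

// CURLINFO_CONTENT_LENGTH_DOWNLOAD reports -1 when the length is unknown.
function normalizeContentLength($raw): ?int
{
    if (!is_numeric($raw) || $raw < 0) {
        return null;
    }
    return (int) $raw;
}
```

A caller would then compare the result, for example `fetchDeclaredSize('https://example.com/file.zip')`, against its own limit, remembering that `null` means "unknown", not "small".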

Note that although most servers support HEAD, not all of them do; some disable it in their configuration to deter crawlers. Content-Length is also not a required header. So if the value is present and exceeds your maximum, you can return an error; if it is absent or within the limit, you still have to judge by the size of the content you actually download.
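The rule in the paragraph above can be sketched as a small decision function. The name `planFetch` and its string return values are hypothetical; the second outcome still requires a download that counts bytes as they arrive, since the declared value may be missing or wrong.

```php
<?php
// Hypothetical decision step after the HEAD request.
// $declaredLength is null when the server sent no usable Content-Length.
function planFetch(?int $declaredLength, int $maxSize): string
{
    if ($declaredLength !== null && $declaredLength > $maxSize) {
        return 'reject'; // the declared size already exceeds the limit
    }
    // Missing or within the limit: download anyway, but keep enforcing
    // the limit on the bytes actually received.
    return 'download_with_limit';
}
```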

As for a maximum download size, I haven't seen such a setting, but there is a better solution: use the two callbacks CURLOPT_HEADERFUNCTION and CURLOPT_WRITEFUNCTION. Then a single request performs all the checks, and the transfer can be aborted at any point. Both callbacks must return the number of bytes they were passed; returning anything else (such as 0) aborts the transfer.

    $size = 0;
    $max_size = 123456;

    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $str) use ($max_size) {
        // The first parameter is the curl handle, the second is one header line
        $parts = explode(':', $str, 2);
        if (count($parts) === 2) {
            $name = strtolower(trim($parts[0]));
            $value = trim($parts[1]);
            // Check the declared size
            if ($name === 'content-length' && (int) $value > $max_size) {
                return 0; // returning 0 aborts the transfer
            }
        }
        return strlen($str); // return the line length to continue
    });

    // Without a Content-Length, count the bytes as they arrive
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $str) use (&$size, $max_size) {
        $len = strlen($str);
        $size += $len;
        if ($size > $max_size) {
            return 0; // abort the transfer
        }
        // Store $str here if you need the body
        return $len;
    });
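A variation on the callback code above, sketched under assumptions: the class `SizeLimiter` and the function `fetchWithLimit` are hypothetical names, and unlike the closure version the write callback also keeps the received body. When either callback returns a value other than the number of bytes it was given, curl aborts the transfer and `curl_exec` returns `false`.

```php
<?php
// Hypothetical helper: the limit and the byte counter live in object state
// instead of closure captures, so both callbacks share them naturally.
final class SizeLimiter
{
    private int $maxSize;
    private int $received = 0;
    private string $body = '';

    public function __construct(int $maxSize)
    {
        $this->maxSize = $maxSize;
    }

    // CURLOPT_HEADERFUNCTION callback: return the line length to continue, 0 to abort.
    public function onHeader($ch, string $line): int
    {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2
            && strtolower(trim($parts[0])) === 'content-length'
            && (int) trim($parts[1]) > $this->maxSize) {
            return 0; // declared size too large
        }
        return strlen($line);
    }

    // CURLOPT_WRITEFUNCTION callback: return the chunk length to continue, 0 to abort.
    public function onWrite($ch, string $chunk): int
    {
        $this->received += strlen($chunk);
        if ($this->received > $this->maxSize) {
            return 0; // actual size too large
        }
        $this->body .= $chunk;
        return strlen($chunk);
    }

    public function body(): string
    {
        return $this->body;
    }
}

// Hypothetical usage: returns the body, or null if the limit was hit or the request failed.
function fetchWithLimit(string $url, int $maxSize): ?string
{
    $limiter = new SizeLimiter($maxSize);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, [$limiter, 'onHeader']);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, [$limiter, 'onWrite']);
    return curl_exec($ch) === false ? null : $limiter->body();
}
```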

Why use curl at all? You could just open a socket with fsockopen and send a HEAD request yourself.
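A minimal sketch of the fsockopen route for plain HTTP on port 80. The function names are hypothetical; for HTTPS you would connect to `ssl://host` on port 443 instead, which requires PHP's OpenSSL extension. The `Connection: close` header asks the server to close the socket after the response, so `stream_get_contents` can simply read until the end of the stream.

```php
<?php
// Build a minimal HTTP/1.1 HEAD request; HTTP/1.1 requires the Host header.
function buildHeadRequest(string $host, string $path): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
        . "Host: {$host}\r\n"
        . "Connection: close\r\n"
        . "\r\n";
}

// Hypothetical helper: returns the raw response headers, or null on failure.
function headViaSocket(string $host, string $path = '/', int $port = 80, int $timeout = 5): ?string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null; // $errno / $errstr describe the connection error
    }
    stream_set_timeout($fp, $timeout); // limit each read, not just the connect
    fwrite($fp, buildHeadRequest($host, $path));
    $response = stream_get_contents($fp); // read until the server closes
    fclose($fp);
    return $response === false ? null : $response;
}
```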

However, a HEAD response does not necessarily include the resource's size; that does not appear to be guaranteed.

    curl_setopt($curl, CURLOPT_HEADER, true);

With this option set, the result returned by curl_exec also includes the HTTP response headers, from which the Content-Length value can be extracted:

    HTTP/1.1 200 OK
    Server: Apache
    Content-Type: text/html
    Content-Encoding: gzip
    Content-Length: 26395
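A sketch of extracting Content-Length from raw headers like the example above. It assumes you pass only the header part of the `curl_exec` result (for instance the first `curl_getinfo($ch, CURLINFO_HEADER_SIZE)` bytes), because the body may contain text that looks like a header. The function name is hypothetical. If redirects were followed, several header blocks appear, so the last value wins. Note that with `Content-Encoding: gzip`, as in the example, Content-Length is the size of the compressed body, not of the decompressed content.

```php
<?php
// Hypothetical helper: return the last Content-Length found in a raw header
// string, or null if there is none or it is not a plain integer.
function parseContentLength(string $rawHeaders): ?int
{
    $length = null;
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2 || strtolower(trim($parts[0])) !== 'content-length') {
            continue; // status line, blank line, or another header
        }
        $value = trim($parts[1]);
        if (preg_match('/^\d+$/', $value)) {
            $length = (int) $value; // later blocks (after redirects) override earlier ones
        }
    }
    return $length;
}
```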

This length value is unreliable: a server-side script can set it to any value it likes.

Setting a maximum fetch size is fine. The remote server cannot be trusted, and the Content-Length it gives is not necessarily the true size, so to prevent abuse you also have to enforce a size limit.

You can also add a further check: if a domain often returns a Content-Length that doesn't match the actual content, give it a lower reputation. When a user submits a fetch request for a low-reputation domain, it can be deferred or handled at low priority.
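One way to sketch the reputation idea is a hypothetical in-memory tracker. The class name, the 20% mismatch threshold and the 5-fetch minimum are assumptions for illustration; a real service would persist the counters. It also assumes the actual size is measured the same way as the declared one, i.e. before any decompression.

```php
<?php
// Hypothetical tracker of how often each domain's Content-Length is wrong.
final class DomainReputation
{
    /** @var array<string, array{fetches: int, mismatches: int}> */
    private array $stats = [];
    private float $maxMismatchRate;
    private int $minSamples;

    public function __construct(float $maxMismatchRate = 0.2, int $minSamples = 5)
    {
        $this->maxMismatchRate = $maxMismatchRate; // assumed threshold
        $this->minSamples = $minSamples;           // assumed minimum before judging
    }

    public function record(string $host, ?int $declaredLength, int $actualLength): void
    {
        $host = strtolower($host);
        $s = $this->stats[$host] ?? ['fetches' => 0, 'mismatches' => 0];
        $s['fetches']++;
        // Only a declared length that disagrees with the bytes received counts against the domain.
        if ($declaredLength !== null && $declaredLength !== $actualLength) {
            $s['mismatches']++;
        }
        $this->stats[$host] = $s;
    }

    // True when requests for this domain should be deferred or deprioritized.
    public function isLowReputation(string $host): bool
    {
        $s = $this->stats[strtolower($host)] ?? null;
        if ($s === null || $s['fetches'] < $this->minSamples) {
            return false; // not enough history yet
        }
        return $s['mismatches'] / $s['fetches'] > $this->maxMismatchRate;
    }
}
```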

Add a maximum execution time as well; curl can enforce timeouts.
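A minimal sketch of the time limits. `CURLOPT_CONNECTTIMEOUT` bounds connection setup and `CURLOPT_TIMEOUT` bounds the whole request, both in seconds; the function names and the default values are assumptions. When a limit is hit, `curl_exec` returns `false`.

```php
<?php
// Hypothetical helper: cURL options that bound how long a fetch may take.
function timeLimitOptions(int $connectTimeout = 5, int $totalTimeout = 30): array
{
    return [
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds to establish the connection
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds for the entire request
    ];
}

// Apply the options to an existing handle; returns false if any option was rejected.
function applyTimeLimits($ch, int $connectTimeout = 5, int $totalTimeout = 30): bool
{
    return curl_setopt_array($ch, timeLimitOptions($connectTimeout, $totalTimeout));
}
```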
