Original (English) article: http://www.phpit.net/article/using-curl-php — License: Attribution-NonCommercial-NoDerivs 2.0
Summary:
This article introduces PHP's cURL library and shows how to use it effectively.
Introduction
When writing PHP scripts you may run into this problem: how do I fetch content from another site? There are several solutions. The simplest is PHP's fopen() function, but fopen() does not offer enough options. For example, if you want to build a web crawler, you may want to set the crawler's client (user-agent) string, such as Firefox, or fetch content using different request methods such as POST or GET; none of these requirements can be met with fopen().
To solve the problem raised above, we can use PHP's cURL extension library, which is usually included by default in the installation package. With it you can fetch content from other sites, and do quite a bit more.
Note: the code in this article requires the php_curl extension. Check phpinfo(): if it shows "cURL support: enabled", the cURL library is available.
1. Enabling cURL support in PHP on Windows:
Open php.ini and remove the semicolon in front of the line extension=php_curl.dll.
2. Enabling cURL support in PHP on Linux:
When compiling PHP, add --with-curl to the ./configure command.
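Besides looking at phpinfo() output, you can also confirm from a script that the extension is actually loaded. The following minimal check is my own addition (not from the original article), using PHP's built-in extension_loaded():

```php
<?php
// Check at runtime whether the cURL extension is available,
// instead of reading through phpinfo() output by hand.
$has_curl = extension_loaded('curl') && function_exists('curl_init');

echo $has_curl ? "cURL support: enabled\n" : "cURL support: missing\n";
```

This is handy at the top of a script that depends on cURL, so it can fail with a clear message instead of a fatal "undefined function" error.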
In this article, let's look at how to use the cURL library and what else it can do, starting with the most basic usage.
Basic usage:
First, create a new cURL session with the curl_init() function:

<?php
// Create a new cURL resource
$ch = curl_init();
?>
We have now created a cURL session. If we want to get the contents of a URL, the next step is to pass that URL to the curl_setopt() function:

<?php
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
?>
With that done, cURL is ready to work: curl_exec() fetches the content of the URL and prints it out:

<?php
// Grab URL and pass it to the browser
curl_exec($ch);
?>
Finally, close the cURL session:

<?php
// Close cURL resource, and free up system resources
curl_close($ch);
?>
Here is the complete example:

<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.nl/");

// Grab URL and pass it to the browser
curl_exec($ch);

// Close cURL resource, and free up system resources
curl_close($ch);
?>
We have just fetched another site's content, and it was automatically output to the browser. Is there a way to capture the fetched content instead, so we can process it and control what gets output? No problem at all: among the options of curl_setopt(), if you want to receive the content rather than have it printed, set CURLOPT_RETURNTRANSFER to true. See the full code:
<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.nl/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Grab URL, and return output
$output = curl_exec($ch);

// Close cURL resource, and free up system resources
curl_close($ch);

// Replace 'Google' with 'PHPit'
$output = str_replace('Google', 'PHPit', $output);

// Print output
echo $output;
?>
In the two examples above, you may have noticed that setting different curl_setopt() options produces different results. That flexibility is what makes cURL powerful, so let's look at what these options mean.
Related options for Curl:
If you have read the curl_setopt() entry in the PHP manual, you will have noticed the long list of options below it. We cannot cover them all here (see the PHP manual for details), so the following are just a few of the most commonly used ones.
The first interesting option is CURLOPT_FOLLOWLOCATION. When you set it to true, cURL follows any redirect it receives. For example, when you fetch a PHP page that issues a redirect, cURL returns the content from the target URL instead of returning the redirect response itself. The complete code is as follows:
<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

// Grab URL, and print
curl_exec($ch);
?>
If Google sends a redirect, the example above will follow it and fetch the content from the new URL. Two options related to CURLOPT_FOLLOWLOCATION are CURLOPT_MAXREDIRS and CURLOPT_AUTOREFERER.
CURLOPT_MAXREDIRS lets you define the maximum number of redirects to follow; beyond that limit, the content is no longer fetched. If CURLOPT_AUTOREFERER is set to true, cURL automatically adds a Referer header on each redirect. It may not matter often, but it is useful in certain cases.
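A minimal sketch combining both options (the URL and the limit of 5 are placeholders of my own choosing; curl_exec() is left commented out so the snippet makes no network request):

```php
<?php
// Follow redirects, but give up after 5 hops, and send a
// Referer header automatically on each hop.
$ch = curl_init();
$ok = curl_setopt($ch, CURLOPT_URL, "http://www.example.com/")
    && curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true)
    && curl_setopt($ch, CURLOPT_MAXREDIRS, 5)        // stop after 5 redirects
    && curl_setopt($ch, CURLOPT_AUTOREFERER, true);  // add Referer on each redirect

// $output = curl_exec($ch);  // would perform the actual request
curl_close($ch);
echo $ok ? "options set\n" : "failed to set options\n";
```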
The next option is CURLOPT_POST, a very useful feature: it lets you make a POST request instead of a GET request, which means you can submit forms on other pages without actually filling them in. The following example shows what I mean:
<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://projects/phpit/content/using%20curl%20php/demos/handle_form.php");

// Do a POST
$data = array('name' => 'Dennis', 'surname' => 'Pallett');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);

// Grab URL, and print
curl_exec($ch);
?>
And the handle_form.php file:

<?php
echo 'Form variables I received:';
echo '<pre>';
print_r($_POST);
echo '</pre>';
?>
As you can see, this makes it really easy to submit forms, and it's a great way to test all of your forms without having to fill them in every time.
The option CURLOPT_CONNECTTIMEOUT sets how long cURL will spend trying to connect. This is a very important option: if you set it too short, requests may fail; if you set it too long, the PHP script may appear to hang. A related option is CURLOPT_TIMEOUT, which sets the maximum time cURL is allowed to run. If you set this to a very small value, large pages may arrive incomplete, because they take a while to download.
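A sketch of how the two timeout options fit together (the values 5 and 30 are illustrative, and curl_exec() is commented out to avoid an actual network call):

```php
<?php
// Cap both the connect phase and the whole transfer.
$ch = curl_init();
$ok = curl_setopt($ch, CURLOPT_URL, "http://www.example.com/")
    && curl_setopt($ch, CURLOPT_RETURNTRANSFER, true)
    && curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5)   // give up connecting after 5 seconds
    && curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // abort the whole request after 30 seconds

// $output = curl_exec($ch);  // returns false on timeout; see curl_error($ch)
curl_close($ch);
echo $ok ? "options set\n" : "failed to set options\n";
```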
The last option is CURLOPT_USERAGENT, which lets you customize the client (user-agent) name of the request, such as a web spider or "IE 6.0". The sample code is as follows:
<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.useragent.org/");
curl_setopt($ch, CURLOPT_USERAGENT, 'My custom web spider/0.1');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

// Grab URL, and print
curl_exec($ch);
?>
Now that we've covered some of the most interesting options, let's look at the curl_getinfo() function and see what it can do for us.
Get information about a page:
The curl_getinfo() function lets us get various pieces of information about the fetched page. You can select a specific piece by passing an option as the second parameter, or receive everything as an array, as in the following example:
<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILETIME, true);

// Grab URL
$output = curl_exec($ch);

// Print info
echo '<pre>';
print_r(curl_getinfo($ch));
echo '</pre>';
?>
Most of the returned information concerns the request itself: the time the request took, the response headers, and of course some page information, such as the size of the page content and its last-modified time.
That's all for the curl_getinfo() function; now let's look at a practical use.
Practical use:
A first use of the cURL library is checking whether a URL's page exists. We can tell from the response code the URL returns: 404, for example, means the page does not exist. Let's look at an example:
<?php
// Create a new cURL resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/does/not/exist");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Grab URL
$output = curl_exec($ch);

// Get response code
$response_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

// Not found?
if ($response_code == 404) {
    echo 'Page doesn\'t exist';
} else {
    echo $output;
}
?>
One use might be an automated checker that verifies whether each requested page exists.
We can also use the cURL library to write a web spider like Google's (or some other kind of web spider). This article is not about how to write a web spider, so we won't go into its details here, but PHPit will later introduce how to build a web spider with cURL.
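Such a checker could be built around a small helper like the following sketch (the function name url_exists and the HEAD-request approach via CURLOPT_NOBODY are my own choices, not from the article):

```php
<?php
// Hypothetical helper: true when the URL answers with a status
// code below 400 (i.e. not a client or server error).
function url_exists($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request: headers only, no body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects to the final page
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // don't hang on dead hosts
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code > 0 && $code < 400;
}
```

Looping this over a list of URLs gives the automated checker described above.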
Conclusion:
In this article I have shown how to use the cURL library in PHP and most of its options.
For the most basic task of just fetching a web page, you may not need the cURL library, but once you want to do anything slightly more advanced, you will probably want to use it.
In the near future I will show exactly how to build your own web spider, like Google's web spider; stay tuned to PHPit.