Using PHP-Curl (reposted from: http://lelong.iteye.com/blog/538645)
This article describes the php_curl library and how to better use php_curl.
Introduction
You may run into the following problem in your PHP scripts: how do you fetch content from other sites? There are several solutions. The simplest is PHP's fopen() function, but fopen() offers few options. For example, if you want to build a "web crawler", you may need to set the client description the crawler reports (IE, Firefox), or fetch content through different request methods, such as POST and GET. These requirements cannot easily be met with a plain fopen() call.
To solve this problem, we can use a PHP extension library: curl. This extension is usually included in the default installation package. You can use it to fetch content from other sites, and to do other things as well.
Note: the code in this article requires the php_curl extension. Check the output of phpinfo(); if it reports cURL support as enabled, the curl library is available.
1. Enable curl library support for PHP on Windows:
Open php.ini and remove the ; before extension=php_curl.dll.
2. Enable curl library support for PHP on Linux:
When compiling PHP, add --with-curl after ./configure.
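Besides reading the phpinfo() output, the same check can be made from code. The following is a minimal sketch, assuming PHP 7 or later; curl_available() is a hypothetical helper name used only for illustration, not part of the extension.

```php
<?php
// curl_available() is a hypothetical helper, not part of the curl extension.
// It reports whether the curl extension is loaded and curl_init() is defined.
function curl_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

echo curl_available() ? "cURL support is enabled\n" : "cURL support is missing\n";
```

If this prints that support is missing, revisit the php.ini or ./configure step for your platform.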
In this article, we look at how to use the curl library and what else it can do, starting with the most basic usage.
Basic usage:
Step 1: Use the curl_init() function to create a new curl session. The code is as follows:
<?php
// Create a new curl resource
$ch = curl_init();
?>
We have now created a curl session. To fetch the content of a URL, the next step is to pass that URL to the curl_setopt() function. The code is as follows:
<?php
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
?>
With that step done, the curl session is ready. Calling curl_exec() makes curl fetch the content of the URL and print it. Code:
<?php
// Grab URL and pass it to the browser
curl_exec($ch);
?>
Finally, close the current curl session:
<?php
// Close curl resource, and free up system resources
curl_close($ch);
?>
Here is the complete example:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.nl/");
// Grab URL and pass it to the browser
curl_exec($ch);
// Close curl resource, and free up system resources
curl_close($ch);
?>
We have just fetched the content of another site and sent it straight to the browser. Is there a way to capture the content and control the output instead? Certainly. To fetch the content without printing it, set the CURLOPT_RETURNTRANSFER option to true (or any non-zero value) with curl_setopt(). The complete code:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.nl/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Grab URL, and return output
$output = curl_exec($ch);
// Close curl resource, and free up system resources
curl_close($ch);
// Replace 'Google' with 'PHPit'
$output = str_replace('Google', 'PHPit', $output);
// Print output
echo $output;
?>
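The example above assumes the request succeeds. When the transfer fails, curl_exec() returns false, and curl_error() describes what went wrong. The following is a hedged sketch of that check, assuming PHP 7.1 or later; fetch_url() is a hypothetical helper introduced here for illustration.

```php
<?php
// fetch_url() is a hypothetical helper, not part of the curl extension.
// It returns the page body, or null on failure with the reason in $error.
function fetch_url(string $url, ?string &$error = null): ?string
{
    $error = null;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // With CURLOPT_RETURNTRANSFER set, curl_exec() returns the body as a
    // string on success and false on failure.
    $output = curl_exec($ch);
    if ($output === false) {
        $error = curl_error($ch);
        return null;
    }
    return $output;
}

// Usage (performs a real request, so it is left commented out):
// $body = fetch_url("http://www.google.nl/", $error);
// echo $body ?? "Request failed: $error";
```

Comparing with === false matters here, because an empty page body is a valid result and would otherwise be mistaken for a failure.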
In the two examples above, you may have noticed that setting different options with curl_setopt() produces different results. This is exactly what makes curl powerful. Let's look at what some of these options mean.
CURL-related options:
If you have looked up the curl_setopt() function in the PHP manual, you will have seen that its option list is far too long to cover item by item. See the PHP manual for the full list; here we only introduce some of the common options.
The first interesting option is CURLOPT_FOLLOWLOCATION. When it is set to true, curl follows any redirect it receives and fetches the page at the redirect target. For example, if the PHP page you request redirects to http://new_url, curl fetches the content from http://new_url instead of returning the redirect response. The complete code is as follows:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Grab URL, and print
curl_exec($ch);
?>
If Google sends a redirect, the example above goes on to fetch the content from the redirected URL. Two options related to this one are CURLOPT_MAXREDIRS and CURLOPT_AUTOREFERER.
CURLOPT_MAXREDIRS sets the maximum number of redirects to follow; once that number is exceeded, curl stops following them. If CURLOPT_AUTOREFERER is set to true, curl automatically adds a Referer header to each request it makes when following a redirect. It may not matter often, but it is very useful in some cases.
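The three redirect options are typically set together. The sketch below groups them into one array that can be applied with curl_setopt_array(); redirect_options() is a hypothetical helper name, and the limit of 5 in the usage comment is an arbitrary assumption.

```php
<?php
// redirect_options() is a hypothetical helper that collects the
// redirect-related options discussed above into one array.
function redirect_options(int $maxRedirects): array
{
    return [
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
        CURLOPT_AUTOREFERER    => true,          // set Referer when redirected
    ];
}

// Usage (performs a real request, so it is left commented out):
// $ch = curl_init();
// curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
// curl_setopt_array($ch, redirect_options(5)); // 5 is an arbitrary limit
// curl_exec($ch);
// echo curl_getinfo($ch, CURLINFO_REDIRECT_COUNT), " redirect(s) followed\n";
```

curl_setopt_array() applies every entry of the array to the handle, which keeps related options in one place.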
The next option is CURLOPT_POST. It is very useful because it makes curl send a POST request instead of a GET request, which means you can submit forms on other pages without filling them in by hand. The following example shows what I mean:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://projects/phpit/content/using%20curl%20php/demos/handle_form.php");
// Do a POST
$data = array('name' => 'Dennis', 'surname' => 'Pallett');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
// Grab URL, and print
curl_exec($ch);
?>
And the handle_form.php file:
<?php
echo 'Form variables I received:';
echo '<pre>';
print_r($_POST);
echo '</pre>';
?>
As you can see, this makes submitting forms really easy, and it is a great way to test all your forms without filling them in by hand every time.
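One detail worth knowing: when CURLOPT_POSTFIELDS receives an array, as in the example above, curl sends the data as multipart/form-data. To send it the way a plain HTML form does (application/x-www-form-urlencoded), pass a URL-encoded string instead. A minimal sketch, where encode_form() is a hypothetical helper name:

```php
<?php
// encode_form() is a hypothetical helper: it turns form fields into a
// URL-encoded string suitable for CURLOPT_POSTFIELDS.
function encode_form(array $fields): string
{
    // The separator is passed explicitly so the result does not depend on
    // the arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}

$data = array('name' => 'Dennis', 'surname' => 'Pallett');
echo encode_form($data), "\n"; // name=Dennis&surname=Pallett

// Usage with curl (performs a real request, so it is left commented out):
// curl_setopt($ch, CURLOPT_POST, true);
// curl_setopt($ch, CURLOPT_POSTFIELDS, encode_form($data));
```

Either form reaches $_POST on the receiving side; the difference is only in how the request body is encoded.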
The CURLOPT_CONNECTTIMEOUT option sets how long curl waits while trying to connect. This is a very important option. If you set it too short, curl requests may fail.
However, if you set it too long, the PHP script may hang and eventually die. A related option is CURLOPT_TIMEOUT, which sets the maximum time the whole curl operation may take. If you set it too low, downloaded pages may be incomplete, because they take some time to download.
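Both timeouts are given in seconds. The sketch below sets them together; the values of 5 and 30 seconds in the usage comment are illustrative assumptions, not recommendations, and fetch_with_timeouts() is a hypothetical helper name.

```php
<?php
// fetch_with_timeouts() is a hypothetical helper for illustration.
// It returns the body on success, or false if the request failed or timed out.
function fetch_with_timeouts(string $url, int $connectSeconds, int $totalSeconds)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Maximum time to wait while establishing the connection.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectSeconds);
    // Maximum time for the whole request, connection included.
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalSeconds);

    return curl_exec($ch);
}

// Usage (performs a real request, so it is left commented out):
// $output = fetch_with_timeouts("http://www.google.com/", 5, 30);
// echo $output === false ? "Request failed or timed out\n" : $output;
```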
The last option is CURLOPT_USERAGENT, which lets you customize the client name sent with the request, for example a web spider or IE 6.0. The sample code is as follows:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.useragent.org/");
curl_setopt($ch, CURLOPT_USERAGENT, 'My M web spider/100');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Grab URL, and print
curl_exec($ch);
?>
That covers the most interesting options. Next, let's look at the curl_getinfo() function and what it can do for us.
Getting page information:
The curl_getinfo() function returns various pieces of information about the page that was fetched. You can ask for one specific piece of information by passing an option as the second parameter, or leave it out to get everything back as an array, as in the following example:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILETIME, true);
// Grab URL
$output = curl_exec($ch);
// Print info
echo '<pre>';
print_r(curl_getinfo($ch));
echo '</pre>';
?>
Most of the returned information describes the request itself, such as how long it took and the size of the headers, along with some details about the page, such as the size of its content and its last modification time.
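To pick individual values out of that array, the keys http_code, size_download, total_time and filetime are the relevant ones. The following sketch formats them into one line; summarize_info() is a hypothetical helper, and the sample array passed to it is made-up data standing in for a real curl_getinfo($ch) result.

```php
<?php
// summarize_info() is a hypothetical helper that formats a few fields of
// the array returned by curl_getinfo($ch).
function summarize_info(array $info): string
{
    // filetime is -1 when the server did not report a modification time.
    $modified = $info['filetime'] >= 0
        ? gmdate('Y-m-d', $info['filetime'])
        : 'unknown';

    return sprintf(
        'HTTP %d, %d bytes in %.2f s, last modified: %s',
        $info['http_code'],
        $info['size_download'],
        $info['total_time'],
        $modified
    );
}

// Made-up sample values; in real use pass curl_getinfo($ch) instead.
$sample = ['http_code' => 200, 'size_download' => 1024.0, 'total_time' => 0.5, 'filetime' => -1];
echo summarize_info($sample), "\n";
// HTTP 200, 1024 bytes in 0.50 s, last modified: unknown
```

A single value can also be requested directly, for example curl_getinfo($ch, CURLINFO_HTTP_CODE).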
That covers the curl_getinfo() function. Now let's look at some practical uses.
Actual use:
The first practical use of the curl library is checking whether the page at a given URL exists. We can look at the response code returned for the request: 404 means the page does not exist. Let's look at an example:
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/does/not/exist");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Grab URL
$output = curl_exec($ch);
// Get response code
$response_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
// Not found?
if ($response_code == '404') {
    echo 'Page doesn\'t exist';
} else {
    echo $output;
}
?>
Another use is an automatic link checker that verifies whether every page being requested actually exists.
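Such a checker could be sketched roughly as follows. This is one possible approach, not the author's implementation: http_status() and describe_status() are hypothetical helper names, and the use of CURLOPT_NOBODY (which makes curl send a HEAD request and skip the body) is an assumption that the target servers answer HEAD requests sensibly.

```php
<?php
// http_status() is a hypothetical helper: it returns the HTTP response
// code for a URL, or 0 when no response was received.
function http_status(string $url): int
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // judge the final page
    curl_exec($ch);

    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// describe_status() is a hypothetical helper that labels a response code.
function describe_status(int $code): string
{
    if ($code === 0) {
        return 'no response';
    }
    if ($code === 404) {
        return 'not found';
    }
    return $code < 400 ? 'ok' : 'error';
}

// Usage (performs real requests, so it is left commented out):
// foreach (["http://www.google.com/", "http://www.google.com/does/not/exist"] as $url) {
//     echo $url, ': ', describe_status(http_status($url)), "\n";
// }
```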
We can also use the curl library to write a web spider similar to Google's, or any other kind of web spider. This article is not about writing web spiders, so we will not go into detail here. However, PHPit will cover how to build a web spider with curl in the future.
Conclusion:
In this article, I have shown how to use the curl library in PHP, along with its most useful options.
For the most basic task, simply fetching a web page, you may not need the curl library. However, if you want to do anything more advanced, you will probably want to use curl.
In the near future, I will show you how to build your own web spider, similar to Google's, so stay tuned to PHPit.