Using cURL in PHP

Source: Internet
Author: User
Tags: curl, options
cURL is a tool for transferring files and data with URL syntax. It supports many protocols, including HTTP, FTP, and TELNET, and best of all, PHP ships with support for the cURL library. This article covers some of cURL's advanced features and how to use them in PHP.

Why use cURL?

Of course, there are other ways to fetch web page content. Much of the time, out of laziness, I just reach for the simple PHP functions:

$content = file_get_contents("http://www.nettuts.com");
// or
$lines = file("http://www.nettuts.com");
// or
readfile("http://www.nettuts.com");

However, this approach lacks flexibility and decent error handling. What's more, it cannot handle the harder tasks: cookies, authentication, form submission, file uploads, and so on.

cURL is a powerful library that supports many different protocols and options, and gives you detailed control over URL requests.

Basic structure

Before learning more complex functions, let's take a look at the basic steps for creating cURL requests in PHP:

1. Initialize
2. Set options
3. Execute and fetch the result
4. Free the cURL handle

// 1. initialize
$ch = curl_init();
// 2. set options, including the URL
curl_setopt($ch, CURLOPT_URL, "http://www.nettuts.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
// 3. execute and fetch the resulting HTML document
$output = curl_exec($ch);
// 4. free the cURL handle
curl_close($ch);

The second step, curl_setopt(), is where all the magic happens. There is a long list of cURL options that can be set to configure every detail of the URL request. Reading through them all at once can be difficult, so today we will only try out the more common and useful ones.

Checking for errors

You can optionally add a check for errors:

// ...
$output = curl_exec($ch);
if ($output === FALSE) {
    echo "cURL Error: " . curl_error($ch);
}
// ...

Note that we compare with "=== FALSE" rather than "== FALSE". We need to distinguish empty output from the boolean FALSE, which signals a real error.
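To see why this matters outside of cURL, here is a minimal, self-contained illustration of loose versus strict comparison (plain PHP; no request involved, the variables are just stand-ins):

```php
<?php
// An empty response body ("") loosely equals FALSE, so == cannot
// distinguish "the page was blank" from "the request failed".
$empty_output = "";     // what curl_exec() can return for a blank page
$failed_output = false; // what curl_exec() returns on error

var_dump($empty_output == false);   // bool(true)  -- a false positive
var_dump($empty_output === false);  // bool(false) -- correctly not an error
var_dump($failed_output === false); // bool(true)  -- a real error
```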

Obtain information

This is another optional step: after the request has executed, you can fetch information about it:

// ...
curl_exec($ch);
$info = curl_getinfo($ch);
echo 'Fetched ' . $info['url'] . ' in ' . $info['total_time'] . ' seconds';
// ...

The returned array contains the following information:

"url" // resource URL
"content_type" // content type
"http_code" // HTTP status code
"header_size" // header size
"request_size" // request size
"filetime" // remote file's timestamp
"ssl_verify_result" // SSL certificate verification result
"redirect_count" // number of redirects
"total_time" // total transaction time
"namelookup_time" // time spent on DNS lookup
"connect_time" // time spent establishing the connection
"pretransfer_time" // time from start until just before the transfer
"size_upload" // bytes uploaded
"size_download" // bytes downloaded
"speed_download" // average download speed
"speed_upload" // average upload speed
"download_content_length" // length of the downloaded content
"upload_content_length" // length of the uploaded content
"starttransfer_time" // time until the first byte was transferred
"redirect_time" // time spent on redirects

Browser-based redirection

For our first example, we will write a piece of code that detects browser-based redirection. Some websites, for instance, redirect to different pages depending on whether the visitor is on a mobile phone, or even on which country the visitor is from.

We use the CURLOPT_HTTPHEADER option to set the HTTP request headers we send, including the user agent string and the accepted language, and then check whether these websites redirect us to different URLs.

// URLs to test
$urls = array(
    "http://www.cnn.com",
    "http://www.mozilla.com",
    "http://www.facebook.com"
);
// browser profiles to test
$browsers = array(
    "standard" => array(
        "user_agent" => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)",
        "language" => "en-us,en;q=0.5"
    ),
    "iphone" => array(
        "user_agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A537a Safari/419.3",
        "language" => "en"
    ),
    "french" => array(
        "user_agent" => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.50727)",
        "language" => "fr,fr-FR;q=0.5"
    )
);
foreach ($urls as $url) {
    echo "URL: $url\n";
    foreach ($browsers as $test_name => $browser) {
        $ch = curl_init();
        // set the url
        curl_setopt($ch, CURLOPT_URL, $url);
        // set the browser-specific headers
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            "User-Agent: {$browser['user_agent']}",
            "Accept-Language: {$browser['language']}"
        ));
        // we don't need the page content
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // return the HTTP headers
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($ch);
        curl_close($ch);
        // was there a redirect header?
        if (preg_match("!Location: (.*)!", $output, $matches)) {
            echo "$test_name: redirects to $matches[1]\n";
        } else {
            echo "$test_name: no redirection\n";
        }
    }
    echo "\n";
}

First we define a set of URLs to test and a set of browser profiles to test them with, then loop over every URL/browser combination.

Because the cURL options tell it to return only the HTTP headers (stored in $output), a simple regular expression can check whether they contain a "Location:" header.

When you run this code, each URL is reported, per browser profile, as either redirecting (and where to) or not redirecting.
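The matching logic can be exercised on its own against a canned header block (the header string below is made up for illustration):

```php
<?php
// hypothetical raw headers, as cURL returns them with CURLOPT_HEADER set
$output = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://www.example.com/mobile/\r\n"
        . "Content-Type: text/html\r\n\r\n";

if (preg_match("!Location: (.*)!", $output, $matches)) {
    // trim() strips the trailing \r that "." also matches
    echo "redirects to " . trim($matches[1]) . "\n";
} else {
    echo "no redirection\n";
}
// prints: redirects to http://www.example.com/mobile/
```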

Use the POST method to send data

When you make a GET request, data is passed to the URL through its query string. For example, when you search on Google, the search term is part of the query string:

http://www.google.com/search?q=nettuts

In this case you may not even need cURL to simulate it; throwing this URL at file_get_contents() gets you the same result.
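If you do build such GET URLs by hand, PHP's http_build_query() does the assembling and URL-encoding for you (a small aside; the parameter names here are illustrative):

```php
<?php
// build an encoded query string from an array of parameters
$params = array(
    "q"  => "nettuts",
    "hl" => "en"
);
$url = "http://www.google.com/search?" . http_build_query($params);
echo $url; // http://www.google.com/search?q=nettuts&hl=en
```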

However, some HTML forms are submitted with the POST method. When such a form is submitted, the data is sent in the HTTP request body rather than the query string. For example, the CodeIgniter forum's search form always POSTs to the following page, no matter what keywords you type:

http://codeigniter.com/forums/do_search/

You can simulate this kind of URL request with a PHP script. First, create a new file that accepts and displays POST data; call it post_output.php:

print_r($_POST);

Next, write a PHP script to execute the cURL request:

$url = "http://localhost/post_output.php";
$post_data = array(
    "foo" => "bar",
    "query" => "Nettuts",
    "action" => "Submit"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// we are doing a POST!
curl_setopt($ch, CURLOPT_POST, 1);
// add the POST fields
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;

This script sends a POST request to post_output.php, which dumps its $_POST variable; cURL captures that output and we echo it.

File Upload

Uploading files works much like the previous POST example, because every file upload form is submitted via POST.

First, create a page to receive the file, named upload_output.php:

print_r($_FILES);

And here is the script that performs the upload:

$url = "http://localhost/upload_output.php";
$post_data = array(
    "foo" => "bar",
    // path of the local file to upload
    "upload" => "@C:/wamp/www/test.zip"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;

To upload a file, pass its path like an ordinary POST variable, prefixed with the @ symbol. When you run this script, upload_output.php prints the contents of its $_FILES array.
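For reference, here is a sketch of the modern equivalent: since PHP 5.5 the @ prefix is deprecated, and the CURLFile class is the supported way to attach a file. The path and field names below simply mirror the example above:

```php
<?php
// PHP 5.5+: attach the file with CURLFile instead of the @ prefix
$post_data = array(
    "foo"    => "bar",
    // arguments: path, MIME type (optional), filename reported to the server (optional)
    "upload" => new CURLFile("C:/wamp/www/test.zip", "application/zip", "test.zip")
);
$ch = curl_init("http://localhost/upload_output.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
```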

cURL batch processing (multi cURL)

cURL also has an advanced feature: the multi handle. It allows you to open multiple URL connections simultaneously and asynchronously.

The following is the sample code from php.net:

// create two cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set the URL and other options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
// create the multi cURL handle
$mh = curl_multi_init();
// add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);
// a state variable
$active = null;
// execute the batch
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// close each handle
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);

The idea is to open several cURL handles and attach them to a single multi handle, then wait in a while loop for them all to finish.

There are two main loops in this example. The first do-while loop repeatedly calls curl_multi_exec(). This function is non-blocking and returns a status value; as long as that value equals the constant CURLM_CALL_MULTI_PERFORM, there is still urgent work to do (for example, sending the HTTP headers for the URLs), so we keep calling it until the return value changes.

The outer while loop continues as long as $active is true. This variable was passed to curl_multi_exec() as its second argument and indicates whether there are still active connections in the multi handle. Inside the loop we call curl_multi_select(), which blocks until there is activity on a connection (for example, a response arriving from a server). When it wakes up, we drop into another do-while loop to keep things moving.
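One practical caveat (my addition, not part of the original tutorial): on some platforms curl_multi_select() has been known to return -1 immediately, which turns this pattern into a busy spin. A common defensive variation is to back off briefly in that case:

```php
<?php
// variation of the main loop that avoids busy-spinning when
// curl_multi_select() fails (returns -1) instead of blocking
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) == -1) {
        usleep(100000); // wait 100 ms before retrying
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
```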

Now let's put this feature to use:

WordPress connection checker

Imagine you run a blog with many posts containing a large number of links to external websites. After a while, for one reason or another, a fair share of those links will have died: the page has been taken down, or the whole site has vanished...

We will now build a script that analyzes all of these links, finds the websites or pages that fail to open or return a 404, and produces a report.

Note that what follows is not a real, installable WordPress plugin; it is a standalone script for demonstration purposes only.

OK, let's get started. First, fetch all of the links from the database:

// CONFIG
$db_host = 'localhost';
$db_user = 'root';
$db_pass = '';
$db_name = 'wordpress';
$excluded_domains = array(
    'localhost', 'www.mydomain.com');
$max_connections = 10;
// initialize some variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;
// connect to MySQL
if (!mysql_connect($db_host, $db_user, $db_pass)) {
    die('Could not connect: ' . mysql_error());
}
if (!mysql_select_db($db_name)) {
    die('Could not select db: ' . mysql_error());
}
// find all published posts that contain links
$q = "SELECT post_content FROM wp_posts
    WHERE post_content LIKE '%href=%'
    AND post_status = 'publish'
    AND post_type = 'post'";
$r = mysql_query($q) or die(mysql_error());
while ($d = mysql_fetch_assoc($r)) {
    // extract the links with a regular expression
    if (preg_match_all("!href=\"(.*?)\"!", $d['post_content'], $matches)) {
        foreach ($matches[1] as $url) {
            // exclude some domains
            $tmp = parse_url($url);
            if (in_array($tmp['host'], $excluded_domains)) {
                continue;
            }
            // store the url
            $url_list[] = $url;
        }
    }
}
// remove duplicate links
$url_list = array_values(array_unique($url_list));
if (!$url_list) {
    die('No URL to check');
}

We start by configuring the database credentials, a list of domains to exclude ($excluded_domains), and the maximum number of concurrent connections ($max_connections). Then we connect to the database, pull the published posts that contain links, and collect the links into an array ($url_list). (This script uses the legacy mysql_* functions, which were removed in PHP 7; modern code would use mysqli or PDO instead.)

The next chunk of code is a bit more involved, so I will explain it step by step afterwards:

// 1. create a multi handle
$mh = curl_multi_init();
// 2. add the initial batch of URLs
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}
// 3. initial processing
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
// 4. main loop
while ($active && $mrc == CURLM_OK) {
    // 5. there is activity
    if (curl_multi_select($mh) != -1) {
        // 6. do the work
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        // 7. is there any info?
        if ($mhinfo = curl_multi_info_read($mh)) {
            // this means a connection has finished
            // 8. get info from the finished curl handle
            $chinfo = curl_getinfo($mhinfo['handle']);
            // 9. is it a dead link?
            if (!$chinfo['http_code']) {
                $dead_urls[] = $chinfo['url'];
            // 10. a 404?
            } else if ($chinfo['http_code'] == 404) {
                $not_found_urls[] = $chinfo['url'];
            // 11. it works
            } else {
                $working_urls[] = $chinfo['url'];
            }
            // 12. remove the finished handle
            curl_multi_remove_handle($mh, $mhinfo['handle']);
            curl_close($mhinfo['handle']);
            // 13. add a new URL and do the work
            if (add_url_to_multi_handle($mh, $url_list)) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
    }
}
// 14. finished
curl_multi_close($mh);
echo "==Dead URLs==\n";
echo implode("\n", $dead_urls) . "\n\n";
echo "==404 URLs==\n";
echo implode("\n", $not_found_urls) . "\n\n";
echo "==Working URLs==\n";
echo implode("\n", $working_urls);
// 15. adds a url to the multi handle
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;
    // if there is still a URL left to process
    if (isset($url_list[$index])) {
        // create a new curl handle
        $ch = curl_init();
        // set the url
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        // return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // follow redirects wherever they lead
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // skip the body, which saves bandwidth and time
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // add it to the multi handle
        curl_multi_add_handle($mh, $ch);
        // increment the counter so the next call adds the next url
        $index++;
        return true;
    } else {
        // no more URLs to process
        return false;
    }
}

Here is what the code does; the numbers correspond to the numbered comments in the code.

1. Create a multi handle.
2. Later we will write the function add_url_to_multi_handle(), which adds one URL to the multi handle each time it is called. To start, we add 10 URLs (the number set by $max_connections).
3. We must run curl_multi_exec() to get things rolling; as long as it returns CURLM_CALL_MULTI_PERFORM, there is still work to do. This starts the connections but does not wait for the full responses.
4. The main loop runs as long as the multi handle has active connections.
5. curl_multi_select() waits until a URL query produces activity.
6. cURL does its work again, mainly collecting response data.
7. Check for info: when a request has finished, an array is returned.
8. That array contains a cURL handle, which we use to fetch information about the individual request.
9. If the link was dead or the request timed out, there is no HTTP status code.
10. If the page could not be found, the status code is 404.
11. Anything else we count as a working link (you could, of course, also check for 500 errors and so on...).
12. The finished handle is no longer useful, so remove it from the multi handle and close it.
13. Now we can add another URL and kick off the work again...
14. All done: close the multi handle and print the report.
15. Finally, the function that adds a new URL to the multi handle. Each call increments the static variable $index, so we know how many URLs have already been handled.

I ran this script on my blog (with a few broken links added on purpose for testing).

In total, about 40 URLs were checked in under two seconds. Imagine the time saved when you have a much larger set of URLs to check! Opening 10 connections at once makes the script roughly 10 times faster than checking them one by one, and cURL's non-blocking multi interface lets you handle large batches of URL requests without tying up your web scripts.

Other useful cURL options

HTTP authentication

If a URL request requires HTTP-based authentication, you can use the following code:

$url = "http://www.somesite.com/members";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// send the username and password
curl_setopt($ch, CURLOPT_USERPWD, "myusername:mypassword");
// allow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// keep sending the username and password
// even after a redirect
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, 1);
$output = curl_exec($ch);
curl_close($ch);

FTP Upload

PHP has its own FTP functions, but you can also use cURL:

// open a file pointer
$fp = fopen("/path/to/file", "r");
// the URL contains most of the needed information
$url = "ftp://username:password@mydomain.com:21/path/to/new/file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// upload-related options
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize("/path/to/file"));
// enable ASCII mode (for text files)
curl_setopt($ch, CURLOPT_FTPASCII, 1);
$output = curl_exec($ch);
curl_close($ch);

Using a proxy

You can use a proxy to initiate a cURL request:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// set the proxy address
curl_setopt($ch, CURLOPT_PROXY, '11.11.11.11:8080');
// provide a username and password if needed
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass');
$output = curl_exec($ch);
curl_close($ch);

Callback function

cURL can invoke a callback function of your choice while a URL request is in progress, so you can use the response data as soon as it arrives rather than waiting for the download to complete.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://net.tutsplus.com');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "progress_function");
curl_exec($ch);
curl_close($ch);

function progress_function($ch, $str) {
    echo $str;
    return strlen($str);
}

The callback must return the number of bytes it received; otherwise cURL aborts the transfer.

While the response is being received, this function is called every time a chunk of data arrives.

Summary

Today we have explored the power and the flexible extensibility of the cURL library. I hope you enjoyed it. The next time you need to make a URL request, consider cURL!

Original article: PHP-based cURL quick start
