PHP cURL Functions

cURL is a tool that transfers files and data using URL syntax. It supports many protocols, such as HTTP, FTP and Telnet. Best of all, PHP also supports the cURL library. This article describes some of cURL's advanced features and how to use them in PHP.

Why use cURL?

Yes, there are other ways to fetch the content of a web page. Most of the time, because I want to be lazy, I just use simple PHP functions:

$content = file_get_contents("http://www.nettuts.com");
// or
$lines = file("http://www.nettuts.com");
// or
readfile("http://www.nettuts.com");

However, these functions lack flexibility and proper error handling, and you cannot use them for harder tasks such as handling cookies, authentication, form submission or file uploads.

cURL is a powerful library that supports many different protocols and options, and it provides detailed information about URL requests.

Basic structure

Before moving on to more complex features, let's look at the basic steps of a cURL request in PHP:

1. Initialize
2. Set options
3. Execute and fetch the result
4. Release the cURL handle

<?php
// 1. Initialize
$ch = curl_init();
// 2. Set options, including the URL
curl_setopt($ch, CURLOPT_URL, "http://www.nettuts.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
// 3. Execute and get the HTML document content
$output = curl_exec($ch);
// 4. Release the cURL handle
curl_close($ch);

The second step (curl_setopt()) is the most important one; this is where all the magic happens. There is a long list of cURL options that specify the details of the URL request. They can be hard to read and understand all at once, so today we'll only try some of the more commonly used and useful ones.
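
When a handle needs several options, curl_setopt_array() sets them in a single call. The sketch below is one possible way to organize the common options, assuming two hypothetical helpers, handle_options() and make_handle(); the 10-second connect timeout is an arbitrary illustrative value.

```php
<?php
// Hypothetical helper: the option set most examples in this article use.
function handle_options(string $url, array $extra = []): array
{
    // "+" keeps the integer CURLOPT_* keys intact (array_merge() would
    // renumber them), and entries in $extra win over the defaults.
    return $extra + [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_HEADER         => false, // keep response headers out of the body
        CURLOPT_CONNECTTIMEOUT => 10,    // give up connecting after 10 seconds
    ];
}

// Hypothetical helper: create a handle and apply all options in one call.
function make_handle(string $url, array $extra = [])
{
    $ch = curl_init();
    // curl_setopt_array() returns false at the first option it cannot set.
    if (!curl_setopt_array($ch, handle_options($url, $extra))) {
        throw new RuntimeException('Could not set cURL options: ' . curl_error($ch));
    }
    return $ch;
}

// Usage: $ch = make_handle('http://www.nettuts.com', [CURLOPT_HEADER => true]);
```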

Check for errors

You can add code that checks for errors (this is optional):

// ...
$output = curl_exec($ch);
if ($output === false) {
    echo "cURL Error: " . curl_error($ch);
}
// ...

Note that we compare with "=== false" rather than "== false", because we have to distinguish an empty output from the Boolean value false; only the latter indicates a real error.
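
To see why the strict comparison matters, and to turn a failed transfer into an exception, here is a small sketch. fetch() is a hypothetical helper name, and throwing a RuntimeException is just one possible error-handling policy.

```php
<?php
// Hypothetical helper: fetch a URL, throwing on transport errors.
function fetch(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    // Only a strict comparison tells a failure apart from an empty body.
    if ($output === false) {
        throw new RuntimeException(sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch)));
    }
    return $output;
}

// Why "==" is not enough: an empty string is loosely equal to false.
var_dump('' == false);  // bool(true)
var_dump('' === false); // bool(false)
```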

Get information

Another optional step is to retrieve information about the request after cURL has executed it:

// ...
curl_exec($ch);
$info = curl_getinfo($ch);
echo 'Fetching ' . $info['url'] . ' took ' . $info['total_time'] . ' seconds';
// ...

The returned array contains the following information:

"url"                      // URL of the resource
"content_type"             // Content-Type of the response
"http_code"                // HTTP status code
"header_size"              // size of the received headers
"request_size"             // size of the request sent
"filetime"                 // modification time of the remote file (needs CURLOPT_FILETIME; -1 if unknown)
"ssl_verify_result"        // result of SSL certificate verification
"redirect_count"           // number of redirects followed
"total_time"               // total time of the request
"namelookup_time"          // time spent on the DNS lookup
"connect_time"             // time taken to establish the connection
"pretransfer_time"         // time until the transfer was about to start
"size_upload"              // number of bytes uploaded
"size_download"            // number of bytes downloaded
"speed_download"           // average download speed
"speed_upload"             // average upload speed
"download_content_length"  // Content-Length of the download (-1 if unknown)
"upload_content_length"    // size of the upload
"starttransfer_time"       // time until the first byte was received
"redirect_time"            // time spent on redirects

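Besides the full array, curl_getinfo() accepts a second argument that returns a single field, for example curl_getinfo($ch, CURLINFO_HTTP_CODE). The sketch below formats a few fields of the array into one line; summarize_info() is a hypothetical name, and the sample values are made up purely to show the output format.

```php
<?php
// Hypothetical formatter for the array returned by curl_getinfo($ch).
function summarize_info(array $info): string
{
    return sprintf(
        '%s -> HTTP %d, %d bytes in %.2f s (DNS %.2f s, connect %.2f s)',
        $info['url'],
        $info['http_code'],
        $info['size_download'],
        $info['total_time'],
        $info['namelookup_time'],
        $info['connect_time']
    );
}

// Made-up sample values, only to show the shape of the output.
$sample = [
    'url'             => 'http://www.nettuts.com/',
    'http_code'       => 200,
    'size_download'   => 5120.0,
    'total_time'      => 0.734,
    'namelookup_time' => 0.012,
    'connect_time'    => 0.101,
];
echo summarize_info($sample), "\n";
// http://www.nettuts.com/ -> HTTP 200, 5120 bytes in 0.73 s (DNS 0.01 s, connect 0.10 s)
```
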
Browser-based redirection

In the first example, we write code that detects whether a server performs browser-based redirection. For example, some websites redirect visitors depending on whether they use a mobile browser, or even on which country they are from.

We use the CURLOPT_HTTPHEADER option to set the HTTP request headers we send, including the user agent and the preferred language. Then we check whether these sites redirect us to different URLs.

<?php
// URLs to test
$urls = array(
    "http://www.cnn.com",
    "http://www.mozilla.com",
    "http://www.facebook.com"
);
// Browsers to test
$browsers = array(
    "standard" => array(
        "user_agent" => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)",
        "language" => "en-us,en;q=0.5"
    ),
    "iphone" => array(
        "user_agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A537a Safari/419.3",
        "language" => "en"
    ),
    "french" => array(
        "user_agent" => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.50727)",
        "language" => "fr,fr-FR;q=0.5"
    )
);
foreach ($urls as $url) {
    echo "URL: $url\n";
    foreach ($browsers as $test_name => $browser) {
        $ch = curl_init();
        // Set the URL
        curl_setopt($ch, CURLOPT_URL, $url);
        // Set the browser-specific headers
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            "User-Agent: {$browser['user_agent']}",
            "Accept-Language: {$browser['language']}"
        ));
        // We don't need the page content
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // We only need the HTTP headers
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // Return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($ch);
        curl_close($ch);
        // Is there a redirect header?
        if (preg_match("!Location: (.*)!", $output, $matches)) {
            echo "$test_name: redirects to $matches[1]\n";
        } else {
            echo "$test_name: no redirection\n";
        }
    }
    echo "\n";
}

First, we define the URLs to test, and then the set of browsers to test with. Finally, we loop through every combination of URL and browser.

Because of the cURL options we set, the returned output (stored in $output) contains only the HTTP headers. A simple regular expression then checks whether those headers contain "Location:".

Running this code prints, for each URL, one line per browser saying either where the site redirects to or that there is no redirection.
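
The pattern used above is case-sensitive and also captures the carriage return at the end of the header line. Servers that speak HTTP/2 send header names in lowercase, so a slightly stricter check may be safer. A sketch, assuming a hypothetical find_location() helper and a made-up header block:

```php
<?php
// Hypothetical helper: pull the Location target out of a raw header block.
function find_location(string $headers): ?string
{
    // ^ with /m anchors at each line start, so "Content-Location:" is ignored;
    // /i accepts "location:" as well; [^\r\n]* stops before the line's CR.
    if (preg_match('/^location:[ \t]*([^\r\n]*)/mi', $headers, $m)) {
        return trim($m[1]);
    }
    return null;
}

$raw = "HTTP/1.1 302 Found\r\nlocation: http://m.example.com/\r\nContent-Length: 0\r\n\r\n";
var_dump(find_location($raw));                       // string(21) "http://m.example.com/"
var_dump(find_location("HTTP/1.1 200 OK\r\n\r\n")); // NULL
```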

Sending data with the POST method

With a GET request, data is passed to the URL in the query string. For example, when you search on Google, the search term is part of the URL's query string:

http://www.google.com/search?q=nettuts

In this case you may not need cURL at all: passing this URL to file_get_contents() gives the same result.
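
When the query string is built from variables, http_build_query() takes care of the URL encoding. A minimal sketch; the extra hl parameter is only an illustrative assumption:

```php
<?php
// Build the query string from an array instead of concatenating raw values.
$params = ['q' => 'nettuts', 'hl' => 'en'];
$url = 'http://www.google.com/search?' . http_build_query($params);
echo $url, "\n"; // http://www.google.com/search?q=nettuts&hl=en

// Reserved characters are percent-encoded, and spaces become "+".
echo http_build_query(['q' => 'php & curl']), "\n"; // q=php+%26+curl
```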

However, some HTML forms are submitted with the POST method. Such forms send their data in the HTTP request body rather than in the query string. For example, the CodeIgniter forum search form always posts to the following page, no matter what keywords you enter:

http://codeigniter.com/forums/do_search/

You can simulate this kind of request with a PHP script. First, create a file that accepts and displays the POST data; we'll name it post_output.php:

<?php
print_r($_POST);

Next, write a PHP script that performs the cURL request:

<?php
$url = "http://localhost/post_output.php";
$post_data = array(
    "foo" => "bar",
    "query" => "Nettuts",
    "action" => "Submit"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// We are POSTing data
curl_setopt($ch, CURLOPT_POST, 1);
// Add the POST variables
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;

Executing the script sends a POST request to post_output.php. That page dumps its $_POST variable with print_r(), and cURL captures the output, so you should see the three fields of $post_data echoed back.
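
Note that passing an array to CURLOPT_POSTFIELDS, as above, sends the data as multipart/form-data, while passing a URL-encoded string sends it as application/x-www-form-urlencoded, the encoding an ordinary HTML form uses. A sketch of the second variant, assuming a hypothetical urlencoded_post_options() helper that only builds the options array:

```php
<?php
$post_data = ['foo' => 'bar', 'query' => 'Nettuts', 'action' => 'Submit'];

// Hypothetical helper: options for a urlencoded POST of $fields to $url.
function urlencoded_post_options(string $url, array $fields): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded;
        // passing the array itself would produce multipart/form-data.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
    ];
}

$options = urlencoded_post_options('http://localhost/post_output.php', $post_data);
echo $options[CURLOPT_POSTFIELDS], "\n"; // foo=bar&query=Nettuts&action=Submit
// To send it: $ch = curl_init(); curl_setopt_array($ch, $options); curl_exec($ch);
```

PHP fills $_POST for both encodings, so post_output.php prints the same array either way.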

File Upload

Uploading a file is very similar to the previous POST example, because all file upload forms are submitted with the POST method.

First, create a page that receives the file, named upload_output.php:

<?php
print_r($_FILES);

Here is the script that actually performs the upload:

<?php
$url = "http://localhost/upload_output.php";
$post_data = array(
    "foo" => "bar",
    // Local file to upload
    "upload" => new CURLFile("c:/wamp/www/test.zip")
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;

To upload a file, pass it like any other POST variable, but wrap the file path in a CURLFile object. Older PHP versions accepted the path as a string prefixed with the @ sign instead; that form was deprecated in PHP 5.5 and no longer works as of PHP 7.0. Executing this script prints the $_FILES array received by upload_output.php.
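
CURLFile (available since PHP 5.5) can also carry the MIME type and the file name reported to the server, which end up in the "type" and "name" entries of $_FILES. A sketch with a hypothetical upload_fields() helper; the MIME type is an assumption about the file:

```php
<?php
// Hypothetical helper: POST fields with one file part for a multipart upload.
function upload_fields(string $path, string $mime): array
{
    return [
        'foo'    => 'bar',
        // CURLFile(path on disk, MIME type, file name reported to the server).
        // The file itself is only read when curl_exec() runs.
        'upload' => new CURLFile($path, $mime, basename($path)),
    ];
}

$fields = upload_fields('c:/wamp/www/test.zip', 'application/zip');
echo $fields['upload']->getMimeType(), ' ', $fields['upload']->getPostFilename(), "\n";
// application/zip test.zip
```

curl_file_create($path, $mime, $name) is the procedural equivalent of the constructor.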

cURL batch processing (multi cURL)

cURL also has an advanced feature: the batch (multi) handle. It lets you open multiple URL connections simultaneously and asynchronously.

The following sample code comes from php.net:

<?php
// Create two cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// Set the URLs and other options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
// Create the cURL batch (multi) handle
$mh = curl_multi_init();
// Add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);
// Predefine a state variable
$active = null;
// Execute the batch
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    // Wait for activity; if select() fails, pause briefly instead of spinning forever
    if (curl_multi_select($mh) == -1) {
        usleep(100);
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
// Close each handle
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);

All you have to do is open multiple cURL handles and add them to a batch handle. Then you simply wait in a while loop until it has finished executing.

There are two main loops in this example. The first do-while loop calls curl_multi_exec() repeatedly. This function is non-blocking and does as little work as possible on each call. It returns a status value; as long as that value equals the constant CURLM_CALL_MULTI_PERFORM, there is still immediate work to do (for example, sending the HTTP headers for the URLs). In other words, we keep calling the function until the return value changes.

The next while loop continues only while the $active variable is true. This variable is passed as the second argument to curl_multi_exec() and indicates whether the batch handle still has active connections. Inside the loop we call curl_multi_select(), which blocks until there is activity on one of the connections (for example, a server response arrives). After that, we enter another do-while loop to continue processing the URLs.
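
Since libcurl 7.20.0, curl_multi_exec() no longer returns CURLM_CALL_MULTI_PERFORM, so on current systems the two loops can be folded into one. The sketch below is one way to write it and also collects each response body; fetch_all() is a hypothetical helper, and results are keyed by URL, so duplicate URLs are fetched only once.

```php
<?php
// Hypothetical helper: download several URLs in parallel, keyed by URL.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    $running = 0;
    do {
        // Let libcurl make progress on every transfer without blocking.
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            // Sleep until a socket is ready (at most 1 s) instead of spinning.
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        // Body collected for this handle; failures are not detected here
        // (curl_multi_info_read() would report them).
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Usage: print_r(fetch_all(['http://lxr.php.net/', 'http://www.php.net/']));
```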

Let's look at how to put this feature to use:

WordPress link checker

Imagine that you have a blog with many articles containing lots of links to external websites. After a while, quite a few of these links stop working for one reason or another: the page has been "harmonized" (censored), or the whole site has been blocked by the Great Firewall...

Below we build a script that analyzes all these links, finds out which sites or pages fail to open or return 404, and generates a report.

Note that this is not a real WordPress plugin, just a standalone script for demonstration purposes.

OK, let's get started. First, we read all the links from the database:

<?php
// CONFIG
$db_host = 'localhost';
$db_user = 'root';
$db_pass = '';
$db_name = 'wordpress';
$excluded_domains = array(
    'localhost', 'www.mydomain.com');
$max_connections = 10;
// Initialize some variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;
// Connect to MySQL (the old mysql_* functions were removed in PHP 7)
$db = mysqli_connect($db_host, $db_user, $db_pass, $db_name);
if (!$db) {
    die('Could not connect: ' . mysqli_connect_error());
}
// Find all published posts that contain links
$q = "SELECT post_content FROM wp_posts
    WHERE post_content LIKE '%href=%'
    AND post_status = 'publish'
    AND post_type = 'post'";
$r = mysqli_query($db, $q) or die(mysqli_error($db));
while ($d = mysqli_fetch_assoc($r)) {
    // Match links with a regular expression
    if (preg_match_all("!href=\"(.*?)\"!", $d['post_content'], $matches)) {
        foreach ($matches[1] as $url) {
            // Skip relative links and excluded domains
            $tmp = parse_url($url);
            if (!isset($tmp['host']) || in_array($tmp['host'], $excluded_domains)) {
                continue;
            }
            // Store the URL
            $url_list[] = $url;
        }
    }
}
// Remove duplicate links
$url_list = array_values(array_unique($url_list));
if (!$url_list) {
    die('No URL to check');
}

We first configure the database, a list of domains to exclude ($excluded_domains), and the maximum number of concurrent connections ($max_connections). Then we connect to the database, fetch the articles and the links they contain, and collect the links into an array ($url_list).
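
The regular expression in this script only matches double-quoted href attributes. If your posts also contain single-quoted links, an HTML parser is more robust. The sketch below swaps in PHP's DOMDocument (from the dom extension) for the regex; extract_links() is a hypothetical helper that also skips relative links, which cURL could not fetch anyway.

```php
<?php
// Hypothetical helper: absolute http(s) links in $html, minus excluded hosts.
function extract_links(string $html, array $excluded_domains): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $previous = libxml_use_internal_errors(true); // post HTML is rarely valid markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        $host = parse_url($href, PHP_URL_HOST);
        // Skip relative links, mailto: links and the excluded domains.
        if (!in_array($scheme, ['http', 'https'], true) || !is_string($host)
            || in_array($host, $excluded_domains, true)) {
            continue;
        }
        $links[] = $href;
    }
    return array_values(array_unique($links));
}
```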

The following code is a bit more complicated, so I'll explain it step by step:

// 1. Batch processor
$mh = curl_multi_init();
// 2. Add the first batch of URLs to process
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}
// 3. Initial processing
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
// 4. Main loop
while ($active && $mrc == CURLM_OK) {
    // 5. Wait for activity on the active connections
    //    (pause briefly if select() fails, instead of spinning forever)
    if (curl_multi_select($mh) == -1) {
        usleep(100);
    }
    // 6. Do the work
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    // 7. Any finished transfers? (loop, because several may finish at once)
    while ($mhinfo = curl_multi_info_read($mh)) {
        // 8. Get information from the cURL handle
        $chinfo = curl_getinfo($mhinfo['handle']);
        // 9. Dead link?
        if (!$chinfo['http_code']) {
            $dead_urls[] = $chinfo['url'];
        // 10. 404?
        } else if ($chinfo['http_code'] == 404) {
            $not_found_urls[] = $chinfo['url'];
        // 11. Working
        } else {
            $working_urls[] = $chinfo['url'];
        }
        // 12. Remove the handle
        curl_multi_remove_handle($mh, $mhinfo['handle']);
        curl_close($mhinfo['handle']);
        // 13. Add a new URL and do the work
        if (add_url_to_multi_handle($mh, $url_list)) {
            do {
                $mrc = curl_multi_exec($mh, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
}
// 14. Finished
curl_multi_close($mh);
echo "==Dead URLs==\n";
echo implode("\n", $dead_urls) . "\n";
echo "==404 URLs==\n";
echo implode("\n", $not_found_urls) . "\n";
echo "==Working URLs==\n";
echo implode("\n", $working_urls);
// 15. Add a URL to the batch processor
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;
    // If there are URLs left
    if (isset($url_list[$index])) {
        // New cURL handle
        $ch = curl_init();
        // Set the URL
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        // We don't want the returned content printed
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // Follow redirects wherever they lead
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // No body needed, which saves bandwidth and time
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // Add it to the batch processor
        curl_multi_add_handle($mh, $ch);
        // Advance the counter so the next call adds the next URL
        $index++;
        return true;
    } else {
        // No new URLs to process
        return false;
    }
}

The explanation below follows the numbered comments in the code:

1. Create a batch processor (a multi handle).
2. Later we define add_url_to_multi_handle(), which adds one URL to the batch processor each time it is called. At the start we add 10 URLs (the number set by $max_connections).
3. curl_multi_exec() must be run for initialization; as long as it returns CURLM_CALL_MULTI_PERFORM, there is still work to do. This mainly opens the connections; it does not wait for the complete responses.
4. The main loop keeps running as long as the batch still has active connections.
5. curl_multi_select() waits until there is activity on one of the URL connections.
6. cURL does more work, mostly receiving the response data.
7. Check for messages: when a URL request has completed, curl_multi_info_read() returns an array.
8. That array contains the cURL handle, which we use to get the information about that single request.
9. If the link is dead or the request timed out, there is no HTTP status code.
10. If the page was not found, the status code is 404.
11. In all other cases we consider the link to be working. (Of course, you could also check for 500 errors and so on.)
12. Remove the cURL handle from the batch, since it is no longer needed, and close it.
13. Now we can add another URL, and the work starts again.
14. When everything is done, close the batch processor and print the report.
15. Finally, the function that adds a new URL to the batch processor. Each call increments the static variable $index, so we know which URL comes next.
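
Step 11 mentions that other errors, such as 500, could be checked as well. One way to do that is a small classifier like the hypothetical classify_status() below, which adds an "error" bucket for every other 4xx/5xx code:

```php
<?php
// Hypothetical classifier extending the checker's three buckets with "error".
function classify_status(int $http_code): string
{
    if ($http_code === 0) {
        return 'dead';      // no HTTP response: DNS failure, refused connection, timeout
    }
    if ($http_code === 404) {
        return 'not_found';
    }
    if ($http_code >= 400) {
        return 'error';     // any other 4xx/5xx, e.g. 403 or 500
    }
    return 'working';
}

foreach ([0, 200, 404, 500] as $code) {
    echo $code, ' => ', classify_status($code), "\n";
}
```

Because the checker sends HEAD requests (CURLOPT_NOBODY), a server that does not support HEAD may answer 405 and land in the "error" bucket; retrying such URLs with a normal GET request is one option.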

I ran this script on my blog (with some broken links added on purpose for testing).

About 40 URLs were checked in less than two seconds. When you need to check many more URLs, the time saved is considerable: with 10 connections open at once, the check can run up to 10 times faster. In addition, the non-blocking nature of cURL batch processing lets you handle a large number of URL requests without blocking your web script.

A few other useful cURL options

HTTP Authentication

If a URL request requires HTTP-based authentication, you can use the following code:
<?php
$url = "http://www.somesite.com/members/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Send the user name and password
curl_setopt($ch, CURLOPT_USERPWD, "myusername:mypassword");
// Optionally allow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// The following option makes cURL send the user name
// and password after a redirect as well
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, 1);
$output = curl_exec($ch);
curl_close($ch);
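
With CURLOPT_USERPWD and libcurl's default scheme (CURLAUTH_BASIC), the credentials travel base64-encoded in an Authorization header, which anyone can decode; use HTTPS, or pick another scheme with CURLOPT_HTTPAUTH (for example CURLAUTH_DIGEST or CURLAUTH_ANY). Keep in mind, too, that CURLOPT_UNRESTRICTED_AUTH sends the credentials to whatever host a redirect points to. The sketch below, using a hypothetical basic_auth_header() helper and the well-known example credentials from RFC 7617, only rebuilds the header that Basic authentication produces:

```php
<?php
// Hypothetical helper: the header Basic authentication puts on the wire.
function basic_auth_header(string $user, string $password): string
{
    // Basic auth is just base64("user:password"): an encoding, not encryption.
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

echo basic_auth_header('Aladdin', 'open sesame'), "\n";
// Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```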

FTP Upload

PHP has its own FTP library, but you can also use cURL:

<?php
// Open a file pointer
$fp = fopen("/path/to/file", "r");
// The URL contains most of the required information
$url = "ftp://username:password@mydomain.com:21/path/to/new/file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Upload-related options
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize("/path/to/file"));
// Whether to use ASCII mode (useful when uploading text files)
curl_setopt($ch, CURLOPT_TRANSFERTEXT, 1);
$output = curl_exec($ch);
curl_close($ch);
fclose($fp);
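
Credentials embedded in the URL must be percent-encoded if they contain characters such as @, : or /. A sketch with a hypothetical ftp_url() helper; alternatively, leave them out of the URL and pass "username:password" through CURLOPT_USERPWD.

```php
<?php
// Hypothetical helper: FTP URL whose credentials may contain reserved characters.
function ftp_url(string $user, string $pass, string $host, int $port, string $path): string
{
    return sprintf(
        'ftp://%s:%s@%s:%d/%s',
        rawurlencode($user), // "@" becomes %40, ":" becomes %3A, and so on
        rawurlencode($pass),
        $host,
        $port,
        ltrim($path, '/')
    );
}

echo ftp_url('username', 'p@ss:word', 'mydomain.com', 21, '/path/to/new/file'), "\n";
// ftp://username:p%40ss%3Aword@mydomain.com:21/path/to/new/file
```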

Using a proxy

You can send a cURL request through a proxy:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Specify the proxy address
curl_setopt($ch, CURLOPT_PROXY, '11.11.11.11:8080');
// Supply a user name and password if needed
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass');
$output = curl_exec($ch);
curl_close($ch);

Callback functions

You can have cURL call a specified callback function during a URL request, for example to start using the data while the response is still downloading instead of waiting for the full download to complete.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://net.tutsplus.com');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "progress_function");
curl_exec($ch);
curl_close($ch);
function progress_function($ch, $str) {
    echo $str;
    return strlen($str);
}

The callback must return the length of the string it received; otherwise cURL treats it as a write error and aborts the transfer.

While the response is being received, the function is called each time a chunk of data arrives.
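
Here is a sketch of a callback that processes the data as it arrives instead of printing it. It uses a closure rather than a named function; download_in_chunks() is a hypothetical helper that only counts bytes and chunks, standing in for real processing.

```php
<?php
// Hypothetical helper: consume a download chunk by chunk as it arrives.
function download_in_chunks(string $url): array
{
    $stats = ['bytes' => 0, 'chunks' => 0];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($handle, string $chunk) use (&$stats): int {
        // Real code would parse or store $chunk here.
        $stats['bytes'] += strlen($chunk);
        $stats['chunks']++;
        // Any other return value makes libcurl abort with a write error.
        return strlen($chunk);
    });
    // With a write callback, curl_exec() returns true or false, not the body.
    if (curl_exec($ch) === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $stats;
}

// Usage: print_r(download_in_chunks('http://net.tutsplus.com'));
```

The chunk sizes depend on the server and on libcurl, so the chunk count varies from run to run.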

Summary

Today we looked at the power and flexibility of the cURL library. I hope you enjoyed it. The next time you need to make a URL request, consider cURL!

Source: PHP-based cURL Quick Start



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
