PHP cURL quick start tutorial (for scraper programs)

Source: Internet
Author: User
Tags: curl, options, http, authentication
cURL is a tool that uses URL syntax to transfer files and data. It supports many protocols, such as HTTP, FTP, and TELNET. Many scraper programs rely on it.

Best of all, PHP also supports the cURL library. This article describes some advanced features of cURL and how to use them in PHP.

Why use cURL?

Admittedly, there are other ways to fetch the content of a web page. Most of the time, out of laziness, I just use simple built-in PHP functions such as file_get_contents().

However, this approach lacks flexibility and effective error handling. Moreover, you cannot use it for harder tasks such as cookie handling, authentication, form submission, and file upload.
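A minimal sketch of that shortcut, using PHP's built-in file functions. The helper name fetch_simple() is made up for illustration, and reading a URL this way assumes allow_url_fopen is enabled:

```php
<?php
// Hypothetical helper: read a resource with PHP's built-in file functions.
// $source can be a local path or, if allow_url_fopen is enabled, a URL.
function fetch_simple(string $source): array
{
    return [
        // Whole resource as one string (FALSE on failure).
        'string' => file_get_contents($source),
        // Resource as an array of lines, without trailing newlines.
        'lines'  => file($source, FILE_IGNORE_NEW_LINES),
    ];
    // readfile($source) is a third option: it prints the content directly.
}
```

Usage would look like fetch_simple('http://www.example.com/'), where the URL is only a placeholder.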

Basic Structure

Before learning more complex functions, let's take a look at the basic steps for creating cURL requests in PHP:

  1. Initialization
  2. Set options
  3. Execute and obtain results
  4. Release the cURL handle

The second step (that is, curl_setopt()) is the most important; this is where all the magic happens. There is a long list of cURL options that can be set, and they specify every detail of the URL request. They are hard to read and absorb all at once, so today we will only try the more common and useful ones.
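The four steps can be sketched as follows. The wrapper name curl_fetch() is an assumption made for this example; the curl_* calls and CURLOPT_* constants are the standard ones from PHP's cURL extension:

```php
<?php
// Hypothetical wrapper showing the four basic steps of a cURL request.
function curl_fetch(string $url)
{
    // 1. Initialization
    $ch = curl_init();

    // 2. Set options
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);        // do not include response headers in the output

    // 3. Execute and obtain results (string on success, FALSE on failure)
    $output = curl_exec($ch);

    // 4. Release the cURL handle (has no effect since PHP 8.0, kept for older versions)
    curl_close($ch);

    return $output;
}
```

A call such as curl_fetch('http://www.example.com/') would return the page body as a string; the URL is a placeholder.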

Checking for errors

You can add a statement that checks the return value of curl_exec() for errors (although this is not required).

Note that we compare with "=== FALSE" rather than "== FALSE", because we need to distinguish empty output from the Boolean value FALSE; only the latter indicates a real error.
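A minimal sketch of such a check. The function name and the shape of the returned array are assumptions made for illustration:

```php
<?php
// Hypothetical helper: fetch a URL and report a cURL error, if any.
function curl_fetch_checked(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);

    // Strict comparison: an empty body ("") is a valid result,
    // only the Boolean FALSE signals a failed request.
    $error = ($output === false) ? curl_error($ch) : null;

    curl_close($ch);

    return ['output' => $output, 'error' => $error];
}
```

With a loose "==" comparison, an empty response body would be reported as an error, which is exactly the case the strict comparison avoids.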

Obtain information

This is another optional step: after the cURL request has been executed, you can call curl_getinfo() to obtain information about it.

The returned array contains the following information:

  • "url" // resource URL
  • "content_type" // content type
  • "http_code" // HTTP status code
  • "header_size" // size of the response headers
  • "request_size" // size of the request
  • "filetime" // remote time of the retrieved document
  • "ssl_verify_result" // SSL verification result
  • "redirect_count" // number of redirects
  • "total_time" // total time consumed
  • "namelookup_time" // DNS lookup time
  • "connect_time" // time taken to establish the connection
  • "pretransfer_time" // time from start until the transfer is about to begin
  • "size_upload" // size of the uploaded data
  • "size_download" // size of the downloaded data
  • "speed_download" // download speed
  • "speed_upload" // upload speed
  • "download_content_length" // content length of the download
  • "upload_content_length" // content length of the upload
  • "starttransfer_time" // time until the transfer starts
  • "redirect_time" // time consumed by redirects
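As a rough illustration, the array can be obtained with curl_getinfo() after curl_exec() has run. The helper name curl_fetch_info() is an assumption for this sketch:

```php
<?php
// Hypothetical helper: execute a request and return cURL's information array.
function curl_fetch_info(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    curl_exec($ch);

    // Associative array with keys such as "url", "http_code", "total_time".
    $info = curl_getinfo($ch);

    curl_close($ch);

    return $info;
}
```

For example, curl_fetch_info($url)['total_time'] gives the number of seconds the request took.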

Browser-based redirection

In the first example, we write a piece of code that detects whether a server performs browser-based redirection. For example, some websites redirect visitors depending on whether the browser is a mobile browser, or even on the country the user comes from.

We use the CURLOPT_HTTPHEADER option to set the HTTP request headers that we send, including the user agent and the default language. Then we see whether these websites redirect us to different URLs.

First, we create a set of URLs to test, and then a set of browser profiles to test with. Finally, we loop over every combination of URL and browser.

Because of the cURL options we set, only the HTTP headers are returned (stored in $output). With a simple regular expression, we can check whether those headers contain a "Location:" line.
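A minimal sketch of the detection step. The function names and the 'user_agent' and 'language' array keys are assumptions made for this example, not the article's original listing:

```php
<?php
// Hypothetical helper: pull the target of a "Location:" header out of raw
// response headers; returns null when there is no redirect.
function find_redirect(string $headers): ?string
{
    if (preg_match('/^Location:\s*(.*)$/mi', $headers, $m)) {
        return trim($m[1]);
    }
    return null;
}

// Hypothetical helper: request only the headers of $url while pretending to
// be the given browser, then report where (if anywhere) we are redirected.
function probe_redirect(string $url, array $browser): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'User-Agent: ' . $browser['user_agent'],
        'Accept-Language: ' . $browser['language'],
    ]);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // skip the page body
    curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return output as a string

    $output = curl_exec($ch);
    curl_close($ch);

    return ($output === false) ? null : find_redirect($output);
}
```

To test every combination, loop over the URL list and the browser list and call probe_redirect() for each pair.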

Running this code should show, for each combination of URL and browser, whether a redirect took place.

Use the POST method to send data

With a GET request, data can be passed to a URL through the query string. For example, when you search on Google, the search keyword is part of the URL's query string.

In that case you may not need cURL at all. You can get the same result by handing the URL to file_get_contents().

However, some HTML forms are submitted with the POST method. When such a form is submitted, the data is sent in the HTTP request body instead of the query string. For example, when you use the CodeIgniter forum search form, whatever keyword you enter is always sent by POST to the same page.

You can use a PHP script to simulate this kind of request. First, create a new file that accepts and displays the POST data. We name it post_output.php.

Next, write a PHP script that performs the cURL request.

This script sends a POST request to post_output.php, which prints its $_POST variable, and we capture that output with cURL.
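A minimal sketch of the sending side. The function name curl_post() and the example field names are assumptions; post_output.php is assumed to contain nothing more than print_r($_POST):

```php
<?php
// Hypothetical helper: send $fields to $url as a form-encoded POST request
// and return the response body (FALSE on failure).
function curl_post(string $url, array $fields)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true); // use the POST method

    // A query string is sent as application/x-www-form-urlencoded;
    // passing the array itself would send multipart/form-data instead.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));

    $output = curl_exec($ch);
    curl_close($ch);

    return $output;
}
```

A call might look like curl_post('http://localhost/post_output.php', ['foo' => 'bar']), where the URL and the field are placeholders for your own setup.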

File Upload

Uploading a file is very similar to the POST example, because all file upload forms are submitted with the POST method.

First, create a page that receives the file, named upload_output.php.

Then write a script that performs the upload.

To upload a file, you pass the file path like any other POST variable. In old PHP versions this was done by prefixing the path with the @ symbol; since PHP 5.5 that form is deprecated (and it no longer works in PHP 7), so wrap the path in a CURLFile object instead. The receiving page can then print what it received.
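A minimal sketch of an upload using CURLFile, the replacement for the old @ prefix in PHP 5.5 and later. The function names and the 'upload' field name are assumptions made for this example:

```php
<?php
// Hypothetical helper: build the POST fields for a file upload.
// 'upload' is an assumed form field name; use the one your form expects.
function build_upload_fields(string $path, array $extra = []): array
{
    return $extra + ['upload' => new CURLFile($path)];
}

// Hypothetical helper: upload the file at $path to $url.
function curl_upload(string $url, string $path, array $extra = [])
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);

    // Passing an array makes cURL send multipart/form-data,
    // which is what file upload forms use.
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_upload_fields($path, $extra));

    $output = curl_exec($ch);
    curl_close($ch);

    return $output;
}
```

On the receiving side, upload_output.php would find the file in $_FILES['upload'] and the extra fields in $_POST.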

Batch cURL (multi cURL)

cURL also has an advanced feature: the batch handle (multi handle). It lets you open several URL connections at the same time, asynchronously.

The walkthrough here follows the sample code from php.net.

You open several cURL handles and add them all to one batch handle. Then you simply wait in a while loop for the batch to finish.

There are two main loops in this example. The first do-while loop calls curl_multi_exec() repeatedly. This function is non-blocking and does as little work as possible on each call. It returns a status value; as long as that value equals the constant CURLM_CALL_MULTI_PERFORM, there is still urgent work to do (for example, sending the HTTP headers for a URL). In other words, we keep calling the function until the return value changes.

The second while loop continues only while the $active variable is true. This variable was passed to curl_multi_exec() as its second parameter and indicates whether the batch handle still has active connections. Next, we call curl_multi_select(), which blocks until there is activity on a connection (for example, a server response is received). Once it returns, we enter another do-while loop and call curl_multi_exec() again to continue the work.
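The two loops can be sketched as follows. This is an illustration written for this tutorial rather than the php.net listing itself, and the function name curl_fetch_all() is an assumption:

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with a batch handle.
// Returns the response bodies, keyed like the input array.
function curl_fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // First loop: start the transfers.
    do {
        $status = curl_multi_exec($mh, $active);
    } while ($status === CURLM_CALL_MULTI_PERFORM);

    // Second loop: wait for activity, then let cURL do more work.
    while ($active && $status === CURLM_OK) {
        if (curl_multi_select($mh) === -1) {
            usleep(1000); // select failed; pause briefly instead of spinning
        }
        do {
            $status = curl_multi_exec($mh, $active);
        } while ($status === CURLM_CALL_MULTI_PERFORM);
    }

    // Collect the results and release every handle.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Because all transfers run at once, the total time is roughly that of the slowest request rather than the sum of all of them.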

Let's put this feature to practical use.

WordPress link checker

Imagine that you have a blog with a large number of articles, and those articles contain many links to external websites. After a while, for one reason or another, a considerable number of those links stop working. Some pages have been taken down, and some sites have become unreachable altogether...

We will now write a script that analyzes all these links, finds the websites and pages that cannot be opened or that return 404, and generates a report.

Please note that this is not a ready-to-use WordPress plug-in. It is a stand-alone script, meant only for demonstration.

Okay, let's get started. First, read all the links from the database.

We first configure the database, a list of domain names to exclude ($excluded_domains), and the maximum number of concurrent connections ($max_connections). Then we connect to the database, fetch the articles and the links they contain, and collect the links into an array ($url_list).
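The link collection step can be sketched as follows. The database query is left out: the sketch assumes the post bodies have already been loaded into an array, and the function name collect_urls() and the href pattern are illustrative choices:

```php
<?php
// Hypothetical helper: extract external links from post bodies.
// $post_contents is assumed to hold the HTML of each post, already
// loaded from the database; $excluded_domains lists hosts to skip.
function collect_urls(array $post_contents, array $excluded_domains): array
{
    $url_list = [];

    foreach ($post_contents as $content) {
        // Find every href="..." value that starts with http:// or https://
        if (!preg_match_all('#href="(https?://[^"]+)"#i', $content, $matches)) {
            continue;
        }
        foreach ($matches[1] as $url) {
            $host = parse_url($url, PHP_URL_HOST);
            // Skip malformed URLs and excluded domains.
            if (!is_string($host) || in_array($host, $excluded_domains, true)) {
                continue;
            }
            $url_list[] = $url;
        }
    }

    // Remove duplicates and reindex from 0.
    return array_values(array_unique($url_list));
}
```

Excluding your own domain and localhost keeps the checker focused on external links.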

The checking code is a bit complicated, so I will explain it step by step.

The numbers in the list correspond to the numbers in the code comments.

  1. Create a new batch handle (multi handle).
  2. Later we will write a function, add_url_to_multi_handle(), that adds URLs to the batch handle. Each time it is called, one new URL is added. At the start we add 10 URLs (this number is set by $max_connections).
  3. Running curl_multi_exec() is required to get things started. As long as it returns CURLM_CALL_MULTI_PERFORM, there is still work to do. This mainly creates the connections; it does not wait for the complete URL responses.
  4. The main loop continues as long as the batch still has active connections.
  5. curl_multi_select() waits until there is activity on one of the connections.
  6. It is cURL's turn to work again, mainly fetching response data.
  7. Check the status messages. When a URL request has completed, an array is returned.
  8. The returned array contains a cURL handle. We use it to obtain the information for that individual cURL request.
  9. If the link is dead or the request timed out, no HTTP status code is returned.
  10. If the page cannot be found, status code 404 is returned.
  11. In all other cases we consider the link to be working (of course, you could also check for 500 errors and so on).
  12. Remove the cURL handle from the batch, because it is of no further use, and close it.
  13. Now another URL can be added to the batch, and the work starts again.
  14. Everything is done. Close the batch handle and generate the report.
  15. Finally, the function that adds a new URL to the batch handle. Each time it is called, the static variable $index is incremented, so that we know how many URLs remain to be processed.
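The steps above can be sketched as follows. This is an illustration, not the article's original listing: it checks completed requests after every curl_multi_exec() call so that no result is missed, and it passes the position counter by reference instead of using the static $index from step 15 so that the function can be called more than once. The function names and the report layout are assumptions.

```php
<?php
// Steps 9-11: classify a finished request from its curl_getinfo() array.
function classify_link(array $info): string
{
    if (empty($info['http_code'])) {
        return 'dead';       // 9. no HTTP status code: dead link or timeout
    }
    if ($info['http_code'] == 404) {
        return 'not_found';  // 10. page not found
    }
    return 'working';        // 11. everything else counts as working
}

// 15. Add the next unprocessed URL to the batch; FALSE when none are left.
function add_url_to_multi_handle($mh, array $url_list, int &$index): bool
{
    if (!isset($url_list[$index])) {
        return false;
    }
    $ch = curl_init($url_list[$index]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not print the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_NOBODY, true);         // headers are enough
    curl_multi_add_handle($mh, $ch);
    $index++;
    return true;
}

function check_links(array $url_list, int $max_connections = 10): array
{
    $report = ['working' => [], 'dead' => [], 'not_found' => []];
    $index = 0;

    $mh = curl_multi_init();                                   // 1.
    for ($i = 0; $i < $max_connections; $i++) {                // 2.
        add_url_to_multi_handle($mh, $url_list, $index);
    }

    do {                                                       // 4. main loop
        do {                                                   // 3. and 6.
            $status = curl_multi_exec($mh, $active);
        } while ($status === CURLM_CALL_MULTI_PERFORM);

        $added = false;
        while ($done = curl_multi_info_read($mh)) {            // 7.
            $info = curl_getinfo($done['handle']);             // 8.
            $report[classify_link($info)][] = $info['url'];    // 9. to 11.
            curl_multi_remove_handle($mh, $done['handle']);    // 12.
            curl_close($done['handle']);
            if (add_url_to_multi_handle($mh, $url_list, $index)) { // 13.
                $added = true;
            }
        }

        // 5. Wait for activity unless new work was just queued.
        if ($active && !$added && curl_multi_select($mh) === -1) {
            usleep(1000);
        }
    } while (($active || $added) && $status === CURLM_OK);

    curl_multi_close($mh);                                     // 14.
    return $report;
}
```

The returned array groups the checked URLs under 'working', 'dead' and 'not_found', which is enough to print a simple report.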

I ran this script on my blog (for testing purposes, I intentionally added some broken links).

About 40 URLs were checked in less than two seconds. When you need to check a much larger number of URLs, the savings in time and effort are easy to imagine. With 10 connections open at the same time, the job can run up to about 10 times faster. In addition, you can use cURL's non-blocking batch feature to process a large number of URL requests without blocking your web script.

Other useful cURL options

HTTP Authentication

If a URL request requires HTTP-based authentication, you can use the following code:
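A minimal sketch using Basic authentication. The function name and the credentials in the usage note are placeholders:

```php
<?php
// Hypothetical helper: prepare a cURL handle that sends HTTP Basic credentials.
function curl_init_with_auth(string $url, string $user, string $password)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Credentials in "user:password" form.
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $password);

    // Restrict cURL to the Basic scheme.
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);

    return $ch;
}
```

Typical use: $ch = curl_init_with_auth($url, 'myusername', 'mypassword'); followed by curl_exec($ch).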

FTP upload

PHP ships with its own FTP extension, but you can also use cURL:
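A minimal sketch of an FTP upload through cURL. The function name is an assumption, and the FTP URL in the comment is a placeholder:

```php
<?php
// Hypothetical helper: upload a local file to an FTP URL with cURL.
// Returns TRUE on success, FALSE if the file is unreadable or the upload fails.
function curl_ftp_upload(string $local_path, string $ftp_url): bool
{
    if (!is_readable($local_path)) {
        return false;
    }
    $fp = fopen($local_path, 'r');

    // $ftp_url looks like ftp://user:password@ftp.example.com/path/file.txt
    $ch = curl_init($ftp_url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_UPLOAD, true);                      // switch to upload mode
    curl_setopt($ch, CURLOPT_INFILE, $fp);                       // stream to read from
    curl_setopt($ch, CURLOPT_INFILESIZE, filesize($local_path)); // number of bytes to send

    $ok = curl_exec($ch) !== false;

    curl_close($ch);
    fclose($fp);

    return $ok;
}
```

The credentials are part of the URL here; they could also be supplied with CURLOPT_USERPWD.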

Using a proxy

You can use a proxy to initiate a cURL request:
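A minimal sketch of a proxied request. The function name is an assumption, and the proxy address and credentials are values you would supply yourself:

```php
<?php
// Hypothetical helper: prepare a cURL handle that goes through a proxy.
function curl_init_via_proxy(string $url, string $proxy, ?string $proxy_auth = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Proxy address in "host:port" form.
    curl_setopt($ch, CURLOPT_PROXY, $proxy);

    // Optional proxy credentials in "user:password" form.
    if ($proxy_auth !== null) {
        curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxy_auth);
    }

    return $ch;
}
```

Typical use: $ch = curl_init_via_proxy($url, '11.11.11.11:8080', 'user:pass'); followed by curl_exec($ch), where the address and credentials are placeholders.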

Callback Function

cURL can call a callback function that you specify while a URL request is in progress. For example, you can start using the data as soon as it arrives during the download of a response, instead of waiting until the whole download is complete.

The callback function must return the length of the string it was given; otherwise the transfer will not work properly.

While the response is being received, the function is called every time a chunk of data arrives.
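A minimal sketch using the CURLOPT_WRITEFUNCTION option. The function name curl_fetch_chunks() and the $on_chunk parameter are assumptions made for this example:

```php
<?php
// Hypothetical helper: stream a response and hand each chunk to $on_chunk.
function curl_fetch_chunks(string $url, callable $on_chunk): bool
{
    $ch = curl_init($url);

    // cURL calls this closure every time a chunk of the response arrives.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, string $data) use ($on_chunk): int {
        $on_chunk($data);

        // Report how many bytes were handled; any other value aborts the transfer.
        return strlen($data);
    });

    $ok = curl_exec($ch) !== false;
    curl_close($ch);

    return $ok;
}
```

For example, passing a callback that appends each chunk to a file lets you process a large download without holding it all in memory.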

Summary

Today we explored the powerful features and flexible extensibility of the cURL library. I hope you liked it. Consider cURL for your next URL request!
