Using PHP's cURL library is a simple and effective way to fetch web pages. You run a script, parse the pages it retrieves, and extract the data you want programmatically. Whether you want to pull part of the data from a link, import an XML file into a database, or simply fetch web content, cURL is a powerful PHP library. This article describes how to use it.
Here is a simple example of using the cURL library to fetch a web page:
<?php
$curl = curl_init(); // Initialize a cURL handle
curl_setopt($curl, CURLOPT_URL, 'http://cocre.com'); // Set the URL you want to fetch
curl_setopt($curl, CURLOPT_HEADER, 1); // Include the response headers in the output
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // Return the response as a string instead of printing it
$data = curl_exec($curl); // Run cURL and request the web page
curl_close($curl); // Close the handle
var_dump($data); // Show the data you got
?>
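The example above does not check whether the request succeeded. A minimal sketch of the same fetch with error handling follows. The function name `fetch_page` and the timeout value are assumptions made for illustration and are not part of the original example.

```php
<?php
// Hypothetical helper: fetch a URL and return the body, or throw on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);      // return the body as a string
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeoutSeconds);  // assumed timeout, in seconds

    $data = curl_exec($curl);
    if ($data === false) {
        // curl_exec() returns false on failure; curl_error() describes the reason
        $error = curl_error($curl);
        curl_close($curl);
        throw new RuntimeException('cURL request failed: ' . $error);
    }

    curl_close($curl);
    return $data;
}
```

A caller would wrap `fetch_page('http://cocre.com')` in a `try`/`catch` block and handle the `RuntimeException` however suits the script.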
How to POST data
The code above fetches a page; the following code posts data to a web page. Suppose we have a URL, http://www.example.com/sendSMS.php, that handles a form and accepts two form fields: a phone number and a text message.
<?php
$phoneNumber = '13912345678';
$message = 'This is generated by curl and PHP';
// Build the URL-encoded POST body field by field
$curlPost = 'pnumber=' . urlencode($phoneNumber);
$curlPost .= '&message=' . urlencode($message);
$curlPost .= '&submit=send';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendSMS.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1); // Use POST instead of GET
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost); // Attach the POST data
$data = curl_exec($ch); // The handle must be passed to curl_exec()
curl_close($ch);
?>
From the program above we can see that CURLOPT_POST switches the request from the HTTP GET method to POST, and that CURLOPT_POSTFIELDS carries the POST data.
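Building the POST string by hand is easy to get wrong. As a sketch of an alternative, PHP's built-in `http_build_query()` can produce the same URL-encoded body from an array. The field names are the ones used in the example above, and the request itself is left commented out.

```php
<?php
// Describe the form fields as an associative array.
$fields = [
    'pnumber' => '13912345678',
    'message' => 'This is generated by curl and PHP',
    'submit'  => 'send',
];

// URL-encode each value and join the pairs with '&'.
$curlPost = http_build_query($fields);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendSMS.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
// $data = curl_exec($ch); // send the request as in the example above
```

Passing the encoded string sends the body as `application/x-www-form-urlencoded`. Passing the array itself to CURLOPT_POSTFIELDS would instead send `multipart/form-data`, which may or may not be what the receiving script expects.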
About proxy servers
The following is an example of how to use a proxy server. The proxy-specific options are CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY, and CURLOPT_PROXYUSERPWD; the rest of the code is the same as before.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1); // Tunnel through the HTTP proxy
curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080'); // Proxy host and port
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // Proxy credentials
$data = curl_exec($ch);
curl_close($ch);
?>
About SSL and cookies
As for SSL, that is, the HTTPS protocol, you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option called CURLOPT_SSL_VERIFYHOST, which checks that the server's certificate matches the host name you are connecting to.
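As a sketch of the verification settings, the options below enable both the certificate check and the host name check. CURLOPT_SSL_VERIFYPEER is an additional option not mentioned in the text above, and the URL is a placeholder.

```php
<?php
$ch = curl_init();

// curl_setopt_array() sets several options at once and returns true
// only if every option was accepted.
$ok = curl_setopt_array($ch, [
    CURLOPT_URL            => 'https://www.example.com/', // placeholder URL
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => true, // verify the server's certificate chain
    CURLOPT_SSL_VERIFYHOST => 2,    // check that the certificate matches the host name
]);
// $data = curl_exec($ch); // run the request as in the earlier examples
```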
For cookies, you need to know the following three options:
CURLOPT_COOKIE, sets a cookie for the current session
CURLOPT_COOKIEJAR, the file in which cookies are saved when the session ends
CURLOPT_COOKIEFILE, the file from which cookies are read.
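One way the three options might fit together is sketched below. The cookie file path, the cookie value, and the URL are hypothetical; using the same file for reading and writing is one common arrangement, not the only one.

```php
<?php
// Hypothetical location for the cookie file.
$cookieFile = sys_get_temp_dir() . '/curl_cookies.txt';

$ch = curl_init();
$ok = curl_setopt_array($ch, [
    CURLOPT_URL            => 'http://www.example.com/', // placeholder URL
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_COOKIE         => 'lang=en',    // hypothetical cookie sent with this request
    CURLOPT_COOKIEFILE     => $cookieFile,  // read previously saved cookies from this file
    CURLOPT_COOKIEJAR      => $cookieFile,  // write cookies to this file when the handle is released
]);
// $data = curl_exec($ch); // run the request as in the earlier examples
```

With this setup, cookies set by the server in one run of the script would be sent back on the next run, which is how a login session can be carried across requests.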
HTTP Server Authentication
Finally, let's look at HTTP server authentication.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]'); // Credentials in user:password form
$data = curl_exec($ch);
curl_close($ch);
?>
For more information, see the relevant cURL manual.