cURL is mainly used to capture (fetch) data. Other methods can do this too, such as fsockopen and file_get_contents, but they are best suited to pages that can be accessed directly. Pages protected by access control, or pages that are only available after logging in, are much harder to capture that way.
1. Retrieve a page
Example 1. Use the PHP curl module to retrieve a page (the code below fetches a local phpinfo.php page)
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // if this line is commented out, the page is printed directly instead of returned
$result = curl_exec($ch);
curl_close($ch);
?>
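Example 1 keeps the page in $result. To save a page into a file instead, curl can write the response body directly to an open file handle via CURLOPT_FILE. A minimal sketch, assuming the curl extension is enabled; save_page() and the paths are hypothetical:

```php
<?php
// Hypothetical helper: download $url into the local file $dest.
// Returns true on success, false if the file cannot be opened or the transfer fails.
function save_page(string $url, string $dest): bool
{
    $fp = fopen($dest, 'wb');                        // open the target file for writing
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);             // write the body straight into $fp
    curl_setopt($ch, CURLOPT_HEADER, false);         // keep response headers out of the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    $ok = curl_exec($ch);                            // true on success when CURLOPT_FILE is set
    curl_close($ch);
    fclose($fp);
    return $ok !== false;
}
```

For example, save_page('http://www.php.net/', '/tmp/php-home.html') would store the PHP home page (URL and path assumed for illustration). Writing to a handle avoids holding the whole page in memory, which matters for large downloads.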
2. Capture through a proxy
Why use a proxy when crawling? Take Google as an example: if you fetch Google's data frequently within a short period, Google may restrict your IP address and you will no longer be able to fetch it. When that happens, you can switch to another proxy and fetch the data again.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.hzhuti.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, 'PROXY_HOST:8080'); // replace PROXY_HOST with your proxy server's address
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // add this if the proxy requires a username and password
$result = curl_exec($ch);
curl_close($ch);
?>
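The proxy-switching idea described above can be sketched as a loop that tries each proxy in turn and returns the first HTTP 200 response. This is an illustration only: fetch_via_proxies() is a hypothetical name, the proxy list is something you supply, and treating any non-200 result as "blocked" is a simplifying assumption.

```php
<?php
// Hypothetical helper: try each "host:port" proxy in $proxies until one returns HTTP 200.
// Returns the page body, or false if every proxy fails.
function fetch_via_proxies(string $url, array $proxies, ?string $proxyUserPwd = null)
{
    foreach ($proxies as $proxy) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);  // tunnel through the proxy (CONNECT)
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);     // give up on a dead proxy quickly
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        if ($proxyUserPwd !== null) {
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyUserPwd); // "user:password"
        }
        $body = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response was received
        curl_close($ch);
        if ($body !== false && $status === 200) {
            return $body;                                 // this proxy worked
        }
        // blocked, refused or timed out: move on to the next proxy
    }
    return false;                                         // every proxy failed
}
```

Rotating proxies only changes the IP address the target sees; slowing down the request rate is often the simpler way to avoid being blocked in the first place.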
3. Capture data after a POST
Submitting data deserves its own section: when using curl you often need to exchange data with the server, so it is important.
<?php
$ch = curl_init();
/* Note: the data to be submitted cannot be a two-dimensional (or deeper) array.
 * This works:          array('name' => serialize(array('tank', 'zhang')), 'sex' => 1, 'birth' => '20101010')
 * This causes an error: array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010')
 */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
curl_close($ch);
?>
In upload.php, call print_r($_POST);. Capturing upload.php with curl as above then outputs:
Array
(
    [name] => test
    [sex] => 1
    [birth] => 20101010
)
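One common way around the two-dimensional array restriction is to encode the data yourself with http_build_query(), which turns nested arrays into name[0]=...&name[1]=... pairs that PHP on the receiving side parses back into arrays. A sketch under that assumption; build_post_body() and post_form() are hypothetical helpers:

```php
<?php
// Hypothetical helper: encode (possibly nested) form data as a URL-encoded string.
function build_post_body(array $data): string
{
    // Nested arrays become name%5B0%5D=...&name%5B1%5D=... (i.e. name[0], name[1]).
    return http_build_query($data);
}

// Hypothetical helper: POST $data to $url and return the response body (false on failure).
function post_form(string $url, array $data)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_post_body($data)); // string body, not an array
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

Passing a string to CURLOPT_POSTFIELDS sends the body as application/x-www-form-urlencoded, whereas passing an array sends multipart/form-data; with the string form, upload.php would see $_POST['name'] as array('tank', 'zhang').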
4. Capture pages protected by access control
Three methods of page access control
We often come across pages that pop up a username and password prompt before showing any content.
Apache page access control
Why do we need such control? Because we want to show different content to different people and to protect information. Although this kind of protection is fairly weak, it is still of some use.
I. Use the htpasswd command to generate a password file
[zhangy@BlackGhost test]$ htpasswd -c ./access tank   # generate a password file; -c creates a new file (run htpasswd -h for help)
New password:                                          # enter the password
Re-type new password:                                  # enter it again
Adding password for user tank
[zhangy@BlackGhost test]$ cat access                    # view the password file
tank:Uj5B3qIF/BNdI                                      # the username is in plain text; the password is encrypted

The password file has now been generated.
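If you prefer to generate the password file from PHP rather than with the htpasswd command, a sketch like the following writes lines in the same username:hash format. It assumes bcrypt hashes (the format htpasswd -B produces, which Apache 2.4 accepts); htpasswd_line() and htpasswd_check() are hypothetical helpers, not PHP built-ins.

```php
<?php
// Hypothetical helper: build one "username:hash" line using a bcrypt hash.
function htpasswd_line(string $user, string $password): string
{
    if (strpos($user, ':') !== false) {
        throw new InvalidArgumentException('username must not contain ":"');
    }
    return $user . ':' . password_hash($password, PASSWORD_BCRYPT);
}

// Hypothetical helper: check a username/password pair against one line of the file.
function htpasswd_check(string $line, string $user, string $password): bool
{
    $parts = explode(':', trim($line), 2);   // split into username and hash
    if (count($parts) !== 2 || $parts[0] !== $user) {
        return false;
    }
    return password_verify($password, $parts[1]);
}
```

Append the result of htpasswd_line() to the file named by AuthUserFile. Because password_verify() is compatible with crypt(), htpasswd_check() should also accept classic crypt entries like the tank line shown above, but not the $apr1$ MD5 variant.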
II. Page access control methods
1. Configure it in httpd.conf or httpd-vhosts.conf
Listen 10004
NameVirtualHost *:10004
<VirtualHost *:10004>
    DocumentRoot "/home/zhangy/www/test"
    ServerName *:10004
    BandwidthModule On
    ForceBandWidthModule On
    Bandwidth all 1024000
    MinBandwidth all 50000
    LargeFileLimit * 500 50000
    MaxConnection all 2
    ErrorLog "/home/zhangy/apache/blog.51yip.com.com-error.log"
    CustomLog "/home/zhangy/apache/blog.51yip.com-access.log" common
    # Take a look at the following configuration
    <Directory /home/zhangy/www/test>
        AuthType Basic
        AuthName "access test"
        AuthUserFile /home/zhangy/www/test/access
        Require valid-user
    </Directory>
</VirtualHost>
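2. Use an .htaccess file for control
Create an .htaccess file in the root directory of test. Apache only honors these directives if the server configuration allows overrides for that directory (for example, AllowOverride AuthConfig).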
[zhangy@BlackGhost test]$ vi .htaccess     # open the file and add the access control directives
[zhangy@BlackGhost test]$ cat .htaccess    # the content of .htaccess is as follows:
AuthType Basic
AuthName "access test"
AuthUserFile /home/zhangy/www/test/access
Require valid-user
3. You can also implement access control in PHP itself, without a password file.
<?php
define('ADMIN_USERNAME', 'tank');   // admin username
define('ADMIN_PASSWORD', 'tank');   // admin password

// Login check: ask for credentials if none were sent or they do not match
if (!isset($_SERVER['PHP_AUTH_USER']) || !isset($_SERVER['PHP_AUTH_PW'])
    || $_SERVER['PHP_AUTH_USER'] != ADMIN_USERNAME || $_SERVER['PHP_AUTH_PW'] != ADMIN_PASSWORD) {
    header('WWW-Authenticate: Basic realm="access test"');
    header('HTTP/1.0 401 Unauthorized');
    echo <<<EOB
<html><body>
<h1>Rejected!</h1>
<big>Wrong Username or Password!</big>
</body></html>
EOB;
    exit;
}
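To capture a page protected by any of the methods above, curl has to send the username and password with the request. A minimal sketch using CURLOPT_HTTPAUTH and CURLOPT_USERPWD, assuming HTTP Basic authentication as configured above; fetch_protected() and basic_auth_header() are hypothetical names:

```php
<?php
// Hypothetical helper: fetch a Basic-auth protected page.
// Returns the body on HTTP 200, false otherwise (e.g. 401 when credentials are rejected).
function fetch_protected(string $url, string $user, string $password)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);          // use HTTP Basic auth
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $password);  // "user:password"
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($body !== false && $status === 200) ? $body : false;
}

// Hypothetical helper showing what Basic auth puts on the wire:
// the header value is base64("user:password"), which is encoded, not encrypted.
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}
```

For example, fetch_protected('http://localhost:10004/', 'tank', 'tank') would fetch the test directory from the virtual host configured earlier, assuming tank's password there is tank (the htpasswd password is not shown in the text).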
List of curl-related functions:
curl_init - initialize a cURL session
curl_setopt - set an option for a cURL transfer
curl_exec - execute a cURL session
curl_close - close a cURL session
curl_version - return the current cURL version
curl_init - initialize a cURL session
Description
curl_init ([string url]): returns a cURL handle (a resource before PHP 8, a CurlHandle object since PHP 8.0), or FALSE on error.
The curl_init() function initializes a new session and returns a cURL handle for use with the curl_setopt(), curl_exec(), and curl_close() functions. If the optional url parameter is provided, the CURLOPT_URL option is set to its value; you can also set it manually with curl_setopt().
Example 1. Initialize a new cURL session and retrieve a web page
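<?php
// create a new cURL handle
$ch = curl_init();
// set the URL and other options
curl_setopt($ch, CURLOPT_URL, "http://www.zend.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// fetch the page and pass it to the browser
curl_exec($ch);
// close the handle and free system resources
curl_close($ch);
?>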
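The functions listed above typically work together as follows. This sketch passes the URL directly to curl_init(), which is equivalent to setting CURLOPT_URL, returns the body instead of printing it, and reports failures with curl_error(); fetch_string() is a hypothetical helper:

```php
<?php
// Hypothetical helper: return the body of $url as a string, or false on failure.
function fetch_string(string $url)
{
    $ch = curl_init($url);                           // same as setting CURLOPT_URL
    if ($ch === false) {
        return false;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    $body = curl_exec($ch);
    if ($body === false) {
        error_log('curl error: ' . curl_error($ch)); // description of the last failure
    }
    curl_close($ch);
    return $body;
}

$info = curl_version();                              // array describing the libcurl build
echo 'libcurl ', $info['version'], PHP_EOL;
```

Returning the body (rather than letting curl_exec() print it, as in Example 1) is what lets a script parse or store the captured page.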