I have a small PHP application of my own that uses curl to fetch pages for processing. To get past the firewall conveniently, it goes through Privoxy as a proxy, which makes it easy to choose which sites are proxied and which are not. Today I ran into a strange problem: requests to sites such as Google and Baidu returned 403 errors, while other sites worked fine, and with the proxy disabled everything was accessible as normal.
Do Google and Baidu refuse connections that come through a proxy? Obviously not, so I turned on curl's verbose output (curl_setopt($this->msh, CURLOPT_VERBOSE, 1);) and got the following:
*   Trying 127.0.0.1... * connected
* Connected to 127.0.0.1 (127.0.0.1) port 8118 (#0)
* Establish HTTP proxy tunnel to www.baidu.com:80
> CONNECT www.baidu.com:80 HTTP/1.0
Host: www.baidu.com:80
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Proxy-Connection: Keep-Alive
< HTTP/1.0 403 Connection not allowable
< X-Hint: If you read this message interactively, then you know why this happens ;-)
<
* The requested URL returned error: 403
* Received HTTP code 403 from proxy after CONNECT
* Closing connection #0
... Failed.
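The trace shows that the proxy server itself is reachable and that the request really does end in a 403. Note that the "from proxy after CONNECT" line indicates the 403 was sent by the proxy in reply to the CONNECT request rather than by Baidu directly. Either way, the cause had to be on my side. In the end a couple of posts online (1 of 2, 2 of 2) gave me a hint: I was using a proxy tunnel rather than a plain proxy.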
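For reference, a verbose trace like the one shown earlier can be captured from PHP into a string instead of being read off the terminal. The following is only a minimal sketch: the helper names make_verbose_handle and read_verbose_log are hypothetical and not taken from the application described here, and 127.0.0.1:8118 is simply the Privoxy address that appears in the trace.

```php
<?php
// Hypothetical helper: create a curl handle whose verbose log is written
// to an in-memory stream instead of the process's stderr.
function make_verbose_handle(string $url, string $proxy): array
{
    $log = fopen('php://temp', 'w+');
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_PROXY          => $proxy,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_VERBOSE        => true,  // emit the "* ...", "> ...", "< ..." lines
        CURLOPT_STDERR         => $log,  // send the verbose lines to our stream
    ]);
    return [$ch, $log];
}

// Hypothetical helper: read back everything written to the log stream so far.
function read_verbose_log($log): string
{
    rewind($log);
    return stream_get_contents($log);
}

// Usage (performs a real request, so it is left commented out):
// [$ch, $log] = make_verbose_handle('http://www.baidu.com/', '127.0.0.1:8118');
// curl_exec($ch);
// echo read_verbose_log($log);
```

Keeping the trace in a stream makes it possible to log it only when a request fails.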
My application's code contains these two lines:
curl_setopt($this->msh, CURLOPT_HTTPPROXYTUNNEL, true);
curl_setopt($this->msh, CURLOPT_PROXY, $phost);
The PHP documentation gives no detail here, but man curl explains it: both options concern proxies. Proxy tunneling (the -p parameter) lets other protocols be carried through an HTTP proxy, whereas a plain proxy (the -x parameter) carries only HTTP. So my guess is that proxy tunneling is what fails for Google and Baidu here, hence the 403.
After disabling the first of those two lines, access through curl returned to normal.
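The difference between the two modes can be sketched as a small option builder. This is an illustration under assumptions, not the application's actual code: the function name proxy_options is hypothetical, and the proxy address is the Privoxy one from the trace.

```php
<?php
// Hypothetical helper: build curl options for going through an HTTP proxy.
// With $tunnel = false, curl sends an ordinary HTTP request to the proxy.
// With $tunnel = true, curl first asks the proxy to open a tunnel via CONNECT,
// which is the request the proxy answered with 403 in the trace.
function proxy_options(string $proxy, bool $tunnel = false): array
{
    return [
        CURLOPT_PROXY           => $proxy,
        CURLOPT_HTTPPROXYTUNNEL => $tunnel,
        CURLOPT_RETURNTRANSFER  => true,
    ];
}

// Usage (performs a real request, so it is left commented out):
// $ch = curl_init('http://www.baidu.com/');
// curl_setopt_array($ch, proxy_options('127.0.0.1:8118'));
// $body = curl_exec($ch);
```

Defaulting $tunnel to false mirrors the fix described above, where tunneling is left off for plain HTTP pages.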
Oddly, the behavior differs between operating systems. On one Mac OS X machine, proxy tunneling has to be disabled explicitly. Its curl version:
$ curl --version
curl 7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Protocols: tftp ftp telnet dict ldap http file https ftps
Features: GSS-Negotiate IPv6 Largefile NTLM SSL libz
Another machine running Ubuntu is completely unaffected and works either way. Its curl version:
$ curl --version
curl 7.18.2 (i486-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz
The CentOS system on my MT host is fine as well. Its curl version:
$ curl --version
curl 7.15.5 (i686-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
Protocols: tftp ftp telnet dict ldap http file https ftps
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz
So the difference does not simply follow the curl version, since the Mac OS X version sits between the two that work. Mac OS X really does behave differently.
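When comparing machines like this, it may also help to check which libcurl PHP itself is linked against, since that is not necessarily the same build as the command-line curl. A minimal sketch using PHP's curl_version() follows; the helper name describe_curl is hypothetical.

```php
<?php
// Hypothetical helper: summarise the libcurl build used by the PHP curl extension.
function describe_curl(): string
{
    $v = curl_version();
    return sprintf(
        'libcurl/%s (%s) %s protocols: %s',
        $v['version'],                    // libcurl version string
        $v['host'],                       // platform triplet of the build
        $v['ssl_version'],                // TLS library and its version
        implode(' ', $v['protocols'])     // supported protocols
    );
}

echo describe_curl(), PHP_EOL;
```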
There is another possible cause of a 403 from curl. If you set:
curl_setopt($ch, CURLOPT_NOBODY, true);
then you need to follow it with:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
Otherwise a 403 error may be returned, because CURLOPT_NOBODY makes curl send a HEAD request and some HTTP servers do not allow the HEAD method. Reference: Trouble with a CURL request in PHP (http://forums.devshed.com/php-development-5/trouble-with-a-curl-request-in-php-445222.html). I cannot rule this out as the reason curl behaves differently on Mac OS X.