In the previous article, I described how to make Scrapy work with HTTP proxies.
However, Scrapy does not support SOCKS proxies out of the box. Plain HTTP proxies are easily blocked by the GFW (the Great Firewall), and crawling websites that are blocked inside China requires a proxy located abroad. Well, necessity is the mother of invention.
The following describes a solution.
Deploy a Linux VPS in the United States or Hong Kong
(Debian is used as the example here.)
Install the necessary build packages
apt-get install build-essential autoconf libtool libssl-dev gcc -y
Install git
apt-get install git -y
Download and compile the shadowsocks-libev source code package
git clone https://github.com/madeye/shadowsocks-libev.git
cd shadowsocks-libev
./configure
make && make install
Run the shadowsocks server
/usr/local/bin/ss-server -s 0.0.0.0 -p <port> -k <password> -m <encryption_method> &
Local machine (Windows or Linux)
Go to http://dl.chenyufei.info/shadowsocks/ and download the shadowsocks client for your operating system.
In addition, the shadowsocks client can be configured with multiple servers, which in effect gives you a proxy pool.
The most critical step is converting the SOCKS proxy into an HTTP proxy.
I recommend 3proxy for this; its homepage is www.3proxy.ru.
3proxy runs on Windows, Linux, and other platforms. On Windows, simply download and install the prebuilt package.
A typical 3proxy configuration file:
nscache 65536
timeouts 1 5 30 60 180 1800 15 60
daemon
# service
# external IP
internal 127.0.0.1
auth iponly
allow * 127.0.0.1
parent 1000 socks5+ 127.0.0.1 9999
proxy -n -a -p1984
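In the parent line, 127.0.0.1 and 9999 are the address and port of the upstream (parent) SOCKS5 proxy, i.e. the local shadowsocks client, and 1000 is the parent's weight (out of 1000, so this parent is always used).
1984 is the port on which the resulting HTTP proxy listens.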
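If you deploy this on several machines, it can help to generate the file so that the two numbers that matter (the shadowsocks SOCKS5 port and the HTTP listen port) are explicit parameters. The following is a minimal sketch; the helper name `render_3proxy_cfg` is hypothetical, and it writes the ACL in 3proxy's `allow <userlist> <sourcelist>` form (`allow * 127.0.0.1`), meaning any user connecting from localhost.

```python
# Hypothetical helper: renders the 3proxy configuration described above,
# with the upstream SOCKS5 address/port and the HTTP listen port as parameters.
def render_3proxy_cfg(socks_host="127.0.0.1", socks_port=9999, http_port=1984):
    lines = [
        "nscache 65536",
        "timeouts 1 5 30 60 180 1800 15 60",
        "daemon",
        "internal 127.0.0.1",     # only listen on the loopback interface
        "auth iponly",            # authenticate by client IP, no username
        "allow * 127.0.0.1",      # any user, but only from localhost
        # weight 1000 = always use this parent; socks5+ lets the parent resolve hostnames
        f"parent 1000 socks5+ {socks_host} {socks_port}",
        # -n: disable NTLM auth, -a: anonymous (no client info forwarded), -p: listen port
        f"proxy -n -a -p{http_port}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `open("3proxy.cfg", "w").write(render_3proxy_cfg(socks_port=1080))` would produce a config whose parent points at a shadowsocks client listening on port 1080 instead of 9999.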
For more detailed settings, see this article:
http://blog.sunshine-wang.info/?p=20
Or read the official documentation.
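Before pointing Scrapy at the proxy, you can check from Python that the whole chain (3proxy, then the local shadowsocks client, then the VPS) actually works. The following is a minimal sketch that uses only the standard library and assumes 3proxy listens on 127.0.0.1:1984. The helper names and the test URL are placeholders.

```python
import urllib.request

PROXY_URL = "http://127.0.0.1:1984"  # assumed 3proxy HTTP listen address


def make_proxied_opener(proxy_url=PROXY_URL):
    """Build a urllib opener that sends both http and https traffic via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def check_proxy(url="http://example.com/", proxy_url=PROXY_URL, timeout=10):
    """Return the HTTP status fetched through the proxy, or None on failure."""
    opener = make_proxied_opener(proxy_url)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"proxy check failed: {exc}")
        return None
```

Calling `check_proxy()` should return 200 when the chain is up. If it prints a connection error, check whether 3proxy is running and whether the shadowsocks client is listening on the parent port.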
Finally, set Scrapy's HTTP proxy to http://127.0.0.1:1984, and you are done.
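One common way to do this is a small downloader middleware that sets `request.meta["proxy"]`, which Scrapy's built-in HttpProxyMiddleware honors. The following is a minimal sketch: the class name `LocalHttpProxyMiddleware` and the module path `myproject.middlewares` are hypothetical, and the priority 350 is only an example of a value lower than HttpProxyMiddleware's default of 750.

```python
# middlewares.py (hypothetical module in your Scrapy project)
LOCAL_HTTP_PROXY = "http://127.0.0.1:1984"  # the 3proxy HTTP listen address


class LocalHttpProxyMiddleware:
    """Route every request through the local 3proxy HTTP proxy."""

    def process_request(self, request, spider):
        # HttpProxyMiddleware reads request.meta["proxy"]; setdefault keeps
        # any proxy that a spider has already set explicitly on a request.
        request.meta.setdefault("proxy", LOCAL_HTTP_PROXY)
        return None  # let Scrapy continue processing the request normally


# settings.py: register it so that it runs before HttpProxyMiddleware (750)
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.LocalHttpProxyMiddleware": 350,
# }
```

Alternatively, HttpProxyMiddleware also picks up the standard `http_proxy` / `https_proxy` environment variables. Exporting `http_proxy=http://127.0.0.1:1984` before starting the crawl has a similar effect without any code. Because the shadowsocks client behind 3proxy can switch between several servers, Scrapy only ever sees this single address.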
References
http://igfw.net/archives/947
http://www.ouvps.com/?p=464
Making Scrapy support SOCKS proxies and, in effect, a proxy pool