The Linux shell provides two very useful commands for crawling web pages: curl and wget.
As a basic building block of big-data collection and research, Mimvp Proxy (mimvp.com) has studied and summarized their proxy usage in depth.
Using proxies with curl and wget
curl supports HTTP, HTTPS, SOCKS4, and SOCKS5 proxies.
wget supports only HTTP and HTTPS proxies (no SOCKS).
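Which protocols a given curl build actually supports can be confirmed from its version output; the `Protocols:` line must include http and https for the examples below to work:

```shell
#!/bin/bash
# Print the protocols compiled into the local curl build.
# (SOCKS support is provided via the --socks4/--socks5 options and does not
# appear in this list.)
protocols=$(curl --version | grep '^Protocols:')
echo "$protocols"
```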
Shell curl and wget examples
#!/bin/bash
#
# curl supports HTTP, HTTPS, SOCKS4, SOCKS5
# wget supports HTTP, HTTPS
#
# Mimvp Proxy example:
#     https://proxy.mimvp.com/demo2.php
# Mimvp Proxy purchase:
#     https://proxy.mimvp.com
#
# mimvp.com
# 2015-11-09
#
# Mimvp Proxy: the examples below were tested on CentOS, Ubuntu, macOS, and other servers
#
# HTTP proxy format:  http_proxy=http://ip:port
# HTTPS proxy format: https_proxy=http://ip:port

## proxy without authentication
# curl and wget, crawling an HTTP page    {'http': 'http://120.77.176.179:8888'}
curl -m 30 --retry 3 -x http://120.77.176.179:8888 http://proxy.mimvp.com/test_proxy2.php        # http_proxy
wget -T 30 --tries 3 -e "http_proxy=http://120.77.176.179:8888" http://proxy.mimvp.com/test_proxy2.php        # http_proxy

# curl and wget, crawling an HTTPS page (note: add a flag to skip SSL certificate verification)    {'https': 'http://46.105.214.133:3128'}
curl -m 30 --retry 3 -x http://46.105.214.133:3128 -k https://proxy.mimvp.com/test_proxy2.php        # https_proxy
wget -T 30 --tries 3 -e "https_proxy=http://46.105.214.133:3128" --no-check-certificate https://proxy.mimvp.com/test_proxy2.php        # https_proxy

## curl supports SOCKS
# Both the SOCKS4 and SOCKS5 proxy protocols can crawl HTTP as well as HTTPS pages
# {'socks4': '101.255.17.145:1080'}
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 https://proxy.mimvp.com/test_proxy2.php
# {'socks5': '82.164.233.227:45454'}
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 https://proxy.mimvp.com/test_proxy2.php
# wget does not support SOCKS

## proxy with authentication (the proxy requires a username and password)
# curl and wget, crawling HTTP and HTTPS pages
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php         # http
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php        # https
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php      # http
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php     # https
curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php        # http
curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php       # https
wget -T 30 --tries 3 -e "http_proxy=http://username:password@210.159.166.225:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 -e "https_proxy=http://username:password@210.159.166.225:5718" https://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "http_proxy=http://210.159.166.225:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "https_proxy=http://210.159.166.225:5718" https://proxy.mimvp.com/test_proxy2.php

# curl supports SOCKS with authentication
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php         # http
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php        # https
curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php       # http
curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php      # https
# wget does not support SOCKS
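The per-scheme curl options above follow a fixed pattern, so they can be wrapped in a small helper. A minimal sketch; the function name `proxy_flags` is my own and not part of the Mimvp examples:

```shell
#!/bin/bash
# Map a proxy scheme to the matching curl options, mirroring the examples above.
# Usage: proxy_flags <http|https|socks4|socks5> <ip:port>
proxy_flags() {
    local scheme="$1" addr="$2"
    case "$scheme" in
        http)   echo "-x http://$addr" ;;       # plain HTTP proxy
        https)  echo "-x http://$addr -k" ;;    # HTTPS target via HTTP proxy, skip cert check
        socks4) echo "--socks4 $addr" ;;
        socks5) echo "--socks5 $addr" ;;
        *)      echo "unsupported scheme: $scheme" >&2; return 1 ;;
    esac
}

# Example invocation (IP taken from the SOCKS5 example above):
# curl -m 30 --retry 3 $(proxy_flags socks5 82.164.233.227:45454) http://proxy.mimvp.com/test_proxy2.php
```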
Setting a proxy in the wget configuration file
vim ~/.wgetrc

http_proxy=http://120.77.176.179:8888
https_proxy=http://12.7.17.17:8888
use_proxy = on
wait = 30

# The configuration file takes effect immediately; just run the wget crawl commands directly:
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php
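The same settings can be written non-interactively instead of editing with vim. A sketch: it writes to a temporary file and points wget at it via the `WGETRC` environment variable, which keeps the real `~/.wgetrc` untouched (the file path is my choice; any writable path works):

```shell
#!/bin/bash
# Create a wgetrc with the proxy settings from the section above.
WGETRC_FILE="$(mktemp)"
cat > "$WGETRC_FILE" <<'EOF'
http_proxy = http://120.77.176.179:8888
https_proxy = http://120.77.176.179:8888
use_proxy = on
wait = 30
EOF

# wget reads its user startup file from $WGETRC when it is set:
export WGETRC="$WGETRC_FILE"
# wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
```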
Setting a temporary local proxy in the shell
# proxy without authentication
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8888

# proxy with authentication (the proxy requires a username and password)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8888

# crawl pages directly
curl -m 30 --retry 3 http://proxy.mimvp.com/test_proxy2.php      # http_proxy
curl -m 30 --retry 3 https://proxy.mimvp.com/test_proxy2.php     # https_proxy
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php      # http_proxy
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php     # https_proxy

# cancel the settings
unset http_proxy
unset https_proxy
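An `export` affects every later command in the session until it is `unset`. For a single command, the assignment can instead be prefixed to the command line; the variable is then scoped to that one invocation only. A small demonstration using `sh -c` in place of curl/wget so it needs no network:

```shell
#!/bin/bash
unset http_proxy   # start clean for the demonstration

# Prefix assignment: visible inside the command's environment...
one_off=$(http_proxy=http://120.77.176.179:8888 sh -c 'echo "$http_proxy"')
echo "inside the command: $one_off"

# ...but the calling shell itself is untouched afterwards:
echo "after the command:  '${http_proxy:-unset}'"

# Real usage:
# http_proxy=http://120.77.176.179:8888 curl -m 30 --retry 3 http://proxy.mimvp.com/test_proxy2.php
```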
Setting a system-wide global proxy in the shell
## Modify /etc/profile, save, then reload (or restart the server)
sudo vim /etc/profile      # effective for all users
# or
vim ~/.bashrc              # effective for the current user
# or
vim ~/.bash_profile        # effective for the current user

## Append the following at the end of the file:
# proxy without authentication
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8888
# proxy with authentication (the proxy requires a username and password)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8888

## Run source so the configuration file takes effect in the current session:
source /etc/profile        # or: source ~/.bashrc, source ~/.bash_profile

## To make the change permanently effective machine-wide, restart the server:
sudo reboot
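With a global proxy configured, internal hosts usually need to bypass it. Both curl and wget honor the conventional `no_proxy` variable, a comma-separated list of hosts and domain suffixes that are fetched directly; it can be exported next to the proxy variables. A sketch (the bypass hosts listed are my own examples):

```shell
#!/bin/bash
# Global proxy plus a bypass list for local/internal addresses.
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://120.77.176.179:8888
export no_proxy="localhost,127.0.0.1,.internal.example.com"

# curl also accepts a per-command bypass list via --noproxy:
# curl -m 30 --retry 3 --noproxy localhost http://localhost/status
```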
Mimvp Proxy examples
Mimvp Proxy focuses on providing big-data research services to enterprises in China. Its technical team comes from Baidu, Xiaomi, Alibaba, and Innovation Works, and it offers large-scale data collection, data modeling and analysis, and result export and presentation services.
The Mimvp Proxy examples cover more than ten programming languages and scripting environments, including Python, Java, PHP, C#, Go, Perl, Ruby, Shell, Node.js, PhantomJS, Groovy, Delphi, and Easy Language. Through many working examples, they explain in detail the right way to use proxy IPs, making it easy to apply them to web crawling, data collection, automated testing, and other fields.
Mimvp Proxy example:
https://proxy.mimvp.com/demo2.php
This concludes the complete summary of using the shell commands curl and wget with proxies to collect web pages.