The Linux shell provides two very useful commands for fetching web pages: curl and wget.
As a basic service for big data analysis and research, Mimvp Proxy has studied both in depth and summarizes the results here.
Using proxies with curl and wget
curl supports HTTP, HTTPS, SOCKS4, and SOCKS5
wget supports HTTP and HTTPS
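The support list above can be encoded as a small check that runs before a proxy is handed to either tool. This is a minimal sketch: the function name supports_proxy_type is hypothetical, and it covers only the proxy types named in this article.

```shell
#!/bin/bash
# supports_proxy_type TOOL TYPE
# Returns 0 if TOOL can use a proxy of the given TYPE, 1 otherwise.
# Hypothetical helper; it only knows the types listed in this article.
supports_proxy_type() {
    local tool=$1 type=$2
    case "$tool:$type" in
        curl:http|curl:https|curl:socks4|curl:socks5) return 0 ;;
        wget:http|wget:https)                         return 0 ;;
        *)                                            return 1 ;;
    esac
}

if supports_proxy_type wget socks5; then
    echo "wget can use this proxy"
else
    echo "wget cannot use a SOCKS5 proxy; use curl instead"
fi
```

A script that rotates through mixed proxy lists could call this check first and fall back to curl whenever a SOCKS proxy comes up.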
Shell curl and wget examples
#!/bin/bash
#
# curl supports HTTP, HTTPS, SOCKS4, and SOCKS5 proxies
# wget supports HTTP and HTTPS proxies
#
# Mimvp Proxy examples:
# https://proxy.mimvp.com/demo2.php
#
# Mimvp Proxy purchase:
# https://proxy.mimvp.com
#
# mimvp.com
# 2015-11-09

# Mimvp Proxy: this example has been tested on CentOS, Ubuntu, macOS, and other servers
#
# HTTP proxy format     http_proxy=http://ip:port
# HTTPS proxy format    https_proxy=http://ip:port
#
# Note: the timeout values after curl -m and wget -T are missing from the source;
# the 30 (seconds) used below is an assumed placeholder.

## Proxy, no auth

# curl and wget, fetching an HTTP page
# {'http': 'http://120.77.176.179:8888'}
curl -m 30 --retry 3 -x http://120.77.176.179:8888 http://proxy.mimvp.com/test_proxy2.php    # http_proxy
wget -T 30 --tries 3 -e "http_proxy=http://120.77.176.179:8888" http://proxy.mimvp.com/test_proxy2.php    # http_proxy

# curl and wget, fetching an HTTPS page (note: the extra flags skip SSL certificate verification)
# {'https': 'http://46.105.214.133:3128'}
curl -m 30 --retry 3 -x http://46.105.214.133:3128 -k https://proxy.mimvp.com/test_proxy2.php    # https_proxy
wget -T 30 --tries 3 -e "https_proxy=http://46.105.214.133:3128" --no-check-certificate https://proxy.mimvp.com/test_proxy2.php    # https_proxy

# curl supports SOCKS
# Both SOCKS4 and SOCKS5 proxies can fetch HTTP and HTTPS pages
# {'socks4': '101.255.17.145:1080'}
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 https://proxy.mimvp.com/test_proxy2.php

# {'socks5': '82.164.233.227:45454'}
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 https://proxy.mimvp.com/test_proxy2.php

# wget does not support SOCKS

## Proxy auth (the proxy requires a username and password)

# curl and wget, fetching HTTP and HTTPS pages
# Credentials embedded in the proxy URL
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

# Credentials passed with -U (short form of --proxy-user)
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

wget -T 30 --tries 3 -e "http_proxy=http://username:password@2.19.16.5:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 -e "https_proxy=http://username:password@2.19.16.5:5718" https://proxy.mimvp.com/test_proxy2.php

wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "http_proxy=http://2.19.16.5:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "https_proxy=http://2.19.16.5:5718" https://proxy.mimvp.com/test_proxy2.php

# curl supports SOCKS
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php    # https

curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php    # https

# wget does not support SOCKS
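The proxy strings in the script all follow one pattern: a scheme, optional credentials, a host, and a port. The sketch below assembles such a string so it does not have to be retyped for every command. The name proxy_url is hypothetical, the address is a documentation-only example, and credentials are assumed to contain no characters that need URL encoding.

```shell
#!/bin/bash
# proxy_url SCHEME HOST PORT [USER PASS]
# Prints a proxy URL such as http://user:pass@host:port.
# Hypothetical helper; credentials are inserted as-is, without URL encoding.
proxy_url() {
    local scheme=$1 host=$2 port=$3 user=${4:-} pass=${5:-}
    if [ -n "$user" ]; then
        printf '%s://%s:%s@%s:%s\n' "$scheme" "$user" "$pass" "$host" "$port"
    else
        printf '%s://%s:%s\n' "$scheme" "$host" "$port"
    fi
}

# Example values only (192.0.2.10 is a reserved documentation address).
proxy_url http 192.0.2.10 8888
proxy_url socks5 192.0.2.10 1080 username password
```

The result can be passed straight to curl, for example curl -x "$(proxy_url http 192.0.2.10 8888)" followed by the target URL. curl's -x option also accepts socks4:// and socks5:// prefixes as an alternative to the --socks4 and --socks5 flags.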
Setting a proxy in the wget configuration file
vim ~/.wgetrc
http_proxy=http://120.77.176.179:8888
https_proxy=http://12.7.17.17:8888
use_proxy = on
# wait = <seconds>   (the value is missing from the source)
# The configuration file takes effect immediately; just run the wget command directly
# (-T 30 is an assumed timeout in seconds; the source omits the value)
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php
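Editing ~/.wgetrc changes every later wget run for that user. Where that is too broad, wget also reads the file named by the WGETRC environment variable, so a separate configuration file can be generated and used for selected runs only. A minimal sketch, in which the function name write_wgetrc and the proxy address are illustrative assumptions:

```shell
#!/bin/bash
# write_wgetrc FILE PROXY_URL
# Writes a minimal wgetrc that sends HTTP and HTTPS requests through PROXY_URL.
# Hypothetical helper; it overwrites FILE.
write_wgetrc() {
    local file=$1 proxy=$2
    cat > "$file" <<EOF
use_proxy = on
http_proxy = $proxy
https_proxy = $proxy
EOF
}

rc_file=$(mktemp)                                   # throwaway config file
write_wgetrc "$rc_file" "http://192.0.2.10:8888"    # example address only
cat "$rc_file"

# To use the file for a single invocation (not executed here):
#   WGETRC="$rc_file" wget http://proxy.mimvp.com/test_proxy2.php
```

Keeping the proxy in a generated file leaves ~/.wgetrc untouched, which can be convenient when different jobs need different proxies.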
Setting a temporary proxy in the current shell
# Proxy, no auth
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8888
# Proxy auth (the proxy requires a username and password)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8888
# Fetch pages directly
# (-m 30 and -T 30 are assumed timeouts in seconds; the source omits the values)
curl -m 30 --retry 3 http://proxy.mimvp.com/test_proxy2.php     # http_proxy
curl -m 30 --retry 3 https://proxy.mimvp.com/test_proxy2.php    # https_proxy
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php     # http_proxy
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php    # https_proxy
# Cancel the settings
unset http_proxy
unset https_proxy
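Exporting the variables affects every later command in the session until they are unset. When only one command should go through the proxy, the variables can instead be set for that command alone. The wrapper below is a sketch; the name with_proxy is hypothetical and the address is an example.

```shell
#!/bin/bash
# with_proxy PROXY_URL COMMAND [ARGS...]
# Runs COMMAND with http_proxy and https_proxy set for that command only;
# the calling shell's environment is left unchanged.
with_proxy() {
    local proxy=$1
    shift
    env http_proxy="$proxy" https_proxy="$proxy" "$@"
}

# Demonstration without network access: the child process sees the proxy.
with_proxy http://192.0.2.10:8888 sh -c 'echo "inside: $http_proxy"'

# Typical use (not executed here):
#   with_proxy http://192.0.2.10:8888 curl --retry 3 http://proxy.mimvp.com/test_proxy2.php
```

Because env starts a separate process, the wrapper works for external programs such as curl and wget but not for shell functions or builtins.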
Setting a system-wide proxy in the shell
# Edit /etc/profile, save it, and restart the server
sudo vim /etc/profile    # applies to all users
or
vim ~/.bashrc            # applies to the current user only
or
vim ~/.bash_profile      # applies to the current user only
# At the end of the file, add the following
# Proxy, no auth
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8888
# Proxy auth (the proxy requires a username and password)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8888
# Run the source command to apply the configuration file to the current shell session (temporary)
source /etc/profile
or
source ~/.bashrc
or
source ~/.bash_profile
# To make the setting take effect permanently across the machine, reboot the server
sudo reboot
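Adding the export lines by hand is easy to repeat by accident, which leaves duplicate entries in the profile. The sketch below appends the lines only when the file has no http_proxy export yet. The function name add_proxy_exports is hypothetical, and a temporary file stands in for the real profile so the example is safe to run.

```shell
#!/bin/bash
# add_proxy_exports FILE PROXY_URL
# Appends proxy export lines to FILE unless it already exports http_proxy.
# Hypothetical helper; it does not update an existing entry.
add_proxy_exports() {
    local file=$1 proxy=$2
    if grep -q '^export http_proxy=' "$file" 2>/dev/null; then
        return 0                      # already configured; leave the file alone
    fi
    {
        echo "export http_proxy=$proxy"
        echo "export https_proxy=$proxy"
    } >> "$file"
}

profile=$(mktemp)                     # stand-in for ~/.bash_profile
add_proxy_exports "$profile" "http://192.0.2.10:8888"
add_proxy_exports "$profile" "http://192.0.2.10:8888"   # second call changes nothing
cat "$profile"
```

Pointing the function at ~/.bash_profile instead of the temporary file would apply the same change to a real login profile.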
Mimvp Proxy examples
Mimvp Proxy focuses on big data research services for enterprises in China. Its technical team comes from Baidu, Xiaomi, Alibaba, and Sinovation Ventures, and it provides domestic enterprises with big data collection, data modeling and analysis, and export and presentation of results.
The Mimvp Proxy examples cover more than 10 programming and scripting languages, including Python, Java, PHP, C#, Go, Perl, Ruby, Shell, Node.js, PhantomJS, Groovy, Delphi, and Easy Language. Through a large number of working examples, they show how to use proxy IPs correctly for web crawling, data collection, automated testing, and similar tasks.
Mimvp Proxy examples, official site:
https://proxy.mimvp.com/demo2.php