A Summary of Using Proxies with the Shell Commands curl and wget to Collect Web Pages

Source: Internet
Author: User

The Linux shell provides two very useful commands for crawling Web pages: curl and wget.

As a basic service for big data analysis and research, Mimvp Proxy (mimvp.com) has studied the topic in depth and summarized the results below.

Using proxies with curl and wget

curl supports HTTP, HTTPS, SOCKS4, and SOCKS5 proxies.

wget supports only HTTP and HTTPS proxies.

Shell curl and wget examples

#!/bin/bash
#
# curl supports HTTP, HTTPS, SOCKS4, SOCKS5
# wget supports HTTP, HTTPS
#
# Mimvp Proxy example:
#     https://proxy.mimvp.com/demo2.php
#
# Mimvp Proxy purchase:
#     https://proxy.mimvp.com
#
# mimvp.com
# 2015-11-09
#
# Mimvp Proxy: the examples below have been tested on CentOS, Ubuntu, macOS, and other servers
#
# HTTP  proxy format:  http_proxy=http://ip:port
# HTTPS proxy format:  https_proxy=http://ip:port


## proxy no auth

# curl and wget, crawling an HTTP page    {'http': 'http://120.77.176.179:8888'}
curl -m 30 --retry 3 -x http://120.77.176.179:8888 http://proxy.mimvp.com/test_proxy2.php                # http_proxy
wget -T 30 --tries 3 -e "http_proxy=http://120.77.176.179:8888" http://proxy.mimvp.com/test_proxy2.php   # http_proxy

# curl and wget, crawling an HTTPS page (note: add a parameter to skip SSL certificate verification)    {'https': 'http://46.105.214.133:3128'}
curl -m 30 --retry 3 -x http://46.105.214.133:3128 -k https://proxy.mimvp.com/test_proxy2.php                                    # https_proxy
wget -T 30 --tries 3 -e "https_proxy=http://46.105.214.133:3128" --no-check-certificate https://proxy.mimvp.com/test_proxy2.php  # https_proxy

# curl supports SOCKS
# With either of the SOCKS4 and SOCKS5 protocols, both HTTP and HTTPS pages can be crawled
# {'socks4': '101.255.17.145:1080'}
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 https://proxy.mimvp.com/test_proxy2.php
# {'socks5': '82.164.233.227:45454'}
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 https://proxy.mimvp.com/test_proxy2.php

# wget does not support SOCKS


## proxy auth (the proxy requires username/password authentication)

# curl and wget, crawling HTTP and HTTPS pages
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

wget -T 30 --tries 3 -e "http_proxy=http://username:password@210.159.166.225:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 -e "https_proxy=http://username:password@210.159.166.225:5718" https://proxy.mimvp.com/test_proxy2.php

wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "http_proxy=http://210.159.166.225:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "https_proxy=http://210.159.166.225:5718" https://proxy.mimvp.com/test_proxy2.php

# curl supports SOCKS
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php    # https
curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php    # https

# wget does not support SOCKS
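One pitfall with the `http://username:password@ip:port` form above: if the password contains characters such as `@` or `:`, the proxy URL becomes ambiguous and the request fails. A minimal sketch of a percent-encoding helper (the `urlencode` function name is our own, not from the article):

```shell
#!/bin/sh
# Percent-encode a string so it can be embedded safely in a proxy URL of the
# form http://user:pass@ip:port. Unreserved characters pass through; everything
# else becomes %XX.
urlencode() {
    s="$1"; out=""
    while [ -n "$s" ]; do
        c="${s%"${s#?}"}"        # first character of the remaining string
        s="${s#?}"               # drop that character
        case "$c" in
            [A-Za-z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode 'p@ss:w0rd'    # prints p%40ss%3Aw0rd
```

It would then be used to build the proxy URL, e.g. `curl -x "http://username:$(urlencode "$password")@210.159.166.225:5718" ...`, reusing the article's example proxy address.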

  

Setting a proxy in the wget configuration file

vim ~/.wgetrc

http_proxy = http://120.77.176.179:8888
https_proxy = http://12.7.17.17:8080
use_proxy = on
wait = 30

# The configuration file settings take effect immediately; just run the wget crawl commands:
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php
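Instead of editing ~/.wgetrc interactively in vim, the same file can be generated from a script. A minimal sketch, writing to a scratch file so nothing real is overwritten (the proxy addresses are the article's example values):

```shell
#!/bin/sh
# Generate wget proxy settings non-interactively. In real use you would write
# to ~/.wgetrc; here a temporary file stands in for it.
wgetrc="$(mktemp)"
cat > "$wgetrc" <<'EOF'
http_proxy = http://120.77.176.179:8888
https_proxy = http://12.7.17.17:8080
use_proxy = on
wait = 30
EOF

# wget also honors the WGETRC environment variable, so the crawl could be run
# against this scratch file without touching ~/.wgetrc:
#     WGETRC="$wgetrc" wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
cat "$wgetrc"
rm -f "$wgetrc"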

  

Setting a temporary proxy in the shell

# proxy no auth
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8080

# proxy auth (the proxy requires username/password authentication)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8080

# Crawl pages directly
curl -m 30 --retry 3 http://proxy.mimvp.com/test_proxy2.php     # http_proxy
curl -m 30 --retry 3 https://proxy.mimvp.com/test_proxy2.php    # https_proxy
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php     # http_proxy
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php    # https_proxy

# Cancel the settings
unset http_proxy
unset https_proxy
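The point of this "temporary" approach is scoping: an exported proxy variable affects only the current shell session and its child processes, and disappears after `unset`. A small sketch, using the article's example address:

```shell
#!/bin/sh
# Exported proxy variables are inherited by child processes of this shell only;
# other sessions on the machine are unaffected.
export http_proxy=http://120.77.176.179:8888

# A child process sees the setting:
sh -c 'echo "child sees: $http_proxy"'    # prints child sees: http://120.77.176.179:8888

# After unset, subsequent commands run without a proxy:
unset http_proxy
echo "after unset: [${http_proxy:-empty}]"    # prints after unset: [empty]
```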

  

Setting a system-wide proxy in the shell

# Modify /etc/profile, save, and make the change take effect
sudo vim /etc/profile      # effective for all users
# or
vim ~/.bashrc              # effective for the current user
# or
vim ~/.bash_profile        # effective for the current user

# Append the following at the end of the file:

# proxy no auth
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8080

# proxy auth (the proxy requires username/password authentication)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8080

# Run source so the configuration file takes effect in the current session:
source /etc/profile        # or: source ~/.bashrc, or: source ~/.bash_profile

# For the change to apply machine-wide in every new session, restart the server:
sudo reboot
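The append-and-source step above can also be scripted, guarding against adding the exports twice if the script is re-run. A minimal sketch, using a scratch file in place of /etc/profile or ~/.bashrc so no real configuration is touched:

```shell
#!/bin/sh
# Idempotently append the proxy exports to a profile file, then source it.
# A temporary file stands in for /etc/profile; the addresses are the
# article's example values.
profile="$(mktemp)"
if ! grep -q '^export http_proxy=' "$profile"; then
    cat >> "$profile" <<'EOF'
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8080
EOF
fi
. "$profile"           # same effect as: source /etc/profile
echo "$http_proxy"     # prints http://120.77.176.179:8888
rm -f "$profile"
```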

  

Mimvp Proxy examples

Mimvp Proxy focuses on providing big data research services to enterprises in China. Its technical team comes from Baidu, Xiaomi, Alibaba, and Innovation Works, and it offers Chinese enterprises large-scale data collection, data modeling and analysis, and results export and display, among other services.

The Mimvp Proxy examples cover more than ten programming languages and scripts, including Python, Java, PHP, C#, Go, Perl, Ruby, Shell, NodeJS, PhantomJS, Groovy, Delphi, and Easy Language. Through a large number of working examples, they explain in detail the right way to use proxy IPs, making Web crawling, data collection, automated testing, and similar tasks easier.

Mimvp Proxy example page:

https://proxy.mimvp.com/demo2.php


