Shell Commands curl and wget: A Summary of Using Proxies to Fetch Web Pages

Source: Internet
Author: User
Tags: auth

The Linux shell provides two very useful commands for fetching web pages: curl and wget.

Mimvp Proxy (mimvp.com), a basic service for big data analysis and research, has studied both commands in depth and summarized how to use them with proxies.

Using proxies with curl and wget

curl supports HTTP, HTTPS, SOCKS4, and SOCKS5 proxies

wget supports HTTP and HTTPS proxies
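
The support list above can be captured in a small lookup. This is only a sketch: the function name tools_for_proxy_type is hypothetical, and it assumes the proxy type is passed in lowercase exactly as listed.

```shell
#!/bin/bash
# tools_for_proxy_type TYPE
# Hypothetical helper: prints which of the two tools can use a proxy of the
# given type, following the support list above.
tools_for_proxy_type() {
    case "$1" in
        http|https)    echo "curl wget" ;;
        socks4|socks5) echo "curl" ;;
        *)             echo "unknown proxy type: $1" >&2; return 1 ;;
    esac
}

tools_for_proxy_type http      # prints: curl wget
tools_for_proxy_type socks5    # prints: curl
```

A crawler script could call this before choosing a command, so that a SOCKS proxy is never handed to wget.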


Shell curl and wget examples

#!/bin/bash
#
# curl supports HTTP, HTTPS, SOCKS4, and SOCKS5 proxies
# wget supports HTTP and HTTPS proxies
#
# Mimvp Proxy examples:
# https://proxy.mimvp.com/demo2.php
#
# Mimvp Proxy purchase:
# https://proxy.mimvp.com
#
# mimvp.com
# 2015-11-09
#
# Mimvp Proxy: these examples were tested on CentOS, Ubuntu, macOS, and other servers
#
# HTTP proxy format:   http_proxy=http://ip:port
# HTTPS proxy format:  https_proxy=http://ip:port
#
# Note: curl -m and wget -T each take a timeout in seconds. The value is
# missing from this copy of the script; 30 is an assumed placeholder.


## Proxy without auth

# curl and wget, fetching an HTTP page
# {'http': 'http://120.77.176.179:8888'}
curl -m 30 --retry 3 -x http://120.77.176.179:8888 http://proxy.mimvp.com/test_proxy2.php    # http_proxy
wget -T 30 --tries 3 -e "http_proxy=http://120.77.176.179:8888" http://proxy.mimvp.com/test_proxy2.php    # http_proxy

# curl and wget, fetching an HTTPS page (note: add the option that skips SSL certificate verification)
# {'https': 'http://46.105.214.133:3128'}
curl -m 30 --retry 3 -x http://46.105.214.133:3128 -k https://proxy.mimvp.com/test_proxy2.php    # https_proxy
wget -T 30 --tries 3 -e "https_proxy=http://46.105.214.133:3128" --no-check-certificate https://proxy.mimvp.com/test_proxy2.php    # https_proxy

# curl supports SOCKS
# Both SOCKS4 and SOCKS5 proxies can fetch HTTP and HTTPS pages
# {'socks4': '101.255.17.145:1080'}
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks4 101.255.17.145:1080 https://proxy.mimvp.com/test_proxy2.php

# {'socks5': '82.164.233.227:45454'}
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 http://proxy.mimvp.com/test_proxy2.php
curl -m 30 --retry 3 --socks5 82.164.233.227:45454 https://proxy.mimvp.com/test_proxy2.php

# wget does not support SOCKS


## Proxy with auth (the proxy requires a username and password)

# curl and wget, fetching HTTP and HTTPS pages
# Credentials embedded in the proxy URL
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -x http://username:password@210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

# Credentials passed with -U (short form of --proxy-user; lowercase -u would
# send them to the target server instead of the proxy)
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -U username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 --proxy-user username:password -x http://210.159.166.225:5718 https://proxy.mimvp.com/test_proxy2.php    # https

wget -T 30 --tries 3 -e "http_proxy=http://username:password@2.19.16.5:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 -e "https_proxy=http://username:password@2.19.16.5:5718" https://proxy.mimvp.com/test_proxy2.php

wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "http_proxy=http://2.19.16.5:5718" http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 --proxy-user=username --proxy-password=password -e "https_proxy=http://2.19.16.5:5718" https://proxy.mimvp.com/test_proxy2.php

# curl supports SOCKS
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 -U username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php    # https

curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 http://proxy.mimvp.com/test_proxy2.php     # http
curl -m 30 --retry 3 --proxy-user username:password --socks5 21.59.126.22:57216 https://proxy.mimvp.com/test_proxy2.php    # https

# wget does not support SOCKS
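
The curl invocations in the script differ only in the proxy, the optional credentials, and whether the target is HTTPS, so they could be folded into one wrapper. The following is a minimal sketch, not part of the original script: curl_via_proxy, TIMEOUT, and DRY_RUN are hypothetical names, the 30-second default timeout is an assumption, and the proxy is passed as a URL to -x (for example socks5://ip:port) rather than through --socks5.

```shell
#!/bin/bash
# curl_via_proxy PROXY URL [USER:PASSWORD]
# Hypothetical wrapper around the curl options used in the script above.
# The 30-second timeout is an assumed default; override it with TIMEOUT.
# With DRY_RUN=1 the command is printed instead of executed.
curl_via_proxy() (
    proxy=$1
    url=$2
    auth=${3:-}

    set -- curl -m "${TIMEOUT:-30}" --retry 3 -x "$proxy"
    if [ -n "$auth" ]; then
        set -- "$@" -U "$auth"          # -U/--proxy-user: proxy credentials
    fi
    case "$url" in
        https://*) set -- "$@" -k ;;    # skip certificate verification
    esac
    set -- "$@" "$url"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
)

# Print the command that would run, without touching the network.
DRY_RUN=1 curl_via_proxy socks5://82.164.233.227:45454 https://proxy.mimvp.com/test_proxy2.php
```

Running it without DRY_RUN executes the same command. Adding -k for every HTTPS target is a choice made here for brevity; it disables certificate verification and is best limited to test pages.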



Setting the proxy in the wget configuration file

vim ~/.wgetrc

http_proxy=http://120.77.176.179:8888
https_proxy=http://12.7.17.17:8888
use_proxy = on
# wait = <seconds>    (the value is missing in the source)

# The configuration file takes effect immediately; just run the wget command directly
# (the -T timeout of 30 seconds is an assumed value; it is missing in the source)
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php
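
For trying out a proxy without editing ~/.wgetrc, the same settings can be written to a separate file and selected through the WGETRC environment variable, which wget reads to locate its startup file. This is a sketch under assumptions: write_wgetrc is a hypothetical helper, and the temporary file stands in for a real configuration file.

```shell
#!/bin/bash
# write_wgetrc FILE PROXY_URL
# Hypothetical helper: writes a minimal wget startup file that routes both
# HTTP and HTTPS requests through PROXY_URL.
write_wgetrc() {
    cat > "$1" <<EOF
http_proxy = $2
https_proxy = $2
use_proxy = on
EOF
}

rc_file=$(mktemp)
write_wgetrc "$rc_file" http://120.77.176.179:8888
cat "$rc_file"

# To use it for a single run without touching ~/.wgetrc:
#   WGETRC="$rc_file" wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php
rm -f "$rc_file"
```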


Setting a temporary proxy in the current shell

# Proxy without auth
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8888

# Proxy with auth (the proxy requires a username and password)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8888

# Fetch pages directly
# (the -m and -T timeouts of 30 seconds are assumed values; they are missing in the source)
curl -m 30 --retry 3 http://proxy.mimvp.com/test_proxy2.php     # http_proxy
curl -m 30 --retry 3 https://proxy.mimvp.com/test_proxy2.php    # https_proxy
wget -T 30 --tries 3 http://proxy.mimvp.com/test_proxy2.php     # http_proxy
wget -T 30 --tries 3 https://proxy.mimvp.com/test_proxy2.php    # https_proxy

# Remove the settings
unset http_proxy
unset https_proxy
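
When the proxy is needed for only one command, the export and unset steps can be avoided by setting the variables in a subshell. A minimal sketch follows; with_proxy is a hypothetical helper name, and the demonstration runs offline by printing what the child process sees instead of fetching a page.

```shell
#!/bin/bash
# with_proxy PROXY_URL COMMAND [ARGS...]
# Hypothetical helper: runs one command with http_proxy and https_proxy set.
# The function body is a subshell, so the caller's environment is unchanged
# and no unset is needed afterwards.
with_proxy() (
    proxy=$1
    shift
    export http_proxy="$proxy" https_proxy="$proxy"
    "$@"
)

# Offline demonstration: show what the child process sees.
with_proxy http://120.77.176.179:8888 sh -c 'echo "child sees: $http_proxy"'
```

In real use the command would be one of the fetches above, for example with_proxy followed by the proxy URL and a curl command line.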


Setting a system-wide proxy in the shell

# Edit /etc/profile, save it, and restart the server
sudo vim /etc/profile       # applies to all users
or
vim ~/.bashrc               # applies to the current user only
or
vim ~/.bash_profile         # applies to the current user only

# At the end of the file, add the following
# Proxy without auth
export http_proxy=http://120.77.176.179:8888
export https_proxy=http://12.7.17.17:8888

# Proxy with auth (the proxy requires a username and password)
export http_proxy=http://username:password@120.77.176.179:8888
export https_proxy=http://username:password@12.7.17.17:8888


# Run source to apply the configuration file to the current shell session (temporary)
source /etc/profile
or
source ~/.bashrc
or
source ~/.bash_profile


# For the setting to apply to every new session on the machine, reboot the server
# (logging out and back in also reloads the profile files)
sudo reboot
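
Appending the export lines by hand makes it easy to add them twice. The sketch below shows one way to make the step repeatable; add_proxy_exports is a hypothetical helper, and a temporary file stands in for /etc/profile or ~/.bash_profile so the example can be run safely.

```shell
#!/bin/bash
# add_proxy_exports FILE PROXY_URL
# Hypothetical helper: appends the two export lines to FILE unless an
# http_proxy export is already present, so repeated runs do not duplicate them.
add_proxy_exports() {
    if grep -q '^export http_proxy=' "$1" 2>/dev/null; then
        return 0
    fi
    {
        echo "export http_proxy=$2"
        echo "export https_proxy=$2"
    } >> "$1"
}

profile=$(mktemp)                     # stand-in for ~/.bash_profile
add_proxy_exports "$profile" http://120.77.176.179:8888
add_proxy_exports "$profile" http://120.77.176.179:8888   # no effect the second time
cat "$profile"
rm -f "$profile"
```

The check only looks for an existing http_proxy export; it does not update a line that already exists with a different proxy.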


Mimvp Proxy examples

Mimvp Proxy focuses on big data research services for domestic enterprises. Its technical team comes from Baidu, Xiaomi, Alibaba, and Innovation Works, and it provides big data collection, data modeling and analysis, and export and presentation of results.

The Mimvp Proxy examples cover more than 10 programming and scripting languages, including Python, Java, PHP, C#, Go, Perl, Ruby, Shell, Node.js, PhantomJS, Groovy, Delphi, and Easy Language. Through a large number of runnable examples, they show the correct way to use proxy IPs for web crawling, data collection, automated testing, and similar tasks.


Mimvp Proxy examples official website:

https://proxy.mimvp.com/demo2.php

