We know that in a Linux environment you can call curl to download web pages. But curl also has some higher-level uses that take only a line or two on the command line, which can be quicker than writing a PHP, Python, or C++ program.
Let's walk through curl usage from a problem-driven perspective.
1. Download page, save to file
curl www.baidu.com
will print the page data to standard output. To save it to a file instead, use
-o/--output <file>   Write output to <file> instead of stdout.
2. Download multiple pages in bulk
Use {} and [] to describe the pattern of the URLs to download in bulk (this will feel familiar if you have learned regular expressions).
This raises the question of how to name the saved files; curl provides placeholders such as #1 that it substitutes for each match.
Specifically:
curl "http://{one,two}.site.com" -o "file_#1.txt"
or, with several variables:
curl "http://{site,host}.host[1-5].com" -o "#1_#2"
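Because curl's URL globbing works with any protocol, the mechanism can be demonstrated self-contained against local file:// URLs (the site.com/host.com names above are placeholders, not real sites):

```shell
# Create two local "pages" to stand in for remote URLs.
dir=$(mktemp -d)
printf 'page one\n' > "$dir/one.txt"
printf 'page two\n' > "$dir/two.txt"

# #1 expands to whatever the {one,two} glob matched for each URL,
# so this writes copy_one.txt and copy_two.txt:
curl -s "file://$dir/{one,two}.txt" -o "$dir/copy_#1.txt"

cat "$dir/copy_one.txt"
cat "$dir/copy_two.txt"
```

With several globs in one URL, #1 and #2 refer to them in left-to-right order.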
3. 301/302 redirect pages
Sometimes a download hits a 301 or 302 response, and curl needs to go on and fetch the page it points to. The option for this is
-L/--location
Note: man curl contains this sentence:
If this option is used twice, the second will again disable location following.
This lets curl fetch the real page. If there are multiple hops, --max-redirs controls the maximum number of redirects:
You can limit the amount of redirects to follow by using the --max-redirs option.
A redirect changes the URL, sometimes across different domains, so we may want to know the URL of the page that was finally fetched.
For that, use the %{url_effective} variable with the -w/--write-out <format> option. man curl describes it:
url_effective   The URL that was fetched last. This is mostly meaningful if you've told curl to follow Location: headers.
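Putting the pieces together, a sketch that follows redirects, caps the chain, and reports the final URL (example.com is a placeholder starting page):

```shell
# Follow 301/302 redirects (up to 5 hops), save the final page,
# and print the URL that was actually fetched last.
curl -L --max-redirs 5 -s -o page.html \
     -w '%{url_effective}\n' http://example.com/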
4. Network Monitoring information
If you need information about the transfer, such as the HTTP return code or the transfer speed, that is again the -w option mentioned above; see man curl for the full list of variables.
Here are a few I have used:
stat_format="%{http_code}:%{time_connect}:%{time_starttransfer}:%{time_total}"
stat_format="${stat_format}:%{speed_download}:%{size_download}:%{size_request}"
stat_format="${stat_format}:%{url_effective}"
-w "$stat_format"
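A small sketch of acting on these variables in a script: capture the HTTP code and download speed with -w, then branch on the result (example.com is a placeholder URL):

```shell
# -o /dev/null discards the body; -w prints only the stats we asked for.
stats=$(curl -s -o /dev/null -w '%{http_code}:%{speed_download}' http://example.com/)
code=${stats%%:*}
speed=${stats##*:}

if [ "$code" = "200" ]; then
    echo "OK: downloaded at ${speed} bytes/sec"
else
    echo "request failed with HTTP $code" >&2
fi
```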
5. Timeout control
-m/--max-time <seconds>
Maximum time in seconds that you allow the whole operation to take.
--connect-timeout <seconds>
Maximum time in seconds that you allow the connection to the server to take.
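Both limits can be combined in one command; a sketch with example.com as the placeholder URL:

```shell
# Give up if the connection takes more than 5 seconds to establish,
# or if the whole transfer takes more than 30 seconds.
curl --connect-timeout 5 -m 30 -s -o page.html http://example.com/
```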
6. Specify UA
Specify it with the -A or --user-agent option; note that the value must be wrapped in quotes.
Common UA strings include:
'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)'
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
'Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1'
'Opera/9.27 (Windows NT 5.2; U; zh-cn)'
'Opera/8.0 (Macintosh; PPC Mac OS X; U; en)'
'Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13'
'Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML, like Gecko) Version/3.1 Safari/525.13'
and on the mobile side:
'Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit'
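A sketch of both the short and long forms; the quotes matter because the UA string contains spaces and semicolons (example.com is a placeholder URL):

```shell
# Pretend to be IE 8 on Windows:
curl -A "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)" \
     -s -o page.html http://example.com/

# --user-agent is the long form of -A:
curl --user-agent "Opera/9.27 (Windows NT 5.2; U; zh-cn)" \
     -s -o page2.html http://example.com/
```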