Downloading the World's National Flag Images with a Java Crawler

Source: Internet
Author: User
Tags http post

Introduction

This post continues an earlier one, "Python crawler uses Fiddler+postman+python's requests module to crawl national flag content", and implements the same crawler in Java to download the world's national flag images. The project itself is not described again here; see the previous post for details.
We put the names of the world's countries in a TXT file, one country name per line. The file is located in the flag directory on the E: drive and is named Countries.txt.
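As a minimal sketch of reading such a one-name-per-line file, the snippet below uses java.nio.file.Files instead of a manual BufferedReader loop. The class name CountryListReader and the method readCountries are hypothetical, and the code assumes the file is encoded as UTF-8, which the post does not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the original project. */
final class CountryListReader {

    /**
     * Reads one country name per line, trimming surrounding whitespace
     * and skipping blank lines. Assumes the file is UTF-8 encoded.
     */
    static List<String> readCountries(Path file) throws IOException {
        List<String> countries = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String name = line.trim();
            if (!name.isEmpty()) {
                countries.add(name);
            }
        }
        return countries;
    }
}
```

It could be called as CountryListReader.readCountries(Paths.get("E:/flag/Countries.txt")). Skipping blank lines means a trailing newline in the file does not turn into an empty search request.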

Crawler Program

The approach is the same as in the previous post: read each country name from Countries.txt, send it as the request parameter to fetch the search results page, find the link to that country's page in the results, then locate the flag image on that page and download it. The search step is a POST request, which we can send with the classes in the java.net package. The request headers and request body needed for the POST can be worked out by capturing packets with Fiddler.
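The POST step described above can be sketched on its own as follows. The form field name q comes from the post's code. The class name FormPostSketch and its two methods are hypothetical. Percent-encoding the country name with URLEncoder is an addition the original code does not make; it is shown here on the assumption that names containing spaces or non-ASCII characters should survive form encoding.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the original project. */
final class FormPostSketch {

    /** Builds the form body "q=<country>", percent-encoding the value as UTF-8. */
    static String buildBody(String country) throws UnsupportedEncodingException {
        return "q=" + URLEncoder.encode(country, "UTF-8");
    }

    /** Sends the body as a form-encoded POST and returns the HTTP status code. */
    static int post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        conn.setDoOutput(true); // required before writing a request body
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

For example, FormPostSketch.post("http://country.911cha.com/", FormPostSketch.buildBody(country)) would send one search request. Whether the site still answers such requests in the same way is not something this sketch can guarantee.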
The structure of the Java project is as follows:

The third-party libraries used are Commons IO and jsoup. The main class is in Country_flag_download.java, and its complete Java code is as follows:

package wikiscrape;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.apache.commons.io.FileUtils;

public class Country_flag_download {

    public static void main(String[] args) {
        String fileName = "E://flag/countries.txt";
        // Read the country names from countries.txt into an ArrayList
        ArrayList<String> countries = readFileByLines(fileName);

        for (String country : countries) {
            String page = doPost(country); // Get the page for this country
            if (page.indexOf("html") >= 0) { // Lookup succeeded
                getContent(page); // Download the country's flag
            }
        }
        System.out.println("Flag download complete!");
    }

    /*
     * Send an HTTP POST request to get the page address of the specified country.
     * Parameter: country (String), the country name
     */
    public static String doPost(String country) {
        String url = "http://country.911cha.com/";
        try {
            // Set the URL and open the connection
            URL obj = new URL(url);
            HttpURLConnection conn = (HttpURLConnection) obj.openConnection();

            // Set the POST request headers; the request body parameter is the country name
            conn.setUseCaches(false);
            conn.setRequestMethod("POST");
            String user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36";
            conn.setRequestProperty("User-agent", user_agent);
            conn.setRequestProperty("Content-type", "application/x-www-form-urlencoded; charset=UTF-8");
            String postParams = String.format("q=%s", country);

            // Write the POST request body
            conn.setDoOutput(true);
            OutputStreamWriter os = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
            os.write(postParams);
            os.flush();
            os.close();

            // Get the response status code
            int responseCode = conn.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) { // Status code is 200
                // Parse the HTML content as UTF-8
                Document doc = Jsoup.parse(conn.getInputStream(), "utf-8", url);
                // Select the link to the country's page
                String page = doc.select("div.mcon").get(1).selectFirst("ul")
                        .selectFirst("li").selectFirst("a").attr("href");
                return page;
            } else { // Status code is not 200
                return "Get page failed!";
            }
        } catch (Exception e) {
            return "Get page failed!";
        }
    }

    // getContent() downloads the flag of the specified country
    public static void getContent(String page) {
        String base_url = "http://country.911cha.com/";
        String url = base_url + page;
        try {
            // Open a connection to the country's page
            URL urlObj = new URL(url);
            URLConnection urlCon = urlObj.openConnection();
            // Parse the HTML content as UTF-8
            Document doc = Jsoup.parse(urlCon.getInputStream(), "utf-8", url);
            // Select the flag image element
            Element image = doc.selectFirst("img");
            String flag_name = image.attr("alt").replace("flag", "");
            String flag_url = image.attr("src");
            URL httpurl = new URL(base_url + '/' + flag_url);
            // Use FileUtils.copyURLToFile() to download the image
            FileUtils.copyURLToFile(httpurl, new File("e://flag/" + flag_name + ".gif"));
            System.out.println(String.format("%s flag download successful~", flag_name));
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Download failed!");
        }
    }

    // Read the file line by line and return an ArrayList whose elements are the country names
    public static ArrayList<String> readFileByLines(String fileName) {
        File file = new File(fileName);
        BufferedReader reader = null;
        ArrayList<String> countries = new ArrayList<String>();
        try {
            reader = new BufferedReader(new FileReader(file));
            String tempString = null;
            // Read one line at a time until null signals the end of the file
            while ((tempString = reader.readLine()) != null) {
                countries.add(tempString); // Add the country name to the list
            }
            reader.close();
            return countries;
        } catch (IOException e) {
            return countries;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
    }
}
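The listing above joins the base URL to the extracted href and src values by string concatenation, which can produce a doubled slash because the base URL already ends in "/". One possible alternative, offered as a sketch rather than as the author's method, is to let java.net.URI resolve the relative reference. The class name UrlResolveSketch is hypothetical, and the example paths in the usage note are made up for illustration.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of the original project. */
final class UrlResolveSketch {

    /**
     * Resolves href against baseUrl following standard URI reference rules.
     * Throws IllegalArgumentException if either string is not a valid URI,
     * for example when href contains an unescaped space.
     */
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }
}
```

With the base "http://country.911cha.com/", a relative value such as "China.html" and a root-relative value such as "/images/flag.gif" both resolve to a URL with a single slash after the host, and an href that is already absolute is returned unchanged.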
Run results

After running the Java program, the downloaded national flag images appear in the flag directory on the E: drive.

Bingo, our Java crawler runs successfully! The main goal of this crawler was to implement POST requests in Java, similar to what the requests module provides in Python.

Note: I now run two WeChat public accounts: "because Python" (ID: Python_math) and "easy to learn the Python crawler" (ID: Easy_web_scrape). You are welcome to follow them.
