How to crawl images with jsoup and save them locally
A project required vehicle brand and vehicle series data, so yesterday I spent a day crawling a website for it with jsoup. The project is built with Maven + Spring + Spring MVC + MyBatis.
Reference: jsoup Development Guide
The page to crawl is https://car.autohome.com.cn/zhaoche/pinpai/
1. First, add the dependencies to pom.xml
Because the images need to be saved locally, the commons-net package is added as well, although the controller shown in step 2 relies only on java.net and java.io from the JDK.
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-net/commons-net -->
<dependency>
    <groupId>commons-net</groupId>
    <artifactId>commons-net</artifactId>
    <version>3.3</version>
</dependency>
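Since the goal is to save each logo to the local disk, the saving step can be sketched on its own with the JDK's java.nio.file API. This is only a minimal sketch, not the author's implementation: the class name ImageSaver, the helpers fileNameOf and save, and the example.invalid URL are all hypothetical, and an in-memory stream stands in for a real download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, used for illustration only.
public class ImageSaver {

    // Returns the text after the last '/', or the whole string if there is no '/'.
    public static String fileNameOf(String imageUrl) {
        return imageUrl.substring(imageUrl.lastIndexOf('/') + 1);
    }

    // Copies the stream into targetDir under the URL's file name and returns the written path.
    public static Path save(InputStream in, Path targetDir, String imageUrl) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameOf(imageUrl));
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("imgs");
        byte[] fakeImage = {1, 2, 3}; // stands in for downloaded image bytes
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = new ByteArrayInputStream(fakeImage)) {
            Path saved = save(in, dir, "//example.invalid/logos/brand.png");
            System.out.println(saved.getFileName() + " " + Files.size(saved)); // prints "brand.png 3"
        }
    }
}
```

In a real crawl, the InputStream would come from URLConnection.getInputStream() and targetDir would be the image storage directory.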
2. Crawler code implementation
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
@RequestMapping("/car/")
public class CarController {

    // Image storage path
    private static final String saveImgPath = "C://imgs";

    /**
     * @Title: insert
     * @Description: Crawls brand names and logo images
     * @throws IOException
     * @date 2018-01-29 4:42:57
     */
    @RequestMapping("add")
    public void insert() throws IOException {
        // Address of the page to crawl
        String url = "https://car.autohome.com.cn/zhaoche/pinpai";
        // Fetch and parse the page
        Document doc = Jsoup.connect(url).get();
        // Select the elements with the class name "uibox-con"
        Elements elementsByClass = doc.getElementsByClass("uibox-con");
        // Iterate over the matched elements
        for (Element element : elementsByClass) {
            // Number of child nodes of this element (one per brand).
            // Note: childNodeSize() also counts text nodes, while child(i) indexes
            // element children only, so this assumes there are no text nodes between the tags.
            int childNodeSize_1 = element.childNodeSize();
            for (int i = 0; i < childNodeSize_1; i++) {
                // Logo image address
                String tupian = element.child(i).child(0).child(0).child(0).child(0).attr("src");
                // Brand name
                String pinpai = element.child(i).child(0).child(1).text();
                // Print the values to check that they are correct
                System.out.println("logo image address -----------" + tupian);
                System.out.println("brand -----------" + pinpai);
                System.out.println();

                // Save the logo image locally; the src value carries no scheme, so "http:" is prepended
                String tupian_1 = "http:" + tupian;
                URL url1 = new URL(tupian_1);
                URLConnection uri = url1.openConnection();
                // File name: everything after the last "/"
                String imageName = tupian.substring(tupian.lastIndexOf("/") + 1, tupian.length());
                // Copy the data stream to the file; try-with-resources closes both streams
                try (InputStream is = uri.getInputStream();
                     OutputStream os = new FileOutputStream(new File(saveImgPath, imageName))) {
                    byte[] buf = new byte[1024];
                    int p = 0;
                    while ((p = is.read(buf)) != -1) {
                        os.write(buf, 0, p);
                    }
                }

                /*
                 * Each brand may have several joint-venture factories, for example
                 * FAW-Volkswagen, Shanghai Volkswagen and imported Volkswagen.
                 * Get the names of all joint-venture factories and their car series.
                 */
                // Number of car series
                int childNodeSize_2 = element.child(i).child(1).child(0).childNodeSize();
                // Number of child nodes under the tag; 1 means there is no other joint-venture factory
                int childNodeSize_3 = element.child(i).child(1).childNodeSize();
                if (childNodeSize_3 == 1) {
                    // Loop over the car series
                    for (int j = 0; j < childNodeSize_2; j++) {
                        String chexi = element.child(i).child(1).child(0).child(j).child(0).child(0).text();
                        System.out.println("Car -----------" + chexi);
                    }
                } else {
                    // More than one child: there are multiple joint-venture factories
                    for (int j = 0; j < childNodeSize_3; j++) {
                        int childNodeSize_4 = element.child(i).child(1).child(j).childNodeSize();
                        // Even j is a joint-venture factory name; odd j is that factory's car series
                        int k = j % 2;
                        if (k == 0) {
                            // Joint-venture factory name
                            String hezipinpai = element.child(i).child(1).child(j).child(0).text();
                            System.out.println("Joint Venture name -----------" + hezipinpai);
                        } else {
                            // Loop over the car series of this joint-venture factory
                            for (int l = 0; l < childNodeSize_4; l++) {
                                String chexi = element.child(i).child(1).child(j).child(l).child(0).child(0).text();
                                System.out.println("Car -----------" + chexi);
                            }
                        }
                    }
                }
                System.out.println("************************");
                System.out.println("************************");
            }
        }
    }
}
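The joint-venture branch of the controller relies on the children alternating between a factory name (even index) and that factory's car series (odd index). The pairing logic can be sketched without jsoup, using plain lists in place of the parsed elements. The class name FactoryGrouping, the method group and the series names below are hypothetical and only illustrate the idea; this is not the author's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used for illustration only.
public class FactoryGrouping {

    // blocks alternates: [factory name], [its series...], [factory name], [its series...], ...
    public static Map<String, List<String>> group(List<List<String>> blocks) {
        // LinkedHashMap keeps the factories in page order
        Map<String, List<String>> seriesByFactory = new LinkedHashMap<>();
        String currentFactory = null;
        for (int j = 0; j < blocks.size(); j++) {
            if (j % 2 == 0) {
                // Even index: a block holding the joint-venture factory name
                currentFactory = blocks.get(j).get(0);
                seriesByFactory.put(currentFactory, new ArrayList<>());
            } else {
                // Odd index: the car series that belong to the preceding factory
                seriesByFactory.get(currentFactory).addAll(blocks.get(j));
            }
        }
        return seriesByFactory;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("FAW-Volkswagen"),
                Arrays.asList("Series A", "Series B"),   // made-up series names
                Arrays.asList("Shanghai Volkswagen"),
                Arrays.asList("Series C"));
        // prints {FAW-Volkswagen=[Series A, Series B], Shanghai Volkswagen=[Series C]}
        System.out.println(group(blocks));
    }
}
```

Collecting the results into a map like this, rather than only printing them, would make it easier to pass the brands and series on to a persistence layer such as MyBatis.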
3. Running result
That is all for this article. I hope it helps with your learning.