Using jsoup in Java to fetch page source and batch-download images

Last updated: 2015-05-31. Source: Internet

Uploaded by: User


Tags: jsoup, web page parsing, crawler, batch image download

1. Import the jsoup core jar, jsoup-xxx.jar

jar: jsoup-1.8.2.jar

Chinese API documentation: http://www.open-open.com/jsoup/parsing-a-document.htm

2. Using jsoup in Java to fetch page source and batch-download images

package com.dgh.test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Fetches resources from a web page.
 *
 * @author wangcunhuazi
 */
public class JsoupTest {

    // URL of the page that holds the resources
    private static String resourceURL = "http://www.csdn.net/";

    // Local directory where the downloaded files are saved
    private static String downloadFilePath = "E://downloadImage//";

    /**
     * Downloads an image from its public URL into the local directory filePath.
     *
     * @param filePath local directory in which to store the image
     * @param imgUrl   public URL of the image
     * @throws UnsupportedEncodingException
     */
    public static void downImages(String filePath, String imgUrl) throws UnsupportedEncodingException {
        // Leading part of the image URL, e.g. "http://images.csdn.net/20150529/"
        String beforeUrl = imgUrl.substring(0, imgUrl.lastIndexOf("/") + 1);
        // Trailing part of the image URL (the file name), e.g. "PP6A7429_副本1.jpg"
        String fileName = imgUrl.substring(imgUrl.lastIndexOf("/") + 1);
        // Encoded file name; URLEncoder turns each space into "+"
        String newFileName = URLEncoder.encode(fileName, "UTF-8");
        // Replace "+" with "%20", the percent-encoded form of a space
        newFileName = newFileName.replaceAll("\\+", "%20");
        // The encoded URL
        imgUrl = beforeUrl + newFileName;

        try {
            // Create the target directory if it does not exist
            File files = new File(filePath);
            if (!files.exists()) {
                files.mkdirs();
            }
            // Build the URL object
            URL url = new URL(imgUrl);
            // Open a connection to the remote address
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            // Create the local file; fileName is the name before encoding
            File file = new File(filePath + fileName);
            // Read the connection's input stream and write it to the file;
            // try-with-resources closes both streams even if the copy fails
            try (InputStream is = connection.getInputStream();
                 FileOutputStream out = new FileOutputStream(file)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = is.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        // Fetch and parse an HTML document from a website; the jsoup API documents this method
        Document document = Jsoup.connect(resourceURL).get();
        // System.out.println(document);

        // Get all img tags
        Elements elements = document.getElementsByTag("img");
        for (Element element : elements) {
            // Read each img tag's src attribute, i.e. the image URL; the "abs:" prefix resolves it to an absolute URL
            String imgSrc = element.attr("abs:src");
            // Download the image file to the local disk
            System.out.println("Downloading image: -----------" + imgSrc);
            downImages(downloadFilePath, imgSrc);
            System.out.println("Finished downloading: -----------" + imgSrc);
            System.out.println("-------------------------------------------------------------------------------------------------------------");
        }
        System.out.println("Processed " + elements.size() + " files in total (duplicates not removed)");
    }
}
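The listing above percent-encodes only the file-name part of each image URL and, as its final message notes, does not remove duplicate URLs. The sketch below isolates those two steps using only the JDK, so they can be tried without jsoup or network access. The class name ImageUrlSketch, the method names encodeLastSegment and uniqueUrls, and the example.com URLs are made up for illustration and are not part of the original program or of jsoup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class for illustration only.
class ImageUrlSketch {

    // Percent-encodes the text after the last '/', leaving the rest of the URL untouched.
    static String encodeLastSegment(String imgUrl) {
        int cut = imgUrl.lastIndexOf('/') + 1;
        String prefix = imgUrl.substring(0, cut);
        String fileName = imgUrl.substring(cut);
        try {
            // URLEncoder performs form encoding, so a space becomes "+";
            // inside a URL path a space has to be written as "%20".
            return prefix + URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Drops null, empty, and repeated URLs while keeping first-seen order.
    static List<String> uniqueUrls(List<String> urls) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String u : urls) {
            if (u != null && !u.isEmpty()) {
                seen.add(u);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Stand-in for the src values collected from the img tags.
        List<String> found = Arrays.asList(
                "http://example.com/img/a b.jpg",
                "",
                "http://example.com/img/a b.jpg",
                "http://example.com/img/c.png");
        for (String url : uniqueUrls(found)) {
            System.out.println(encodeLastSegment(url));
        }
    }
}
```

Like the original code, this treats everything after the last "/" as a file name, so a URL that carries a query string (for example "?w=100") would have its "?" encoded as well and would need separate handling.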

For more details on using jsoup, see: http://blog.csdn.net/wangcunhuazi/article/details/46237277

http://blog.csdn.net/wangcunhuazi/article/details/46237211

http://blog.csdn.net/wangcunhuazi/article/details/46237325
