Jsoup Study Series (2): Parsing HTML Documents

Last updated: 2016-12-06 Source: Internet

Uploaded by: User

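Parsing an HTML document

1. When we send an HTTP request, the response is sometimes a string of HTML. You need to fetch and parse an HTML document from a website and find the relevant data in it. The following approach does this:

Use the Jsoup.connect(String url) method: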

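        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.nodes.Element;

        // Send the request and parse the response (get() may throw IOException)
        Document doc = Jsoup.connect("https://www.baidu.com/").get();
        // Get the element whose id is "kw"
        Element content = doc.getElementById("kw");
        // Print all attributes of the element
        System.out.println(content.attributes());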

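Output:
id="kw" name="wd" class="s_ipt" value="" maxlength="255" autocomplete="off"

The page source shows the same attributes as the output. This is much like the Selenium automation framework, where driver.findElement(By.id("kw")) locates the element and you then perform operations on it.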

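Explanation

The connect(String url) method creates a new Connection, and get() fetches and parses the HTML document. If an error occurs while fetching the HTML from the URL, an IOException is thrown, which you should handle appropriately.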
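One way to handle the IOException mentioned above is to wrap the fetch in a small helper. This is only a sketch: the class name FetchExample, the fetch helper, and the 3-second timeout are assumptions made for illustration and do not come from the original article.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchExample {
    // Hypothetical helper: returns the parsed page, or an empty Optional if the fetch fails.
    static Optional<Document> fetch(String url) {
        try {
            return Optional.of(Jsoup.connect(url).timeout(3000).get());
        } catch (IOException e) {
            // Covers connection failures, timeouts, and HTTP error statuses.
            System.err.println("Could not fetch " + url + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Print the page title only when the fetch succeeded.
        fetch("https://www.baidu.com/")
                .ifPresent(doc -> System.out.println(doc.title()));
    }
}
```

Returning an Optional keeps the caller from touching a missing Document; rethrowing the exception or retrying would be equally reasonable choices depending on the application.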

The Connection interface also provides a chain of methods for building special requests, for example:

        Document doc = Jsoup.connect("http://example.com")
                .data("query", "Java")
                .userAgent("Mozilla")
                .cookie("auth", "token")
                .timeout(3000)
                .post();
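To see what such a chain actually configures, the request can be inspected before anything is sent. The sketch below builds the same chain but stops short of post(), so it runs without network access; the class name ConnectionChainExample and the build helper are assumptions for illustration.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ConnectionChainExample {
    // Builds the request from the article without sending it.
    static Connection build() {
        return Jsoup.connect("http://example.com")
                .data("query", "Java")       // form parameter
                .userAgent("Mozilla")        // User-Agent request header
                .cookie("auth", "token")     // request cookie
                .timeout(3000)               // timeout in milliseconds
                .method(Connection.Method.POST);
    }

    public static void main(String[] args) {
        Connection.Request req = build().request();
        System.out.println(req.method());             // POST
        System.out.println(req.timeout());            // 3000
        System.out.println(req.cookie("auth"));       // token
        System.out.println(req.header("User-Agent")); // Mozilla
        for (Connection.KeyVal kv : req.data()) {
            System.out.println(kv.key() + "=" + kv.value()); // query=Java
        }
    }
}
```

Calling execute() on the built Connection, or post() as in the article, would then perform the request.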

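Data extraction

1. Using DOM methods to traverse a Document object

To extract specific data from HTML, we first have to locate the target element, which jsoup represents as an Element (or an Elements collection when several elements match), and only then read the values we need from it.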

Finding elements
- getElementById(String id): find by id
- getElementsByTag(String tag): find by tag name
- getElementsByClass(String className): find by class name
- getElementsByAttribute(String key) (and related methods): find by attribute
- Element siblings: siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
- Graph: parent(), children(), child(int index)
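The lookup methods listed above can be tried on a small inline page, which avoids a network call. The markup below is a made-up fragment loosely modeled on the search box used earlier; the class name FindElementsExample and the parse helper are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FindElementsExample {
    // Hypothetical markup, parsed from a string instead of fetched from a URL.
    static final String HTML =
            "<html><body>"
            + "<form id=\"form\">"
            + "<input id=\"kw\" name=\"wd\" class=\"s_ipt\" maxlength=\"255\">"
            + "<input id=\"su\" type=\"submit\" class=\"s_btn\" value=\"Search\">"
            + "</form>"
            + "</body></html>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();

        Element kw = doc.getElementById("kw");                 // single element or null
        Elements inputs = doc.getElementsByTag("input");       // all <input> elements
        Elements byClass = doc.getElementsByClass("s_ipt");    // elements with class s_ipt
        Elements byAttr = doc.getElementsByAttribute("maxlength");

        System.out.println(inputs.size());                 // 2
        System.out.println(byClass.size());                // 1
        System.out.println(byAttr.size());                 // 1

        // Sibling and parent/child navigation starting from the kw input.
        System.out.println(kw.nextElementSibling().id());  // su
        System.out.println(kw.siblingElements().size());   // 1
        System.out.println(kw.parent().id());              // form
        System.out.println(kw.parent().children().size()); // 2
        System.out.println(kw.parent().child(0).id());     // kw
    }
}
```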

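Element data

- attr(String key) gets an attribute; attr(String key, String value) sets an attribute
- attributes() gets all attributes
- id(), className() and classNames()
- text() gets the text content; text(String value) sets the text content
- html() gets the element's inner HTML; html(String value) sets the element's inner HTML
- outerHtml() gets the element's outer HTML
- data() gets data content (for example, the content of script and style tags)
- tag() and tagName()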
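The difference between these accessors is easiest to see side by side. The following sketch reads data from a made-up fragment; the markup, the class name ElementDataExample, and the box helper are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ElementDataExample {
    // Hypothetical markup: a div holding a paragraph and an inline script.
    static final String HTML =
            "<div id=\"box\" class=\"card wide\">"
            + "<p>Hello <b>world</b></p>"
            + "<script>var n = 1;</script>"
            + "</div>";

    static Element box() {
        return Jsoup.parse(HTML).getElementById("box");
    }

    public static void main(String[] args) {
        Element box = box();
        Element p = box.child(0);
        Element script = box.child(1);

        System.out.println(box.id());            // box
        System.out.println(box.className());     // card wide
        System.out.println(box.classNames());    // [card, wide]
        System.out.println(box.attr("class"));   // card wide

        box.attr("title", "greeting");           // sets (or adds) an attribute
        System.out.println(box.attr("title"));   // greeting
        System.out.println(box.attributes());    // all attributes as one string

        System.out.println(p.tagName());         // p
        System.out.println(p.text());            // Hello world
        System.out.println(p.html());            // inner HTML, keeps the <b> markup
        System.out.println(p.outerHtml());       // inner HTML plus the <p> tag itself

        // Script content is data, not text, so text() is empty here.
        System.out.println(script.data());       // var n = 1;
        System.out.println(script.text().isEmpty()); // true
    }
}
```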

Manipulating HTML and text

- append(String html), prepend(String html)
- appendText(String text), prependText(String text)
- appendElement(String tagName), prependElement(String tagName)
- html(String value)
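A short sketch of these manipulation methods, applied to a made-up fragment, is shown below. The markup, the class name ManipulateExample, and the build helper are assumptions for illustration. Note that append and prepend parse their argument as HTML, appendText treats it as plain text, and appendElement returns the newly created child rather than the parent.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ManipulateExample {
    // Applies each manipulation method once and returns the modified div.
    static Element build() {
        Document doc = Jsoup.parse("<div id=\"list\"><p>middle</p></div>");
        Element div = doc.getElementById("list");

        div.prepend("<p>first</p>");             // parsed as HTML, inserted before existing children
        div.append("<p>last</p>");               // parsed as HTML, inserted after existing children
        div.appendText("1 < 2");                 // added as a text node, not parsed as markup

        Element span = div.appendElement("span"); // returns the new <span>, not the div
        span.text("note");

        div.child(0).html("<b>first</b>");       // replaces the inner HTML of the first <p>
        return div;
    }

    public static void main(String[] args) {
        Element div = build();
        System.out.println(div.children().size());  // 4
        System.out.println(div.child(0).text());    // first
        System.out.println(div.child(3).tagName()); // span
        System.out.println(div.ownText());          // 1 < 2
        System.out.println(div.outerHtml());
    }
}
```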

Example 1: get an element's name attribute

        // Send the request
        Document doc = Jsoup.connect("https://www.baidu.com/").get();
        // Get all elements whose class is "s_ipt"
        Elements elements = doc.getElementsByClass("s_ipt");
        // Get the name attribute of the first matching element
        String value = elements.get(0).attr("name");
        System.out.println(value);

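Example 2: when an element has no id or name, it can only be located by its position relative to another element.

        // Send the request
        Document doc = Jsoup.connect("https://www.baidu.com/").get();
        // Get the first child of the element whose id is "u1"
        Element element = doc.getElementById("u1").child(0);
        // Get the link target
        String value1 = element.attr("href");
        // Get the link text
        String value2 = element.text();
        // Print both values
        System.out.println(value1);
        System.out.println(value2);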

Output (the second line is the link text, "News"):

http://news.baidu.com
新聞
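Both examples assume the page layout never changes: getElementById returns null when the id is missing, and get(0) throws when nothing matches the class. A more defensive variant might look like the sketch below. The helper names, the class name SafeLookupExample, and the inline sample markup are assumptions for illustration; the sample is parsed from a string so that it runs without a network call.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SafeLookupExample {
    // Hypothetical stand-in for the fetched page.
    static Document sample() {
        return Jsoup.parse(
                "<div id=\"u1\"><a href=\"http://news.baidu.com\">News</a></div>"
                + "<input class=\"s_ipt\" name=\"wd\">");
    }

    // Returns the name attribute of the first element with the class, or "" if none matches.
    static String nameOfFirstByClass(Document doc, String className) {
        Element first = doc.getElementsByClass(className).first(); // null when nothing matches
        return first == null ? "" : first.attr("name");
    }

    // Returns the href of the first child of the container, or "" if either is missing.
    static String firstChildHref(Document doc, String containerId) {
        Element container = doc.getElementById(containerId);       // null when the id is absent
        if (container == null || container.children().isEmpty()) {
            return "";
        }
        return container.child(0).attr("href");
    }

    public static void main(String[] args) {
        Document doc = sample();
        System.out.println(nameOfFirstByClass(doc, "s_ipt"));            // wd
        System.out.println(firstChildHref(doc, "u1"));                   // http://news.baidu.com
        System.out.println(nameOfFirstByClass(doc, "missing").isEmpty()); // true
        System.out.println(firstChildHref(doc, "missing").isEmpty());     // true
    }
}
```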


