A first look at jsoup: parsing HTML

Last Update:2015-12-12 Source: Internet

Author: User


By convention, I should first introduce what jsoup is, then cover its usage, and finish with a demo, and that is exactly the plan here. I spent the whole day today learning it by parsing a web page, and in the end it all worked, ha. A programmer who doesn't write up what they learned is not a good-looking programmer, which means I must be a good-looking one.

----------------------------------------------------------------------------------------------------------------------

I. What is jsoup?

Official website: http://jsoup.org/

The jar can be downloaded from the website.

Put simply, jsoup is a library for parsing web pages. Now let's look at the official explanation:

The official explanation sounds rather more impressive.

II. Basic usage of jsoup (http://www.open-open.com/jsoup/parsing-a-document.htm)

The site covers this in great detail, and I figure smart people can pick it up just by reading the developer documentation... Well, there is a reason for that: as they say, good-looking people can read.
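As a minimal sketch of the basic usage that documentation describes, the snippet below parses an HTML string rather than a live page and reads values back with a CSS selector. The class name BasicUsageSketch, the method linkTexts, and the sample HTML are made up for illustration; Jsoup.parse, select, text, attr, and title are the jsoup calls being shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class BasicUsageSketch {

    // Returns "text -> href" for every <a> element that carries an href attribute.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);        // build a DOM from a string
        Elements links = doc.select("a[href]");  // CSS selector lookup
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample page, not taken from any real site.
        String html = "<html><head><title>Sample</title></head><body>"
                + "<p>Hello <a href=\"/a\">first</a> and <a href=\"/b\">second</a></p>"
                + "</body></html>";
        System.out.println(Jsoup.parse(html).title());
        for (String line : linkTexts(html)) {
            System.out.println(line);
        }
    }
}
```

To work against a live page instead, Jsoup.connect(url).get() returns the same kind of Document, as the demos in the next section show.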

III. Demo: parsing the URL http://sex.guokr.com/

A note up front: ignore the content of the link. I just needed a decent site to practice on, ha, so don't get the wrong idea.

1. Parsing a ul -> li list

Let's look at the source code for this section:

Now that we know the general structure, let's write the code.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Use jsoup to parse a URL
 * @tag: url: http://sex.guokr.com/
 * Created by monster on 2015/12/11.
 */
public class JSOUPZX {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/";
        try {
            Document doc = Jsoup.connect(url).get();
            // Narrow the page down step by step: container -> module-list -> clearfix items
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());
            Elements module = containerDoc.getElementsByClass("module-list");
            Document moduleDoc = Jsoup.parse(module.toString());
            // Elements clearfix = moduleDoc.getElementsByClass("clearfix"); // DOM-style lookup
            Elements clearfix = moduleDoc.select(".clearfix"); // selector-style lookup
            for (Element clearfixLi : clearfix) {
                Document clearfixLiDoc = Jsoup.parse(clearfixLi.toString());
                Elements kind = clearfixLiDoc.select(".board-tag"); // selector-style lookup
                Elements title = clearfixLiDoc.select(".tit-post");
                Elements author = clearfixLiDoc.select("span a");
                System.out.println("Category: " + kind.text());            // category
                System.out.println("Title: " + title.text());              // title
                System.out.println("Author: " + author.text());            // author
                System.out.println("Details link: " + title.attr("href")); // link on the title
                System.out.println("=====================");
            }
            // String title = clearfixLi.getElementsByTag("a").text();
            // System.out.println(clearfix);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
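The listing above re-parses each fragment with Jsoup.parse(x.toString()) before searching inside it. jsoup elements can also be queried directly, so the same extraction can probably be written more compactly. The sketch below is an alternative under stated assumptions, not the code that produced the results in this article: the class ListSketch, the method parsePosts, the base URI, and the sample markup are hypothetical, with the markup only modeled on the class names used above (container, module-list, clearfix, board-tag, tit-post).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ListSketch {

    // Returns one "category | title | author | link" row per list item.
    static List<String> parsePosts(String html, String baseUri) {
        // Passing a base URI lets jsoup resolve relative hrefs later on.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> rows = new ArrayList<>();
        // A single descendant selector replaces the three parse-and-search rounds.
        for (Element li : doc.select(".container .module-list .clearfix")) {
            Element title = li.select(".tit-post").first();
            if (title == null) {
                continue; // skip items without a title link
            }
            String kind = li.select(".board-tag").text();
            String author = li.select("span a").text();
            String link = title.absUrl("href"); // absolute form of the href
            rows.add(kind + " | " + title.text() + " | " + author + " | " + link);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical markup, assumed to resemble the list on the real page.
        String html = "<div class=\"container\"><ul class=\"module-list\">"
                + "<li class=\"clearfix\"><a class=\"board-tag\" href=\"/b/1\">Health</a>"
                + "<a class=\"tit-post\" href=\"/post/1/\">First post</a>"
                + "<span><a href=\"/u/1\">alice</a></span></li>"
                + "</ul></div>";
        for (String row : parsePosts(html, "http://example.com/")) {
            System.out.println(row);
        }
    }
}
```

Querying the element directly also keeps the document's base URI, which a round trip through toString() and Jsoup.parse(String) drops; that is why absUrl can return a full link here.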

Results:

=================================================================================================

2. Parsing the details page and comments

Link: http://sex.guokr.com/post/1100992/

The above is the content of the page

Next, let's look at the page source:

Content:

Comments:

Having read the source, let's write the code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Use jsoup to parse post details and comments
 * @tag: url: http://sex.guokr.com/post/1100992/
 * Created by monster on 2015/12/11.
 */
public class JsoupDetail {
    public static void main(String args[]) {
        final String url = "http://sex.guokr.com/post/1100992/";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());
            String articleTitle = containerDoc.getElementById("articleTitle").text();
            String authorName = containerDoc.getElementById("authorName").text();
            String time = containerDoc.select("span").first().text();
            String imgPhotoUrl = containerDoc.select("img").get(1).attr("src");
            System.out.println("Title: " + articleTitle);                // title
            System.out.println("Author: " + authorName);                 // author
            System.out.println("Release time: " + time);                 // release time
            System.out.println("Author avatar URL: " + imgPhotoUrl);     // author's avatar
            Element articleContent = containerDoc.getElementById("articleContent");
            Document articleContentDoc = Jsoup.parse(articleContent.toString());
            int size = articleContentDoc.select("p").size();
            System.out.println("Number of paragraphs: " + size);
            System.out.println("Post content:");
            for (int i = 0; i < size; i++) {
                String content = articleContentDoc.select("p").get(i).text();
                System.out.println(content);
            }
            System.out.println("================================================");
            System.out.println("Post comment area (listed by floor)");
            Elements cmts = containerDoc.getElementsByClass("cmts");
            Document cmtsDoc = Jsoup.parse(cmts.toString());
            System.out.println("Comment floors: " + cmtsDoc.select("span").first().text());
            Elements cmtsList = cmtsDoc.getElementsByClass("cmts-list");
            for (Element clearfix : cmtsList) {
                String user = clearfix.select("a").get(1).text();
                String userPhotoUrl = clearfix.select("img").get(0).attr("src");
                String replyTime = clearfix.select("a").get(3).text();
                String floor = clearfix.select("span").text();
                System.out.println("Reviewer: " + user + "\n" + "Reviewer picture URL: " + userPhotoUrl + "\n" + "Reply time: " + replyTime + "\n" + "Floor: " + floor);
                Document replyContentDoc = Jsoup.parse(clearfix.toString());
                Elements replyContent = replyContentDoc.getElementsByClass("cmt-content");
                System.out.println("Comment content:");
                int s = replyContent.select("p").size();
                for (int j = 0; j < s; j++) {
                    // Renamed from the original so it no longer clashes with the Elements variable above
                    String replyText = replyContent.select("p").get(j).text();
                    System.out.println(replyText);
                }
                System.out.println("================================================");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
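The detail-page code picks values by position, such as select("a").get(1), and calls text() straight on the result of getElementById, which throws a NullPointerException when the id is missing. The sketch below shows one possible way to make the same kind of extraction a little more defensive. It is an illustration under assumptions, not a drop-in replacement: the class CommentSketch, its methods, the cmt-author class, the li children of cmts-list, and the sample markup are all hypothetical, while cmts-list, cmt-content, and articleContent follow the names used above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class CommentSketch {

    // Paragraph texts inside the element with the given id; empty list if the id is absent.
    static List<String> paragraphs(Document doc, String id) {
        Element content = doc.getElementById(id);
        if (content == null) {
            return new ArrayList<>();
        }
        // eachText() collects the text of each matched element that has text.
        return content.select("p").eachText();
    }

    // One "user: comment text" line per comment item.
    static List<String> comments(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element item : doc.select(".cmts-list > li")) {
            // "cmt-author" is an assumed class name; the real page may differ.
            String user = item.select(".cmt-author").text();
            String body = String.join(" ", item.select(".cmt-content p").eachText());
            out.add(user + ": " + body);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real detail page.
        String html = "<div id=\"articleContent\"><p>One</p><p></p><p>Two</p></div>"
                + "<ul class=\"cmts-list\"><li><a class=\"cmt-author\">bob</a>"
                + "<div class=\"cmt-content\"><p>Nice</p><p>post</p></div></li></ul>";
        Document doc = Jsoup.parse(html);
        System.out.println(paragraphs(doc, "articleContent"));
        System.out.println(comments(doc));
    }
}
```

Selecting by class rather than by index means the code keeps working if the page adds or removes a link inside a comment, at the cost of depending on class names that have to be checked against the page source.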

Output Result:

--------->

That's my demo. It is written fairly simply, and I hope it is easy to follow, ha~

Also: you are welcome to visit my blog, The DA


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
