A First Look at jsoup: Parsing HTML

Source: Internet
Author: User

By convention, I should first explain what jsoup is, then describe how to use it, and finally walk through a demo, and that is exactly the plan. I spent a whole day today studying how to parse web pages with it, and I finally got everything working. Aha! But a code monkey who doesn't write up what he learned isn't a handsome code monkey, which means I'm a handsome code monkey.

----------------------------------------------------------------------------------------------------------------------

I. What is jsoup?

Official website: http://jsoup.org/

The jsoup jar can be downloaded from the website.

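Put simply, jsoup is a library for parsing web pages. The official description sounds rather grand, but in short: jsoup is a Java library that parses real-world HTML into a document tree that you can query and modify with DOM-style methods and CSS selectors.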
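To make that concrete, here is a minimal sketch that parses an HTML string instead of a live page. The class name `JsoupFirstLook`, the `describe` helper, and the sample markup are illustrative assumptions; `Jsoup.parse`, `Document.title()`, `Elements.first()`, `Element.text()` and `Element.attr()` are standard jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupFirstLook {
    // Parse an HTML string into a Document and read a few values from it.
    public static String describe(String html) {
        Document doc = Jsoup.parse(html);        // builds a DOM tree, even from sloppy HTML
        String title = doc.title();              // text of the <title> element
        Element link = doc.select("a").first();  // CSS selector, similar to jQuery's $("a")
        return title + " | " + link.text() + " -> " + link.attr("href");
    }

    public static void main(String[] args) {
        // The <p> tags are deliberately left unclosed; jsoup repairs them while parsing.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello<p><a href='/post/1'>First post</a></body></html>";
        System.out.println(describe(html));
    }
}
```

Running `main` should print `Demo | First post -> /post/1`. Because no base URI was given, `attr("href")` returns the relative link exactly as written.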

II. Basic usage of jsoup (http://www.open-open.com/jsoup/parsing-a-document.htm)

The documentation there is very detailed, and I'd like to think smart readers can pick up the basics just by reading it... Well, there's a reason for that: as the saying goes, only handsome people can read it.
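The demo code later in this post switches between two lookup styles, which it labels "the form of the DOM" (`getElementsByClass`) and "the form of the selector" (`select`). As a rough sketch, using made-up markup and a hypothetical `DomVsSelector` class, both styles can find the same elements:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DomVsSelector {
    // Hypothetical markup, used only to compare the two lookup styles.
    public static final String HTML = "<ul class='module-list'>"
            + "<li class='clearfix'><a class='tit-post' href='/post/1'>A</a></li>"
            + "<li class='clearfix'><a class='tit-post' href='/post/2'>B</a></li>"
            + "</ul>";

    // DOM style: look elements up by class name anywhere in the document.
    public static String domText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementsByClass("tit-post").text();
    }

    // Selector style: the same lookup as a CSS selector that also pins down the parents.
    public static String selectorText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("ul.module-list > li.clearfix > a.tit-post").text();
    }

    public static void main(String[] args) {
        // Elements.text() joins the text of all matches with single spaces.
        System.out.println(domText(HTML));
        System.out.println(selectorText(HTML));
    }
}
```

Both lines should print `A B`. The selector form is usually the more precise choice on a real page, because a bare class name may also appear in unrelated parts of the document.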

III. Demo: parsing the URL http://sex.guokr.com/

A note up front: ignore what the linked site is about; I just needed a good site to practice on. Aha, don't get the wrong idea!
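Both demos fetch live pages with a bare `Jsoup.connect(url).get()`. Some sites reject clients without a browser-like user agent, or respond slowly, so it can help to configure the request before sending it. A minimal sketch; the user-agent string and the 10-second timeout are arbitrary assumptions, and `ConnectionSetup` is a hypothetical class name:

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ConnectionSetup {
    // Build (but do not execute) a request with an explicit user agent and timeout.
    // The user-agent string and the 10-second timeout are illustrative values only.
    public static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (jsoup demo)")
                .timeout(10_000); // milliseconds
    }

    public static void main(String[] args) {
        Connection conn = prepare("http://sex.guokr.com/");
        // Nothing is sent until get() or execute() is called; conn.get() would fetch the page.
        System.out.println(conn.request().timeout());
        System.out.println(conn.request().header("User-Agent"));
    }
}
```

Keeping the request setup in one helper means both demos could share the same settings instead of repeating them.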

1. Parsing a ul -> li list

Let's look at the HTML source for this section:

Now that we know the general structure, let's write the code.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Use jsoup to parse a URL
 * @tag: URL: http://sex.guokr.com/
 * Created by Monster on 2015/12/11.
 */
public class JsoupZX {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());
            Elements module = containerDoc.getElementsByClass("module-list");
            Document moduleDoc = Jsoup.parse(module.toString());
            // Elements clearfix = moduleDoc.getElementsByClass("clearfix"); // the form of the DOM
            Elements clearfix = moduleDoc.select(".clearfix"); // the form of the selector
            for (Element clearfixLi : clearfix) {
                Document clearfixLiDoc = Jsoup.parse(clearfixLi.toString());
                Elements kind = clearfixLiDoc.select(".board-tag");   // category
                Elements title = clearfixLiDoc.select(".tit-post");   // title
                Elements author = clearfixLiDoc.select("span a");     // author
                System.out.println("Category: " + kind.text());
                System.out.println("Title: " + title.text());
                System.out.println("Author: " + author.text());
                System.out.println("Details link: " + title.attr("href")); // link under the title
                System.out.println("=====================");
            }
            // String title = clearfixLi.getElementsByTag("a").text();
            // System.out.println(clearfix);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
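The demo re-parses every intermediate result with `Jsoup.parse(x.toString())`. That works, but it isn't needed: `select` can be called on any `Element`, and one descendant selector can express the whole container -> module-list -> clearfix chain. Re-parsing also drops the page's base URI, so relative links can't be resolved to absolute ones. The sketch below runs offline on hypothetical markup that reuses the demo's class names (the real page may differ); `ListItems` is an illustrative name, and `Element.selectFirst` needs a jsoup release newer than the one available when this post was written.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ListItems {
    // Returns one "category | title | author | link" line per list item.
    public static List<String> parse(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri); // baseUri lets absUrl() resolve relative links
        List<String> rows = new ArrayList<>();
        // A single selector replaces the chain of re-parsed fragments.
        for (Element li : doc.select(".container .module-list .clearfix")) {
            Element kind = li.selectFirst(".board-tag");
            Element title = li.selectFirst(".tit-post");
            Element author = li.selectFirst("span a");
            if (title == null) continue; // skip items that don't look like posts
            rows.add(text(kind) + " | " + title.text() + " | " + text(author)
                    + " | " + title.absUrl("href"));
        }
        return rows;
    }

    // selectFirst() returns null when nothing matches, so guard before calling text().
    private static String text(Element e) {
        return e == null ? "" : e.text();
    }

    public static void main(String[] args) {
        // Hypothetical markup modeled on the class names used in the demo above.
        String html = "<div class='container'><ul class='module-list'>"
                + "<li class='clearfix'><a class='board-tag'>Q&amp;A</a>"
                + "<a class='tit-post' href='/post/1100992/'>Some title</a>"
                + "<span><a>someone</a></span></li></ul></div>";
        parse(html, "http://sex.guokr.com/").forEach(System.out::println);
    }
}
```

With the sample markup, `main` should print `Q&A | Some title | someone | http://sex.guokr.com/post/1100992/`.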

Results:

=================================================================================================

2. Parsing the details page and comments

Link: http://sex.guokr.com/post/1100992/

The above is the content of the page

Next, let's look at the HTML source:

Content:

Comments:

With the source understood, let's write the code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Use jsoup to parse post details and comments
 * @tag: url: http://sex.guokr.com/post/1100992/
 * Created by Monster on 2015/12/11.
 */
public class JsoupDetail {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/post/1100992/";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());

            String articleTitle = containerDoc.getElementById("articleTitle").text();
            String authorName = containerDoc.getElementById("authorName").text();
            String time = containerDoc.select("span").first().text();
            String imgPhotoUrl = containerDoc.select("img").get(1).attr("src");
            System.out.println("Title: " + articleTitle);
            System.out.println("Author: " + authorName);
            System.out.println("Release time: " + time);
            System.out.println("Author's avatar URL: " + imgPhotoUrl);

            Element articleContent = containerDoc.getElementById("articleContent");
            Document articleContentDoc = Jsoup.parse(articleContent.toString());
            int size = articleContentDoc.select("p").size();
            System.out.println("Number of paragraphs: " + size);
            System.out.println("Post content:");
            for (int i = 0; i < size; i++) {
                String content = articleContentDoc.select("p").get(i).text();
                System.out.println(content);
            }
            System.out.println("================================================");

            System.out.println("Post comments (by floor)");
            Elements cmts = containerDoc.getElementsByClass("cmts");
            Document cmtsDoc = Jsoup.parse(cmts.toString());
            System.out.println("Comment floors: " + cmtsDoc.select("span").first().text());
            Elements cmtsList = cmtsDoc.getElementsByClass("cmts-list");
            for (Element clearfix : cmtsList) {
                String user = clearfix.select("a").get(1).text();
                String userPhotoUrl = clearfix.select("img").get(0).attr("src");
                String replyTime = clearfix.select("a").get(3).text();
                String floor = clearfix.select("span").text();
                System.out.println("Commenter: " + user + "\n"
                        + "Commenter avatar URL: " + userPhotoUrl + "\n"
                        + "Reply time: " + replyTime + "\n"
                        + "Floor: " + floor);
                Document replyContentDoc = Jsoup.parse(clearfix.toString());
                Elements replyContent = replyContentDoc.getElementsByClass("cmt-content");
                System.out.println("Comment content:");
                // Print each paragraph of the comment body
                for (Element paragraph : replyContent.select("p")) {
                    System.out.println(paragraph.text());
                }
                System.out.println("================================================");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
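The details demo reads fields by fixed position, for example `select("a").get(3)`. Because `Elements` is a `List`, `get` throws `IndexOutOfBoundsException` as soon as the markup changes, and `getElementById` returns `null` when the id is missing. A more defensive sketch, on hypothetical markup that mirrors the demo's ids and class names and, like the demo, treats each `.cmts-list` element as one comment; `PostComments` and its methods are illustrative names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class PostComments {
    // Returns the post title by id, or an empty string if the id is absent.
    public static String title(String html) {
        Element t = Jsoup.parse(html).getElementById("articleTitle");
        return t == null ? "" : t.text();
    }

    // Collects each comment's paragraphs, joined by newlines, one string per comment.
    public static List<String> comments(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element item : doc.select(".cmts .cmts-list")) {
            StringBuilder sb = new StringBuilder();
            for (Element p : item.select(".cmt-content p")) { // no index bookkeeping needed
                if (sb.length() > 0) sb.append('\n');
                sb.append(p.text());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup: one title plus two comments.
        String html = "<h1 id='articleTitle'>Post title</h1><div class='cmts'>"
                + "<div class='cmts-list'><div class='cmt-content'><p>first</p><p>second</p></div></div>"
                + "<div class='cmts-list'><div class='cmt-content'><p>third</p></div></div></div>";
        System.out.println(title(html));
        for (String c : comments(html)) {
            System.out.println(c);
            System.out.println("----");
        }
    }
}
```

Returning plain strings keeps the parsing separate from printing, so the same helpers could feed a file or a database instead of the console.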

Output:


That's my demo. It's fairly simple, and I hope it's easy to follow. Aha~

Also, you're welcome to visit my blog, The DA


