By convention, I should first introduce what jsoup is, then cover its basic usage, and then finish with a demo, and that is the plan here. I spent a whole day today learning how to parse a web page with it, and in the end I got complete results, haha. A programmer who doesn't summarize what he learned is not a handsome programmer, which means I'm a handsome one.
--------------------------------------------------------------------------------------------------------------- -------
I. What is jsoup?
Official website: http://jsoup.org/
The jar can be downloaded from the website.
Put simply, jsoup is a library for parsing web pages. Now let's look at the official explanation:
The official explanation sounds rather grand.
II. Basic usage of jsoup (http://www.open-open.com/jsoup/parsing-a-document.htm)
The site explains this in great detail, and I figure smart people can understand it just by reading the documentation... Well, there's a reason for that: as the saying goes, handsome people can read.
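For readers who want something to run before opening the documentation, here is a minimal sketch of the basic flow: parse HTML into a Document, pick elements with a CSS selector, then read their text and attributes. It assumes the jsoup jar is on the classpath. The class name JsoupBasics, the sample markup, and the base URI http://example.com/ are made up for illustration and do not come from any real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupBasics {
    // Hypothetical sample markup, invented for this sketch.
    static final String SAMPLE_HTML =
            "<html><head><title>Sample page</title></head><body>"
            + "<ul class=\"posts\">"
            + "<li><a class=\"tit-post\" href=\"/post/1/\">First post</a></li>"
            + "<li><a class=\"tit-post\" href=\"/post/2/\">Second post</a></li>"
            + "</ul></body></html>";

    /** Returns "link text -> absolute URL" for every matching link. */
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI lets jsoup resolve relative href values.
        Document doc = Jsoup.parse(html, baseUri);
        // CSS selector: <a class="tit-post"> inside <ul class="posts">.
        Elements links = doc.select("ul.posts a.tit-post");
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            result.add(link.text() + " -> " + link.absUrl("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println("Title: " + doc.title());
        for (String line : extractLinks(SAMPLE_HTML, "http://example.com/")) {
            System.out.println(line);
        }
    }
}
```

Parsing from a string keeps the sketch independent of the network. For a live page, Jsoup.connect(url).get() returns the same kind of Document.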
III. Demo: parsing the URL http://sex.guokr.com/
A note up front: ignore what the link is about. I was just looking for a well-structured site to practice on, haha, so don't get the wrong idea.
1. Parsing a ul -> li list
Let's look at the source code for this section:
Now that we know the general structure, let's write the code.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Use jsoup to parse a URL
 * @tag: url: http://sex.guokr.com/
 * Created by Monster on 2015/12/11.
 */
public class JsoupZX {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/";
        try {
            // Fetch the page, then narrow down step by step: container -> module-list -> clearfix
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());
            Elements module = containerDoc.getElementsByClass("module-list");
            Document moduleDoc = Jsoup.parse(module.toString());
            // Elements clearfix = moduleDoc.getElementsByClass("clearfix"); // DOM-style lookup
            Elements clearfix = moduleDoc.select(".clearfix"); // selector-style lookup
            for (Element clearfixLi : clearfix) {
                Document clearfixLiDoc = Jsoup.parse(clearfixLi.toString());
                Elements kind = clearfixLiDoc.select(".board-tag"); // selector-style lookup
                Elements title = clearfixLiDoc.select(".tit-post");
                Elements author = clearfixLiDoc.select("span a");
                System.out.println("Category: " + kind.text());           // category
                System.out.println("Title: " + title.text());             // title
                System.out.println("Author: " + author.text());           // author
                System.out.println("Details link: " + title.attr("href")); // link under the title
                System.out.println("=====================");
            }
            // String title = clearfixLi.getElementsByTag("a").text();
            // System.out.println(clearfix);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Results:
=================================================================================================
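The listing in part 1 re-parses every fragment with Jsoup.parse(x.toString()) before searching inside it. As far as I can tell this is not required, because select can be called on an Element and then only searches within that element. Below is a rough sketch of the same extraction without the re-parsing. It assumes the jsoup jar is on the classpath. The class name ScopedSelectSketch and the sample markup are made up to imitate the ul -> li structure and are not the real page source.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ScopedSelectSketch {
    // Hypothetical markup imitating the list structure; not copied from the real site.
    static final String SAMPLE_HTML =
            "<div class=\"container\"><ul class=\"module-list\">"
            + "<li class=\"clearfix\">"
            + "<a class=\"board-tag\" href=\"/b/1/\">Health</a>"
            + "<a class=\"tit-post\" href=\"/post/1/\">First title</a>"
            + "<span><a href=\"/u/1/\">alice</a></span></li>"
            + "<li class=\"clearfix\">"
            + "<a class=\"board-tag\" href=\"/b/2/\">Science</a>"
            + "<a class=\"tit-post\" href=\"/post/2/\">Second title</a>"
            + "<span><a href=\"/u/2/\">bob</a></span></li>"
            + "</ul></div>";

    /** Builds one "category | title | author | link" row per list item. */
    static List<String> summarize(String html) {
        Document doc = Jsoup.parse(html);
        List<String> rows = new ArrayList<>();
        // One descendant selector replaces the container -> module-list -> clearfix chain.
        for (Element li : doc.select(".container .module-list .clearfix")) {
            // Each select below is scoped to the current <li>; nothing is re-parsed.
            String kind = li.select(".board-tag").text();
            String title = li.select(".tit-post").text();
            String author = li.select("span a").text();
            String href = li.select(".tit-post").attr("href");
            rows.add(kind + " | " + title + " | " + author + " | " + href);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : summarize(SAMPLE_HTML)) {
            System.out.println(row);
        }
    }
}
```

On the sample markup this prints one row per li, for example "Health | First title | alice | /post/1/". Whether the single combined selector matches the live page depends on its actual markup, so treat it as a starting point.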
2. Parse the details page and comments
Link: http://sex.guokr.com/post/1100992/
The above is the content of the page
Then we look at the source code:
Content:
Comments:
After reading the source, we write the code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Use jsoup to parse post details and comments
 * @tag: url: http://sex.guokr.com/post/1100992/
 * Created by Monster on 2015/12/11.
 */
public class JsoupDetail {
    public static void main(String args[]) {
        final String url = "http://sex.guokr.com/post/1100992/";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());

            // Post header: title, author, release time, avatar
            String articleTitle = containerDoc.getElementById("articleTitle").text();
            String authorName = containerDoc.getElementById("authorName").text();
            String time = containerDoc.select("span").first().text();
            String imgPhotoUrl = containerDoc.select("img").get(1).attr("src");
            System.out.println("Title: " + articleTitle);                     // title
            System.out.println("Author: " + authorName);                      // author
            System.out.println("Release time: " + time);                      // release time
            System.out.println("URL of the author's avatar: " + imgPhotoUrl); // avatar URL

            // Post body: one line per <p>
            Element articleContent = containerDoc.getElementById("articleContent");
            Document articleContentDoc = Jsoup.parse(articleContent.toString());
            int size = articleContentDoc.select("p").size();
            System.out.println("Number of paragraphs: " + size);
            System.out.println("Post content:");
            for (int i = 0; i < size; i++) {
                String content = articleContentDoc.select("p").get(i).text();
                System.out.println(content);
            }
            System.out.println("================================================");

            // Comments
            System.out.println("Post comment area (by floor)");
            Elements cmts = containerDoc.getElementsByClass("cmts");
            Document cmtsDoc = Jsoup.parse(cmts.toString());
            System.out.println("Comment floors: " + cmtsDoc.select("span").first().text());
            Elements cmtsList = cmtsDoc.getElementsByClass("cmts-list");
            for (Element clearfix : cmtsList) {
                String user = clearfix.select("a").get(1).text();
                String userPhotoUrl = clearfix.select("img").get(0).attr("src");
                String replyTime = clearfix.select("a").get(3).text();
                String floor = clearfix.select("span").text();
                System.out.println("Reviewer: " + user + "\n"
                        + "Reviewer picture URL: " + userPhotoUrl + "\n"
                        + "Reply time: " + replyTime + "\n"
                        + "Floor: " + floor);
                Document replyContentDoc = Jsoup.parse(clearfix.toString());
                Elements replyContent = replyContentDoc.getElementsByClass("cmt-content");
                System.out.println("Comment content:");
                int s = replyContent.select("p").size();
                for (int j = 0; j < s; j++) {
                    // Named replyText so it does not clash with the Elements variable replyContent
                    String replyText = replyContent.select("p").get(j).text();
                    System.out.println(replyText);
                }
                System.out.println("================================================");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
That's my demo. It's written rather simply, so I hope it's easy to follow, haha~
Also: you're welcome to visit my blog, The DA
A first look at jsoup: parsing HTML