Android App Development: Little Witch's CSDN Blog Client, Jsoup Chapter
It has been two weeks since the previous post, and I am sorry about that: I have been busy with another project and have had almost no free time. I will fill in the remaining posts over the National Day holiday. This post shows how to use the Jsoup library to parse web pages, and how to analyze the pages we want to parse.
Jsoup download: http://jsoup.org/download
I use jsoup-1.7.2 here. After downloading the jar, copy it into the libs folder of your project.
There is not much material on Jsoup. For reference, this site is useful for learning the library: http://www.open-open.com/jsoup/
API docs: http://jsoup.org/apidocs/
I am not very familiar with this library myself; I simply consulted the documentation to get the parsing done. The parsing code below is for reference only. For specific parsing methods, please read the API documentation carefully. How to use Jsoup is not the focus here. The focus is how to analyze the page we want to parse in order to achieve the following result:
Here the home screen shows the list of my blog posts. Each item has a title, a summary, the publication time, the read count, and the comment count. All of this content is obtained by parsing the HTML of the blog's home page.
Yes, this is an HTML page, and we want to move it onto the phone. Once you learn how to analyze an HTML page you can pull down whatever you want, as long as the page you crawl has no anti-scraping measures. The browser I use is Google Chrome; as an IT professional you can hardly get by without it. Press F12 and you will see the developer tools, where you can go looking for the treasure you want.
Joking aside, if you want to parse a web page, this is how you have to look at it to find what you need. I do this by right-clicking an element and inspecting it, which jumps straight to the corresponding HTML source, so I know which tag holds the content. Since I wanted the list of all posts on the home page, I first found the outermost div of the post list, whose id is article_list. Then I looked at the contents of each post entry to determine the specific tags and the classes they use. Jsoup can fetch the elements you want by class, and from each element you can read its content.
Straight to the code:
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Parse the HTML document with Jsoup.
 *
 * @param blogType type stored on every returned item
 * @param str      HTML of the blog list page
 * @return the list of parsed blog items
 */
public static List<BlogItem> getBlogItemList(int blogType, String str) {
    // Log.e("URL---->", str);
    List<BlogItem> list = new ArrayList<BlogItem>();
    // Get the Document object
    Document doc = Jsoup.parse(str);
    // Log.e("doc--->", doc.toString());
    // Get all elements with class="article_item"
    Elements blogList = doc.getElementsByClass("article_item");
    // Log.e("elements--->", blogList.toString());

    for (Element blogItem : blogList) {
        BlogItem item = new BlogItem();
        String title = blogItem.select("h1").text(); // get the title
        // System.out.println("title----->" + title);
        String description = blogItem.select("div.article_description").text();
        // System.out.println("description--->" + description);
        String msg = blogItem.select("div.article_manage").text();
        // System.out.println("msg--->" + msg);
        String date = blogItem.getElementsByClass("article_manage").get(0).text();
        // System.out.println("date--->" + date);
        String link = BLOG_URL + blogItem.select("h1").select("a").attr("href");
        // System.out.println("link--->" + link);

        item.setTitle(title);
        item.setMsg(msg);
        item.setContent(description);
        item.setDate(date);
        item.setLink(link);
        item.setType(blogType);
        // No picture
        item.setImgLink(null);
        list.add(item);
    }
    return list;
}
I get all the elements with class="article_item" as an Elements collection, then iterate over it and pull the values we need out of each Element. We can define an entity class for a list entry, such as BlogItem, create one BlogItem object per post, and add each one to the list. That way the list holds all the posts, and the next time we need them we can read them straight from the list.
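The parsing code calls setters on BlogItem, but the class itself is not shown in this post. Here is a minimal sketch of what such an entity class could look like. The field names and types are assumptions inferred from those setter calls, not the project's actual source:

```java
/**
 * Hypothetical sketch of the BlogItem entity. Fields are inferred from the
 * setters used by the list-parsing method; the real class may differ.
 */
class BlogItem {
    private String title;   // post title
    private String content; // post summary
    private String date;    // publication info text
    private String msg;     // read count / comment count text
    private String link;    // URL of the full post
    private String imgLink; // image URL, or null when there is no picture
    private int type;       // blog type passed in by the caller

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }

    public String getDate() { return date; }
    public void setDate(String date) { this.date = date; }

    public String getMsg() { return msg; }
    public void setMsg(String msg) { this.msg = msg; }

    public String getLink() { return link; }
    public void setLink(String link) { this.link = link; }

    public String getImgLink() { return imgLink; }
    public void setImgLink(String imgLink) { this.imgLink = imgLink; }

    public int getType() { return type; }
    public void setType(int type) { this.type = type; }
}
```

A plain class with getters and setters is enough here, because the list adapter only needs to read the values back when it binds each row.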
As you can see, with Jsoup it takes very little code to get the content we want. For both ease of coding and efficiency, it is well worth using.
Getting the blog content works in a similar way. Given a URL, we can parse the HTML in the same manner:
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Extract the detailed blog content for the given URL.
 *
 * @param url address of the blog post
 * @param str HTML of the blog post page
 * @return the parsed content blocks
 */
public static List<Blog> getContent(String url, String str) {
    List<Blog> list = new ArrayList<Blog>();

    // Get the document
    Document doc = Jsoup.parse(str);

    // Get the element with class="details"
    Element detail = doc.getElementsByClass("details").get(0);
    detail.select("script").remove(); // remove every matching element from the DOM

    // Get the title
    Element title = detail.getElementsByClass("article_title").get(0);
    Blog blogTitle = new Blog();
    blogTitle.setState(Constants.DEF_BLOG_ITEM_TYPE.TITLE); // set the state
    blogTitle.setContent(ToDBC(title.text())); // set the title content

    // Get the article content
    Element content = detail.select("div.article_content").get(0);

    // tagName(...) changes the tag of an element in place,
    // e.g. el.tagName("div") turns a <span> into a <div>.
    // Links, <strong> and <b> become <bold>.
    for (Element a : detail.getElementsByTag("a")) {
        a.tagName("bold");
    }
    for (Element strong : detail.getElementsByTag("strong")) {
        strong.tagName("bold");
    }
    // Paragraphs, block quotes and lists become <body>.
    for (Element p : detail.getElementsByTag("p")) {
        p.tagName("body");
    }
    for (Element blockquote : detail.getElementsByTag("blockquote")) {
        blockquote.tagName("body");
    }
    for (Element ul : detail.getElementsByTag("ul")) {
        ul.tagName("body");
    }
    for (Element bold : detail.getElementsByTag("b")) {
        bold.tagName("bold");
    }

    // Iterate over all child elements of the blog content
    for (int j = 0; j < content.children().size(); j++) {
        Element c = content.child(j); // get each element

        // Extract the pictures
        if (c.select("img").size() > 0) {
            Elements imgs = c.getElementsByTag("img");
            System.out.println("img");
            for (Element img : imgs) {
                // Skip <img> tags without a src attribute
                if (!img.attr("src").equals("")) {
                    Blog blogImgs = new Blog();
                    // Link to the large picture
                    if (!img.parent().attr("href").equals("")) {
                        blogImgs.setImgLink(img.parent().attr("href"));
                        System.out.println("href=" + img.parent().attr("href"));
                        if (img.parent().parent().tagName().equals("p")) {
                            // img.parent().parent().remove();
                        }
                        img.parent().remove();
                    }
                    blogImgs.setContent(img.attr("src"));
                    blogImgs.setImgLink(img.attr("src"));
                    System.out.println(blogImgs.getContent());
                    blogImgs.setState(Constants.DEF_BLOG_ITEM_TYPE.IMG);
                    list.add(blogImgs);
                }
            }
        }
        c.select("img").remove();

        // Get the blog text content
        Blog blogContent = new Blog();
        blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.CONTENT);

        if (c.text().equals("")) {
            continue;
        } else if (c.children().size() == 1) {
            if (c.child(0).tagName().equals("bold")
                    || c.child(0).tagName().equals("span")) {
                if (c.ownText().equals("")) {
                    // Subheading, shown in brown
                    blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.BOLD_TITLE);
                }
            }
        }

        // Code blocks
        if (c.select("pre").attr("name").equals("code")) {
            blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.CODE);
            blogContent.setContent(ToDBC(c.outerHtml()));
        } else {
            blogContent.setContent(ToDBC(c.outerHtml()));
        }
        list.add(blogContent);
    }

    return list;
}
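The content-parsing method passes text through a helper, written ToDBC here, that this post does not show. The name suggests the common conversion from full-width characters to half-width (ASCII) ones. Here is a minimal sketch under that assumption. The class name TextUtil is hypothetical, and the project's real helper may behave differently:

```java
/** Hypothetical holder class; only the conversion itself matters here. */
final class TextUtil {
    private TextUtil() {}

    /**
     * Converts full-width characters to their half-width equivalents.
     * The method name mirrors the call in the parsing code.
     */
    static String ToDBC(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == '\u3000') {
                // Ideographic (full-width) space becomes an ASCII space
                chars[i] = ' ';
            } else if (chars[i] >= '\uFF01' && chars[i] <= '\uFF5E') {
                // Full-width forms sit exactly 0xFEE0 above their ASCII counterparts
                chars[i] = (char) (chars[i] - 0xFEE0);
            }
        }
        return new String(chars);
    }
}
```

Characters outside those two ranges, including ordinary Chinese text, pass through unchanged.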
Getting the list of comments (this data arrives as a JSON string, not HTML):
import java.util.ArrayList;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

/**
 * Get the list of blog comments.
 *
 * @param str       JSON string
 * @param pageIndex page number, starting at 1
 * @param pageSize  number of comments per page
 * @return the comments on the requested page
 */
public static List<Comment> getBlogCommentList(String str, int pageIndex,
        int pageSize) {
    List<Comment> list = new ArrayList<Comment>();
    try {
        // Create a JSON object
        JSONObject jsonObject = new JSONObject(str);
        JSONArray jsonArray = jsonObject.getJSONArray("list"); // get the JSON array
        int index = 0;
        int len = jsonArray.length();
        BlogCommentActivity.commentCount = String.valueOf(len); // number of comments

        // If there are more than 20 comments
        if (len > 20) {
            index = (pageIndex * pageSize) - 20;
        }

        if (len < pageSize && pageIndex > 1) {
            return list;
        }

        if ((pageIndex * pageSize) < len) {
            len = pageIndex * pageSize;
        }

        for (int i = index; i < len; i++) {
            JSONObject item = jsonArray.getJSONObject(i);
            // JSON keys are case-sensitive; check them against the actual response.
            String commentId = item.getString("Commentid");
            String content = item.getString("content");
            String username = item.getString("username");
            String parentId = item.getString("ParentID");
            String postTime = item.getString("Posttime");
            String userFace = item.getString("Userface");

            Comment comment = new Comment();
            comment.setCommentId(commentId);
            comment.setContent(content);
            comment.setUsername(username);
            comment.setParentId(parentId);
            comment.setPostTime(postTime);
            comment.setUserFace(userFace);

            if (parentId.equals("0")) {
                // A parent ID of 0 means this is a top-level comment
                comment.setType(Constants.DEF_COMMENT_TYPE.PARENT);
            } else {
                comment.setType(Constants.DEF_COMMENT_TYPE.CHILD);
            }
            list.add(comment);
        }
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return list;
}
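The paging arithmetic in the comment parser assumes a page size of 20 when it computes the start index. The sketch below is a variation on the same idea, not the project's code: it computes the window of indices for a 1-based page with any page size and clamps it to the number of comments. The names Paging and pageWindow are hypothetical:

```java
/** Hypothetical helper that isolates the paging arithmetic. */
final class Paging {
    private Paging() {}

    /**
     * Returns {start, end} for a 1-based page index, where end is exclusive
     * and both values are clamped to the number of available items.
     */
    static int[] pageWindow(int total, int pageIndex, int pageSize) {
        if (total <= 0 || pageIndex < 1 || pageSize < 1) {
            return new int[] {0, 0}; // nothing to show
        }
        long start = (long) (pageIndex - 1) * pageSize;
        if (start >= total) {
            return new int[] {total, total}; // past the last page: empty window
        }
        int end = (int) Math.min((long) total, start + pageSize);
        return new int[] {(int) start, end};
    }
}
```

With 45 comments and a page size of 20, page 3 covers indices 40 to 44 and page 4 is empty, so a loop of the form for (int i = start; i < end; i++) never reads past the end of the array.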
For the full details, see the source code I have provided: http://download.csdn.net/detail/wwj_748/7912513
I have told you the idea behind parsing HTML. How to use Jsoup to parse the HTML you care about is left for you to learn on your own. Preview of the next post: integrating the Umeng social component, with a walkthrough of how to integrate the social component SDK that Umeng provides.