Android App Development: Little Witch's CSDN Blog Client, Jsoup Chapter
It has been two weeks since the previous post, and I feel bad about that: I have been busy with another project and had almost no free time, so I will use the National Day holiday to catch up on the remaining posts in this series. This post shows how to use the Jsoup library to parse pages, and how to analyze the page you want to parse.
Jsoup download: http://jsoup.org/download
I use jsoup-1.7.2 here. After downloading it, copy the jar into the libs directory of your project.
There is relatively little material on Jsoup; this reference site is a good place to learn the library: http://www.open-open.com/jsoup/
API reference: http://jsoup.org/apidocs/
I am not very familiar with this library myself; I simply consulted the documentation to get the parsing work done, so the parsing code below is for reference only. For the specific parsing methods, please read the API documentation carefully. How to use Jsoup is not the focus of this post. The focus is how to analyze the page we want to parse in order to achieve the following result:
Here you can see that the home page shows the list of the author's blog posts. Each entry has a title, a summary, a publish time, a read count, and a comment count. All of this content is obtained by parsing the HTML of the home page.
OK, that is just an HTML page. Want to bring it to the phone? Once you learn how to analyze an HTML page, you can pull down any content you want, as long as the site you are crawling has no anti-scraping measures. The browser I use is Google Chrome (for an IT professional, not using Chrome is hard to justify). Press F12 and you will see the scene below. Exciting, isn't it? This is where you find the treasure you are after. Read on:
To parse a web page you have to inspect it like this and find what you need. I do it by right-clicking an element and inspecting it, which jumps straight to the corresponding HTML source, so I know which tag holds the content. Because I wanted the list of all posts on the home page, I started from the outermost div of the post list, whose id is article_list. Then I located the content of each entry and determined which tag and which class each piece uses. With the class you can select the element you want and then read its content.
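The drill-down described above can be sketched with Jsoup as follows. This is a minimal sketch, not the author's code: the sample HTML, the hrefs, and the class name ArticleListSketch are made up for illustration. Only the id article_list, the class article_item, and the h1 title tag come from this post.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ArticleListSketch {

    // Made-up markup imitating the structure seen in Chrome's inspector.
    static final String SAMPLE_HTML =
            "<div id=\"article_list\">"
          + "<div class=\"article_item\"><h1><a href=\"/demo/1\">First post</a></h1></div>"
          + "<div class=\"article_item\"><h1><a href=\"/demo/2\">Second post</a></h1></div>"
          + "</div>"
          + "<div class=\"article_item\"><h1>Outside the list</h1></div>";

    /** Collects the title of every article_item inside the element with id article_list. */
    static List<String> titles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<String>();
        Element container = doc.getElementById("article_list"); // outermost div of the list
        if (container == null) {
            return titles; // the container is missing, so there is nothing to read
        }
        for (Element item : container.getElementsByClass("article_item")) {
            titles.add(item.select("h1").text());
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE_HTML)); // prints [First post, Second post]
    }
}
```

Starting from the container rather than the whole document keeps entries with the same class elsewhere on the page out of the result, and the null check avoids an exception when the page layout changes.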
Here is the code:
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Parse the HTML document using Jsoup.
 *
 * @param blogType type value stored on every item
 * @param str      HTML source of the list page
 * @return the blog entries found on the page
 */
public static List<BlogItem> getBlogItemList(int blogType, String str) {
    // Log.e("URL---->", str);
    List<BlogItem> list = new ArrayList<BlogItem>();
    // Get the document object
    Document doc = Jsoup.parse(str);
    // Log.e("doc--->", doc.toString());
    // Get all elements with class="article_item"
    Elements blogList = doc.getElementsByClass("article_item");
    // Log.e("elements--->", blogList.toString());

    for (Element blogItem : blogList) {
        BlogItem item = new BlogItem();
        String title = blogItem.select("h1").text(); // get the title
        // System.out.println("title----->" + title);
        String description = blogItem.select("div.article_description").text();
        // System.out.println("description--->" + description);
        String msg = blogItem.select("div.article_manage").text();
        // System.out.println("msg--->" + msg);
        String date = blogItem.getElementsByClass("article_manage").get(0).text();
        // System.out.println("date--->" + date);
        String link = BLOG_URL + blogItem.select("h1").select("a").attr("href");
        // System.out.println("link--->" + link);

        item.setTitle(title);
        item.setMsg(msg);
        item.setContent(description);
        item.setDate(date);
        item.setLink(link);
        item.setType(blogType);

        // No picture
        item.setImgLink(null);
        list.add(item);
    }
    return list;
}
Selecting by class="article_item" gives us all the matching elements (Element objects). We then iterate over them and pull the values we need out of each one. We can define an entity class for a list entry, such as BlogItem, create a BlogItem object for each entry, and add it to the list. The list then holds all the posts that have been loaded, and the next time we need them we can read them straight from the list.
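The entity class mentioned above could look roughly like this. It is a sketch inferred only from the setters that the parsing code calls; the real BlogItem is in the author's source download and may differ, and the field comments are my reading of how each value is filled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of one entry in the post list. */
class BlogItem {
    private String title;   // post title
    private String content; // summary shown in the list
    private String date;    // publish time text
    private String msg;     // text of the article_manage block (counters)
    private String link;    // URL of the post
    private String imgLink; // optional image URL, null when there is none
    private int type;       // type value passed in by the caller

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
    public String getDate() { return date; }
    public void setDate(String date) { this.date = date; }
    public String getMsg() { return msg; }
    public void setMsg(String msg) { this.msg = msg; }
    public String getLink() { return link; }
    public void setLink(String link) { this.link = link; }
    public String getImgLink() { return imgLink; }
    public void setImgLink(String imgLink) { this.imgLink = imgLink; }
    public int getType() { return type; }
    public void setType(int type) { this.type = type; }

    public static void main(String[] args) {
        List<BlogItem> list = new ArrayList<BlogItem>();
        BlogItem item = new BlogItem();
        item.setTitle("Sample title");
        item.setImgLink(null); // no picture
        list.add(item);
        System.out.println(list.get(0).getTitle()); // prints Sample title
    }
}
```

Keeping the parsed values in a plain object like this lets the UI layer read from the list without touching the HTML again.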
As you can see, with the Jsoup library only a small amount of code is needed to get the content we want. For the sake of coding efficiency, it is well worth using.
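One detail of the list-parsing code is that each link is built by concatenating a base URL constant with the href of the title anchor, which works as long as every href is site-relative. A slightly more defensive variant, using only the JDK, is sketched below. The value of BLOG_URL and the sample paths are assumptions; the post shows only the constant's name.

```java
import java.net.URI;

class LinkSketch {
    // Assumed value; the post does not show what the constant contains.
    static final String BLOG_URL = "http://blog.csdn.net";

    /**
     * Resolves an href against the site root. Hrefs that start with "/" are
     * appended to the host; hrefs that are already absolute are returned unchanged.
     */
    static String absoluteLink(String href) {
        return URI.create(BLOG_URL).resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(absoluteLink("/demo/article/details/1"));
        System.out.println(absoluteLink("http://example.com/other"));
    }
}
```

Jsoup can also do this itself: if the page is parsed with Jsoup.parse(html, baseUri), calling absUrl("href") on the anchor element returns the absolute URL.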
Getting the details of a post is similar. Given a URL, we can parse the HTML in the same way:
/**
 * Extract the detailed content of the blog post at the given URL.
 *
 * @param url address of the post
 * @param str HTML source of the post page
 * @return the content blocks of the post
 */
public static List<Blog> getContent(String url, String str) {
    List<Blog> list = new ArrayList<Blog>();

    // Get the document
    Document doc = Jsoup.parse(str);

    // Get the element with class="details"
    Element detail = doc.getElementsByClass("details").get(0);
    detail.select("script").remove(); // remove every matching element from the DOM

    // Get the title
    Element title = detail.getElementsByClass("article_title").get(0);
    Blog blogTitle = new Blog();
    blogTitle.setState(Constants.DEF_BLOG_ITEM_TYPE.TITLE); // set the state
    blogTitle.setContent(toDBC(title.text())); // set the title content

    // Get the article content
    Element content = detail.select("div.article_content").get(0);

    // Get all <a> elements
    Elements as = detail.getElementsByTag("a");
    for (int b = 0; b < as.size(); b++) {
        Element blockquote = as.get(b);
        // Change the tag name of this element,
        // e.g. el.tagName("div") turns a <span> into a <div>.
        blockquote.tagName("bold"); // convert to bold
    }

    Elements ss = detail.getElementsByTag("strong");
    for (int b = 0; b < ss.size(); b++) {
        Element blockquote = ss.get(b);
        blockquote.tagName("bold");
    }

    // Get all <p> elements
    Elements ps = detail.getElementsByTag("p");
    for (int b = 0; b < ps.size(); b++) {
        Element blockquote = ps.get(b);
        blockquote.tagName("body");
    }

    // Get all blockquote elements
    Elements blockquotes = detail.getElementsByTag("blockquote");
    for (int b = 0; b < blockquotes.size(); b++) {
        Element blockquote = blockquotes.get(b);
        blockquote.tagName("body");
    }

    // Get all <ul> elements
    Elements uls = detail.getElementsByTag("ul");
    for (int b = 0; b < uls.size(); b++) {
        Element blockquote = uls.get(b);
        blockquote.tagName("body");
    }

    // Find bold elements
    Elements bs = detail.getElementsByTag("b");
    for (int b = 0; b < bs.size(); b++) {
        Element bold = bs.get(b);
        bold.tagName("bold");
    }

    // Iterate over all child elements of the post content
    for (int j = 0; j < content.children().size(); j++) {
        Element c = content.child(j); // get each element

        // Extract the pictures
        if (c.select("img").size() > 0) {
            Elements imgs = c.getElementsByTag("img");
            System.out.println("img");
            for (Element img : imgs) {
                if (!img.attr("src").equals("")) {
                    Blog blogImgs = new Blog();
                    // Link to the large image
                    if (!img.parent().attr("href").equals("")) {
                        blogImgs.setImgLink(img.parent().attr("href"));
                        System.out.println("href=" + img.parent().attr("href"));
                        if (img.parent().parent().tagName().equals("p")) {
                            // img.parent().parent().remove();
                        }
                        img.parent().remove();
                    }
                    blogImgs.setContent(img.attr("src"));
                    blogImgs.setImgLink(img.attr("src"));
                    System.out.println(blogImgs.getContent());
                    blogImgs.setState(Constants.DEF_BLOG_ITEM_TYPE.IMG);
                    list.add(blogImgs);
                }
            }
        }
        c.select("img").remove();

        // Get the post content
        Blog blogContent = new Blog();
        blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.CONTENT);

        if (c.text().equals("")) {
            continue;
        } else if (c.children().size() == 1) {
            if (c.child(0).tagName().equals("bold")
                    || c.child(0).tagName().equals("span")) {
                if (c.ownText().equals("")) {
                    // Subheading, shown in brown
                    blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.BOLD_TITLE);
                }
            }
        }

        // Code
        if (c.select("pre").attr("name").equals("code")) {
            blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.CODE);
            blogContent.setContent(toDBC(c.outerHtml()));
        } else {
            blogContent.setContent(toDBC(c.outerHtml()));
        }
        list.add(blogContent);
    }

    return list;
}
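Much of getContent consists of renaming tags with Element.tagName(String) before the content is read, apparently so that the client only has to deal with a small set of tag names (the post does not state the reason). The repeated loops can be folded into one helper, sketched below. The helper name rename, the class name, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TagRenameSketch {

    /** Renames every element named 'from' under root to 'to'; children and attributes stay. */
    static void rename(Element root, String from, String to) {
        // getElementsByTag returns a separate list, so renaming while iterating is safe.
        for (Element e : root.getElementsByTag(from)) {
            e.tagName(to);
        }
    }

    /** Parses the HTML and maps <strong> and <b> to a single custom tag name, "bold". */
    static Document normalize(String html) {
        Document doc = Jsoup.parse(html);
        rename(doc.body(), "strong", "bold");
        rename(doc.body(), "b", "bold");
        return doc;
    }

    public static void main(String[] args) {
        Document doc = normalize("<p>plain <strong>one</strong> <b>two</b></p>");
        System.out.println(doc.getElementsByTag("bold").size()); // prints 2
    }
}
```

Note that "bold" is not a standard HTML tag; it only works here because whatever renders the content afterwards treats it as a marker.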
Getting the list of comments (the comment data arrives as a JSON string, so it is parsed with org.json rather than Jsoup):
import java.util.ArrayList;
import java.util.List;

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

/**
 * Get the list of comments on a blog post.
 *
 * @param str       JSON string
 * @param pageIndex page number, starting at 1
 * @param pageSize  number of comments per page
 * @return the comments on the requested page
 */
public static List<Comment> getBlogCommentList(String str, int pageIndex, int pageSize) {
    List<Comment> list = new ArrayList<Comment>();
    try {
        // Create a JSON object
        JSONObject jsonObject = new JSONObject(str);
        JSONArray jsonArray = jsonObject.getJSONArray("list"); // get the JSON array
        int index = 0;
        int len = jsonArray.length();
        BlogCommentActivity.commentCount = String.valueOf(len); // number of comments
        // If there are more than 20 comments
        if (len > 20) {
            index = (pageIndex * pageSize) - 20;
        }

        if (len < pageSize && pageIndex > 1) {
            return list;
        }

        if ((pageIndex * pageSize) < len) {
            len = pageIndex * pageSize;
        }

        for (int i = index; i < len; i++) {
            JSONObject item = jsonArray.getJSONObject(i);
            // Key names must match the server's JSON exactly (keys are case-sensitive).
            String commentId = item.getString("Commentid");
            String content = item.getString("content");
            String username = item.getString("username");
            String parentId = item.getString("ParentID");
            String postTime = item.getString("Posttime");
            String userface = item.getString("Userface");

            Comment comment = new Comment();
            comment.setCommentId(commentId);
            comment.setContent(content);
            comment.setUsername(username);
            comment.setParentId(parentId);
            comment.setPostTime(postTime);
            comment.setUserface(userface);

            if (parentId.equals("0")) {
                // A parent id of 0 means this is a top-level comment
                comment.setType(Constants.DEF_COMMENT_TYPE.PARENT);
            } else {
                comment.setType(Constants.DEF_COMMENT_TYPE.CHILD);
            }
            list.add(comment);
        }
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return list;
}
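The paging arithmetic in getBlogCommentList subtracts a fixed 20 from pageIndex * pageSize, which lines up only when pageSize is 20. A more general way to compute the slice of the comment array for a page is sketched below. It is a variant under stated assumptions, not the author's code: pageIndex is taken to start at 1, and the class and method names are made up.

```java
import java.util.Arrays;

class CommentPaging {

    /**
     * Returns {start, end}, the half-open index range [start, end) of the
     * comments that belong to the given 1-based page. The range is empty
     * (start == end) when the page lies past the last comment.
     */
    static int[] pageRange(int total, int pageIndex, int pageSize) {
        if (total < 0 || pageIndex < 1 || pageSize < 1) {
            throw new IllegalArgumentException("total >= 0, pageIndex >= 1, pageSize >= 1 required");
        }
        long start = (long) (pageIndex - 1) * pageSize; // long avoids int overflow
        long end = start + pageSize;
        return new int[] {(int) Math.min(start, total), (int) Math.min(end, total)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pageRange(45, 1, 20))); // prints [0, 20]
        System.out.println(Arrays.toString(pageRange(45, 3, 20))); // prints [40, 45]
        System.out.println(Arrays.toString(pageRange(20, 2, 20))); // prints [20, 20]
    }
}
```

The loop over the JSON array would then run from start to end, and an empty range simply produces an empty list, which also covers the case of exactly 20 comments with page 2 requested.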
For the details of how these methods are used, see the author's source code: http://download.csdn.net/detail/wwj_748/7912513
That covers my approach to analyzing HTML for parsing. Learning the rest of Jsoup and using it to parse HTML is up to you. Preview of the next post: integrating the Umeng (友盟) social component, with details on how to integrate the social component SDK that Umeng provides.