Android Application Development-Xiao Wu CSDN blog client Jsoup, androidjsoup

Last Update:2014-10-02 Source: Internet

Author: User


Android Application Development-Xiao Wu CSDN blog client Jsoup
Two weeks have passed since the previous post, and Xiao Wu apologizes for the gap: he was busy with another project and nearly let this series lapse. He will use the National Day holiday to catch up on the remaining posts. This post shows how to use the Jsoup library to parse a web page, and how to analyze the page we want to parse. For the Jsoup Library: Example/
I am not very familiar with this library; I completed the parsing by following its documentation, so the parsing code below is only a reference. For the specific parsing methods, read the API documentation carefully. How to use the Jsoup library is not the focus here. The focus is how to analyze the web page we want to parse so that we achieve the following result:
Here we can see that the home page is a list of the author's blog posts. Each entry has a title, a summary, a publication time, a read count, and a comment count. All of this content is obtained by parsing the HTML of the home page.

So this is an HTML page; can it really be moved onto a phone? Once you learn how to analyze an HTML page, you can extract whatever you want, as long as the site you crawl has no anti-crawling measures in place. The browser I use is Google Chrome, and as an IT professional you will want Chrome at hand to see the view below. Press F12 and you can find the treasure you are after. Look at it closely: to parse a web page you still have to inspect it yourself and locate the content you want. Right-click an element and inspect it to see the corresponding HTML source, which tells you which tag holds the content. Since Xiao Wu wants the list of all blog posts on the home page, I start from the outermost div of the blog list, whose id is article_list. I then locate each blog post entry inside it, determine which tags hold its fields, and note their class attributes. With Jsoup you can use the class to select the elements you want and then read their content.
Here is the code:

/**
 * Use Jsoup to parse the HTML document.
 *
 * @param blogType blog type stored on every parsed item
 * @param str      HTML source of the list page
 * @return the blog items found on the page
 */
public static List<BlogItem> getBlogItemList(int blogType, String str) {
    // Log.e("URL ---->", str);
    List<BlogItem> list = new ArrayList<BlogItem>();
    // Obtain the Document object
    Document doc = Jsoup.parse(str);
    // Log.e("doc --->", doc.toString());
    // Obtain all elements with class="article_item"
    Elements blogList = doc.getElementsByClass("article_item");
    // Log.e("elements --->", blogList.toString());

    for (Element blogItem : blogList) {
        BlogItem item = new BlogItem();
        String title = blogItem.select("h1").text(); // get the title
        // System.out.println("title ----->" + title);
        String description = blogItem.select("div.article_description").text();
        // System.out.println("description --->" + description);
        String msg = blogItem.select("div.article_manage").text();
        // System.out.println("msg --->" + msg);
        String date = blogItem.getElementsByClass("article_manage").get(0).text();
        // System.out.println("date --->" + date);
        // The link is the href of the <a> inside the <h1>, prefixed with BLOG_URL
        String link = BLOG_URL + blogItem.select("h1").select("a").attr("href");
        // System.out.println("link --->" + link);

        item.setTitle(title);
        item.setMsg(msg);
        item.setContent(description);
        item.setDate(date);
        item.setLink(link);
        item.setType(blogType);
        // No image
        item.setImgLink(null);
        list.add(item);
    }
    return list;
}
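To try the class-based selection without a network request, the same approach can be run against a small hand-written HTML string. This is only a minimal sketch: the sample markup, the `ArticleListDemo` class, the `summarize` method, and the `http://example.com` base URL are hypothetical. Only the `article_list` id and the `article_item` and `article_description` class names come from the page analysis above. It assumes the jsoup jar is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class ArticleListDemo {
    // Hypothetical markup that mimics the structure described in the post.
    static final String SAMPLE_HTML =
            "<div id=\"article_list\">"
          + "<div class=\"article_item\"><h1><a href=\"/a/1\">First post</a></h1>"
          + "<div class=\"article_description\">Summary one</div></div>"
          + "<div class=\"article_item\"><h1><a href=\"/a/2\">Second post</a></h1>"
          + "<div class=\"article_description\">Summary two</div></div>"
          + "</div>";

    /** Returns one "title | link | summary" line per article_item element. */
    static List<String> summarize(String html, String baseUrl) {
        List<String> lines = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        for (Element item : doc.getElementsByClass("article_item")) {
            String title = item.select("h1").text();
            // "h1 a" selects the anchor nested inside the heading
            String link = baseUrl + item.select("h1 a").attr("href");
            String summary = item.select("div.article_description").text();
            lines.add(title + " | " + link + " | " + summary);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize(SAMPLE_HTML, "http://example.com")) {
            System.out.println(line);
        }
    }
}
```

Keeping the parsing in a method that takes the HTML as a string, as the author's method also does, makes it easy to test the selectors separately from the download step.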

Xiao Wu obtains all matching elements through class="article_item" (an Elements collection) and iterates over them to read the values needed from each Element. We can define an entity class for an entry, such as BlogItem. By creating a BlogItem object for each entry and adding it to the list, we keep every post on the page in one list and can read the data back from that list later.
As you can see, with the Jsoup library it takes very little code to get the content we want, without much worry about encoding, efficiency, and the like. It is well worth using.
Getting the details of a blog post is similar. Given a URL, we can parse its HTML in the same way:
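The post does not show the entity class itself. A minimal sketch of what it might look like follows, with the fields inferred only from the setter calls in the parsing code above. The field comments and the getters are assumptions, and the author's actual BlogItem may differ.

```java
final class BlogItem {
    private String title;    // post title
    private String content;  // post summary shown in the list
    private String msg;      // text of the article_manage block
    private String date;     // publication info
    private String link;     // URL of the post
    private String imgLink;  // optional image URL; null when there is none
    private int type;        // blog type passed in by the caller

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }

    public String getMsg() { return msg; }
    public void setMsg(String msg) { this.msg = msg; }

    public String getDate() { return date; }
    public void setDate(String date) { this.date = date; }

    public String getLink() { return link; }
    public void setLink(String link) { this.link = link; }

    public String getImgLink() { return imgLink; }
    public void setImgLink(String imgLink) { this.imgLink = imgLink; }

    public int getType() { return type; }
    public void setType(int type) { this.type = type; }
}
```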

/**
 * Obtain the detailed content of the blog post at the given URL.
 *
 * @param url URL of the blog post
 * @param str HTML source of the blog post page
 * @return the parsed parts of the post (images, text, code)
 */
public static List<Blog> getContent(String url, String str) {
    List<Blog> list = new ArrayList<Blog>();

    // Obtain the document content
    Document doc = Jsoup.parse(str);

    // Obtain the element with class="details"
    Element detail = doc.getElementsByClass("details").get(0);
    detail.select("script").remove(); // remove every matching element from the DOM

    // Obtain the title
    // Note: as published, blogTitle is built but never added to the list.
    Element title = detail.getElementsByClass("article_title").get(0);
    Blog blogTitle = new Blog();
    blogTitle.setState(Constants.DEF_BLOG_ITEM_TYPE.TITLE); // set the state
    blogTitle.setContent(ToDBC(title.text())); // set the title content

    // Obtain the article content
    Element content = detail.select("div.article_content").get(0);

    // Obtain all <a> elements
    Elements as = detail.getElementsByTag("a");
    for (int b = 0; b < as.size(); b++) {
        Element blockquote = as.get(b);
        // Rename the element's tag; for example, el.tagName("div") turns a <span> into a <div>
        blockquote.tagName("bold"); // converted to bold
    }

    Elements ss = detail.getElementsByTag("strong");
    for (int b = 0; b < ss.size(); b++) {
        Element blockquote = ss.get(b);
        blockquote.tagName("bold");
    }

    // Obtain all <p> elements
    Elements ps = detail.getElementsByTag("p");
    for (int b = 0; b < ps.size(); b++) {
        Element blockquote = ps.get(b);
        blockquote.tagName("body");
    }

    // Obtain all <blockquote> elements
    Elements blockquotes = detail.getElementsByTag("blockquote");
    for (int b = 0; b < blockquotes.size(); b++) {
        Element blockquote = blockquotes.get(b);
        blockquote.tagName("body");
    }

    // Obtain all <ul> elements
    Elements uls = detail.getElementsByTag("ul");
    for (int b = 0; b < uls.size(); b++) {
        Element blockquote = uls.get(b);
        blockquote.tagName("body");
    }

    // Find the bold elements
    Elements bs = detail.getElementsByTag("b");
    for (int b = 0; b < bs.size(); b++) {
        Element bold = bs.get(b);
        bold.tagName("bold");
    }

    // Traverse all elements in the blog content
    for (int j = 0; j < content.children().size(); j++) {
        Element c = content.child(j); // obtain each element

        // Extract the images
        if (c.select("img").size() > 0) {
            Elements imgs = c.getElementsByTag("img");
            System.out.println("img");
            for (Element img : imgs) {
                if (!img.attr("src").equals("")) {
                    Blog blogImgs = new Blog();
                    // If the image is wrapped in a link, use the link target
                    if (!img.parent().attr("href").equals("")) {
                        blogImgs.setImgLink(img.parent().attr("href"));
                        System.out.println("href=" + img.parent().attr("href"));
                        if (img.parent().parent().tagName().equals("p")) {
                            // img.parent().parent().remove();
                        }
                        img.parent().remove();
                    }
                    blogImgs.setContent(img.attr("src"));
                    blogImgs.setImgLink(img.attr("src"));
                    System.out.println(blogImgs.getContent());
                    blogImgs.setState(Constants.DEF_BLOG_ITEM_TYPE.IMG);
                    list.add(blogImgs);
                }
            }
        }
        c.select("img").remove();

        // Get the blog content
        Blog blogContent = new Blog();
        blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.CONTENT);

        if (c.text().equals("")) {
            continue;
        } else if (c.children().size() == 1) {
            if (c.child(0).tagName().equals("bold")
                    || c.child(0).tagName().equals("span")) {
                if (c.ownText().equals("")) {
                    // Subheading, shown in brown
                    blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.BOLD_TITLE);
                }
            }
        }

        // Code
        if (c.select("pre").attr("name").equals("code")) {
            blogContent.setState(Constants.DEF_BLOG_ITEM_TYPE.CODE);
            blogContent.setContent(ToDBC(c.outerHtml()));
        } else {
            blogContent.setContent(ToDBC(c.outerHtml()));
        }
        list.add(blogContent);
    }

    return list;
}
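The detail parser relies on two Jsoup features: renaming tags with `Element.tagName(String)`, and then checking `tagName()` and `ownText()` to spot a paragraph that is nothing but a bold subheading. The sketch below isolates that idea on a tiny hand-written fragment. The `TagRenameDemo` class, its method names, and the sample HTML are hypothetical; the subheading test mirrors the condition in the code above but is not the author's exact method. It assumes the jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class TagRenameDemo {
    /** Parses the HTML and renames every <strong> and <b> element to <bold>. */
    static Document normalizeBold(String html) {
        Document doc = Jsoup.parse(html);
        for (Element e : doc.select("strong, b")) {
            e.tagName("bold"); // changes the tag name in place, keeps children
        }
        return doc;
    }

    /** True when the block holds exactly one bold or span child and no text of its own. */
    static boolean isSubheading(Element block) {
        if (block.children().size() != 1) {
            return false;
        }
        String tag = block.child(0).tagName();
        return (tag.equals("bold") || tag.equals("span")) && block.ownText().isEmpty();
    }

    public static void main(String[] args) {
        Document doc = normalizeBold(
                "<p><strong>Section</strong></p><p>Intro <b>word</b> more</p>");
        for (Element p : doc.select("p")) {
            System.out.println(p.text() + " -> subheading? " + isSubheading(p));
        }
    }
}
```

Normalizing several source tags to one name first means the later classification only has to compare against a single tag name.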

Getting the comment list, which arrives as a JSON string:

/**
 * Get the blog comment list.
 *
 * @param str       JSON string
 * @param pageIndex index of the page to return
 * @param pageSize  number of comments per page
 * @return the comments for the requested page
 */
public static List<Comment> getBlogCommentList(String str, int pageIndex, int pageSize) {
    List<Comment> list = new ArrayList<Comment>();
    try {
        // Create a JSON object
        JSONObject jsonObject = new JSONObject(str);
        JSONArray jsonArray = jsonObject.getJSONArray("list"); // obtain the JSON array
        int index = 0;
        int len = jsonArray.length();
        BlogCommentActivity.commentCount = String.valueOf(len); // number of comments

        // If the number of comments is greater than 20
        if (len > 20) {
            index = (pageIndex * pageSize) - 20;
        }

        if (len < pageSize && pageIndex > 1) {
            return list;
        }

        if ((pageIndex * pageSize) < len) {
            len = pageIndex * pageSize;
        }

        for (int i = index; i < len; i++) {
            JSONObject item = jsonArray.getJSONObject(i);
            String commentId = item.getString("CommentId");
            String content = item.getString("Content");
            String username = item.getString("UserName");
            String parentId = item.getString("ParentId");
            String postTime = item.getString("PostTime");
            String userface = item.getString("Userface");

            Comment comment = new Comment();
            comment.setCommentId(commentId);
            comment.setContent(content);
            comment.setUsername(username);
            comment.setParentId(parentId);
            comment.setPostTime(postTime);
            comment.setUserface(userface);

            if (parentId.equals("0")) {
                // A parentId of 0 marks a top-level comment
                comment.setType(Constants.DEF_COMMENT_TYPE.PARENT);
            } else {
                comment.setType(Constants.DEF_COMMENT_TYPE.CHILD);
            }
            list.add(comment);
        }
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return list;
}
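The paging arithmetic in the method above depends on a hard-coded 20 and is easy to get wrong at the boundaries. A common alternative, shown here as a sketch and not as the author's logic, is to compute a clamped [start, end) window from the total count. It assumes `pageIndex` is 1-based; the `PageWindow` class and its `of` method are hypothetical names.

```java
final class PageWindow {
    final int start; // first index on the page, inclusive
    final int end;   // one past the last index on the page, exclusive

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Computes the index range for a 1-based page, clamped to [0, total]. */
    static PageWindow of(int total, int pageIndex, int pageSize) {
        if (total < 0 || pageIndex < 1 || pageSize < 1) {
            throw new IllegalArgumentException("total >= 0, pageIndex >= 1, pageSize >= 1 required");
        }
        // Use long arithmetic so large page numbers cannot overflow int
        long rawStart = (long) (pageIndex - 1) * pageSize;
        int start = (int) Math.min(rawStart, total);
        int end = (int) Math.min(rawStart + pageSize, total);
        return new PageWindow(start, end);
    }

    /** True when the requested page lies past the last item. */
    boolean isEmpty() {
        return start >= end;
    }
}
```

With such a window, the loop becomes `for (int i = w.start; i < w.end; i++)`, and a page past the end simply yields no iterations.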

For the full details, refer to the author's source code: http://download.csdn.net/detail/wwj_748/7912513
Xiao Wu has now explained the idea behind parsing HTML; to really learn HTML parsing with the Jsoup library, you have to try it yourself. The next post explains how to integrate the social component SDK provided by Umeng.

Use Jsoup to obtain specific tag attribute values

doc.select("meta[name=description]").get(0).attr("content")
This is roughly how it works. For more, study the Jsoup selector syntax.
Www.cnblogs.com/..5.html
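One caveat with the one-liner above: `get(0)` throws an IndexOutOfBoundsException when nothing matches the selector. A slightly more defensive sketch uses `first()`, which returns null on no match. The `MetaDescriptionDemo` class and the `metaDescription` method are hypothetical names, and returning an empty string for a missing tag is an arbitrary choice made for this example. It assumes the jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class MetaDescriptionDemo {
    /** Returns the content of <meta name="description">, or "" when the tag is absent. */
    static String metaDescription(String html) {
        Document doc = Jsoup.parse(html);
        // [name=description] is an attribute-value selector
        Element meta = doc.select("meta[name=description]").first();
        return meta == null ? "" : meta.attr("content");
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"description\" content=\"A blog\"></head>"
                + "<body></body></html>";
        System.out.println(metaDescription(html));
    }
}
```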

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
