Parsing HTML with Jsoup: fetching forum posts by keyword

Source: Internet
Author: User
Taking the Broadband Mountain forum as an example, the goal is to retrieve every post that matches a given keyword, including each post's popularity, title, reply count, author, posting time, link, and full text. The detailed code is as follows:

Java code

    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class KeyWordsSearchUtil {

        /**
         * Queries the forum for posts matching the keyword and returns the
         * required information for each post as a map.
         * @param keyWord the keyword to search for
         * @return one map per matching post
         */
        public static List<Map<String, String>> findByKeyWord(String keyWord) {
            List<Map<String, String>> postsList = new ArrayList<Map<String, String>>();
            Map<String, String> postsOneMap = null;
            try {
                Document doc = Jsoup.connect("http://club.pchome.net/forum_1_15____md__1_"
                                + URLEncoder.encode(keyWord, "UTF-8") + ".html")
                        .data("query", "Java")
                        .userAgent("Mozilla")
                        .cookie("auth", "token")
                        .timeout(10000)
                        .ignoreHttpErrors(true)
                        .post();
                // Each post is an li.i2 element; rows marked .h-bg are skipped.
                Elements postsLs = doc.select("li.i2").not(".h-bg");
                if (postsLs != null && postsLs.size() > 0) {
                    for (Element childPost : postsLs) {
                        postsOneMap = new HashMap<String, String>();
                        postsOneMap.put("postsPopularity", childPost.select("li > span.n2").first().text());
                        postsOneMap.put("postsTitle", childPost.select("span.n3 > a").attr("title"));
                        postsOneMap.put("postsFloor", childPost.select("span.n4").first().text());
                        postsOneMap.put("postsCname", childPost.select(".bind_hover_card").first().text());
                        postsOneMap.put("postsCtime", childPost.select("li > span.n6").first().text());
                        // The href is relative, so prepend the forum host.
                        String postUrl = "http://club.pchome.net" + childPost.select("span.n3 a").attr("href");
                        postsOneMap.put("postsUrl", postUrl);
                        postsOneMap.put("postsContents", getContentsByUrl(postUrl));
                        postsList.add(postsOneMap);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            return postsList;
        }

        /**
         * Fetches the text content of a post from its URL.
         * @param url the URL of the post
         * @return the post text
         */
        public static String getContentsByUrl(String url) {
            String contents = "11"; // placeholder returned when no content is found
            try {
                Document doc = Jsoup.connect(url)
                        .data("query", "Java")
                        .userAgent("Mozilla")
                        .cookie("auth", "token")
                        .timeout(10000)
                        .ignoreHttpErrors(true)
                        .post();
                if (doc.select("div.mc").first() != null) {
                    Element contentsEle = doc.select("div.mc div").first();
                    contents = contentsEle.select("div").first().text();
                    // Strip the image-viewer control labels. The literal is shown as
                    // rendered in this translation; the text on the live page may differ.
                    if (contents.contains("[Turn left] [Turn right] [Source image]")) {
                        contents = contents.replace("[Turn left] [Turn right] [Source image]", "");
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            return contents;
        }

        public static void main(String[] args) throws Exception {
            List<Map<String, String>> postsList = KeyWordsSearchUtil.findByKeyWord("movie");
            System.out.println("http://club.pchome.net/forum_1_15____md__1_"
                    + URLEncoder.encode("movie", "UTF-8") + ".html");
            System.out.println(postsList.size() + "//");
            for (int i = 0; i < postsList.size(); i++) {
                for (Map.Entry<String, String> entry : postsList.get(i).entrySet()) {
                    System.out.println("key=" + entry.getKey() + " | value=" + entry.getValue());
                }
                System.out.println("-----------------");
            }
            // http://club.pchome.net/thread_1_15_7519679.html
            // String str = getContentsByUrl("http://club.pchome.net/thread_1_15_7519679.html");
            // System.out.println(str);
        }
    }

The code above successfully fetches the list of movie-related posts from the Broadband Mountain forum. The main method has been tested and works as long as the network connection is stable.
However, this code was written only to get the feature working and performs poorly. It needs to be rewritten or optimized before it is used in a project.
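
One likely source of the poor performance is that the keyword search fetches the full text of every post one at a time, so the total run time grows with the number of posts. The sketch below shows one possible way to run those per-post fetches on a small thread pool instead. It is an illustration under assumptions, not the original author's method: the class name ParallelPostFetcher, the method fetchAll, and the stub fetcher in main are all hypothetical, and only the Java standard library is used so the example runs without a network connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the original article.
final class ParallelPostFetcher {

    /**
     * Applies the fetcher to every URL on a fixed-size thread pool and
     * returns the results in the same order as the input list.
     * A URL whose fetch throws yields an empty string.
     */
    static List<String> fetchAll(List<String> urls,
                                 final Function<String, String> fetcher,
                                 int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all fetches first so they can run concurrently.
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<String>();
            for (Future<String> future : pending) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // One failed page should not abort the whole batch.
                    results.add("");
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList(
                "http://club.pchome.net/thread_1_15_7519679.html");
        // Stub fetcher for demonstration only; it performs no network access.
        Function<String, String> stub = url -> "contents of " + url;
        for (String text : fetchAll(urls, stub, 4)) {
            System.out.println(text);
        }
    }
}
```

In the article's code, the stub could be replaced with KeyWordsSearchUtil::getContentsByUrl, with the post URLs collected in a first pass over the search results. The pool size of 4 is an arbitrary choice; a small pool also limits how many requests hit the forum at once.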
