How to Parse Web Page Data in Android Using Jsoup: A Detailed Guide

Source: Internet
Author: User



I recently used Jsoup and found it simple and convenient: it makes it easy to fetch a web page's source and parse out the tags you need.



Over the past few days I have been working on a small music player project. It uses Jsoup to fetch page data, extract the song list from a web page, and load each song's link, so that songs and lyrics can be downloaded. Once it is finished I will write a few blog posts to share it. This post mainly explains how to use Jsoup on Android to obtain web page data.



The relevant links are as follows:
Jsoup download:
http://jsoup.org/download
Jsoup development guide (Chinese user manual and documentation):
http://www.open-open.com/jsoup/



The Chinese documentation is very good and very detailed; many blog posts on the web are simply copies of it. I read it when I started, but I still had no idea how to work out what to pass to the selector. Many posts do not explain this.



This post does not go over the Jsoup API, which the Chinese documentation covers clearly. Instead it explains how to parse the data in a web page. The running example parses the song list on this page: http://music.baidu.com/top/new/?pst=shouyeTop
[Screenshot: the web page]



This page is the Baidu Music home page and contains the new-song leaderboard. The project parses this page to get the song list.



On the Android side, the app implements the following screen:
[Screenshot: the song list in the Android app]



The app parses the page to get its data, loads the data into the list, and shows it to the user.
Here is how it is built:
1 The song object class


import java.io.Serializable;

/**
 * August 15, 2015 15:51:26
 * Blog address: http://blog.csdn.net/u010156024
 * Song object class
 */
public class SearchResult implements Serializable {
    private static final long serialVersionUID = 0X00000001L;
    private String musicName;
    private String url;
    private String artist;
    private String album;

    public String getArtist() {
        return artist;
    }

    public void setArtist(String artist) {
        this.artist = artist;
    }

    public String getMusicName() {
        return musicName;
    }

    public void setMusicName(String musicName) {
        this.musicName = musicName;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public String getAlbum() {
        return album;
    }

    public void setAlbum(String album) {
        this.album = album;
    }
}


The above is the song object class, a very simple JavaBean. There is not much more to say about it.



2 Using Jsoup to parse the page data


import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.annotation.SuppressLint;
import android.os.Handler;
import android.os.Message;

/**
 * August 15, 2015 15:54:43
 * Blog address: http://blog.csdn.net/u010156024
 * What this class does: given a URL, parse the list of recommended songs.
 */
public class SongsRecommendation {
    // http://music.baidu.com/top/new/?pst=shouyeTop
    private static final String URL = "http://music.baidu.com"
            + "/top/new/?pst=shouyeTop";
    private static SongsRecommendation sInstance;

    /**
     * Callback interface that passes data to the Activity or Fragment.
     * A very useful way to hand data over.
     */
    private OnRecommendationListener mListener;
    private ExecutorService mThreadPool;

    public static SongsRecommendation getInstance() {
        if (sInstance == null)
            sInstance = new SongsRecommendation();
        return sInstance;
    }

    // Constants.SUCCESS and Constants.FAILED are defined elsewhere in the project.
    private Handler mHandler = new Handler() {
        @SuppressWarnings("unchecked")
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
            case Constants.SUCCESS:
                if (mListener != null)
                    mListener.onRecommend((ArrayList<SearchResult>) msg.obj);
                break;
            case Constants.FAILED:
                if (mListener != null)
                    mListener.onRecommend(null);
                break;
            }
        }
    };

    @SuppressLint("HandlerLeak")
    private SongsRecommendation() {
        // Create a single-thread pool
        mThreadPool = Executors.newSingleThreadExecutor();
    }

    /**
     * Set the callback object mListener.
     *
     * @param l the listener to notify
     * @return this instance, so calls can be chained
     */
    public SongsRecommendation setListener(OnRecommendationListener l) {
        mListener = l;
        return this;
    }

    /**
     * The method that actually triggers the parsing.
     * Runs the parsing on the pool's worker thread and, when it completes,
     * sends a message that carries the result to the main thread.
     */
    public void get() {
        mThreadPool.execute(new Runnable() {
            @Override
            public void run() {
                ArrayList<SearchResult> result = getMusicList();
                if (result == null) {
                    mHandler.sendEmptyMessage(Constants.FAILED);
                    return;
                }
                mHandler.obtainMessage(Constants.SUCCESS, result).sendToTarget();
            }
        });
    }

    private ArrayList<SearchResult> getMusicList() {
        try {
            /*
             * For these method calls, see the official site.
             * timeout sets the request timeout; it should not be too short,
             * or the request fails with an exception and nothing is fetched.
             * The numeric value was lost from the original listing;
             * 60 * 1000 ms is an assumed value.
             */
            Document doc = Jsoup.connect(URL)
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                            + "(KHTML, like Gecko) Chrome/42.0.2311.22 Safari/537.36")
                    .timeout(60 * 1000)
                    .get();
            // select takes a selector; see the official site for the syntax
            Elements songTitles = doc.select("span.song-title");
            Elements artists = doc.select("span.author_list");
            ArrayList<SearchResult> searchResults = new ArrayList<SearchResult>();
            for (int i = 0; i < songTitles.size(); i++) {
                SearchResult searchResult = new SearchResult();
                Elements urls = songTitles.get(i).getElementsByTag("a");
                searchResult.setUrl(urls.get(0).attr("href"));
                searchResult.setMusicName(urls.get(0).text());
                Elements artistElements = artists.get(i).getElementsByTag("a");
                searchResult.setArtist(artistElements.get(0).text());
                searchResult.setAlbum("The newest recommendation");
                searchResults.add(searchResult);
            }
            return searchResults;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Callback interface: once the data has been fetched, it is passed on
     * through this interface.
     */
    public interface OnRecommendationListener {
        public void onRecommend(ArrayList<SearchResult> results);
    }
}


The class above implements the web page parsing. When parsing completes, the result is passed to the main thread through the Handler; in handleMessage, the main thread passes the data through the callback interface to the Activity or Fragment that called this class.
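The hand-off described above can be sketched in plain Java, without any Android classes. This is only an illustration under assumptions: CallbackSketch, Listener and fetch() are hypothetical names that stand in for SongsRecommendation, OnRecommendationListener and getMusicList(), and the fetch returns fixed data instead of parsing a page. Unlike the Handler version, the listener here runs on the worker thread rather than on the main thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CallbackSketch {

    /** Hypothetical listener, playing the role of OnRecommendationListener. */
    public interface Listener {
        void onRecommend(List<String> results); // null would signal failure
    }

    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private Listener listener;

    /** Returning this allows the chained call setListener(...).get(). */
    public CallbackSketch setListener(Listener l) {
        listener = l;
        return this;
    }

    /** Runs the stand-in fetch on the worker thread, then notifies the listener. */
    public void get() {
        pool.execute(() -> {
            List<String> result = fetch();
            // On Android this hand-off would go through a Handler so that
            // onRecommend runs on the main thread; here it runs on the worker.
            if (listener != null) {
                listener.onRecommend(result);
            }
        });
    }

    /** Stand-in for getMusicList(): fixed data, no network access. */
    static List<String> fetch() {
        return new ArrayList<>(Arrays.asList("Song A", "Song B"));
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        CallbackSketch sketch = new CallbackSketch();
        sketch.setListener(results -> {
            System.out.println("Received " + results.size() + " songs: " + results);
            done.countDown();
        }).get();
        done.await(5, TimeUnit.SECONDS); // wait for the callback before exiting
        sketch.shutdown();
    }
}
```

The caller only ever sees the listener: it registers one, calls get(), and receives the list later, which is the same shape an Activity or Fragment would use with the class in this post.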



Most blog posts stop here, since the key parts have been covered. What I want to talk about, though, is this part of the Jsoup parsing code:


// select takes a selector; see the official site for the syntax
Elements songTitles = doc.select("span.song-title");
Elements artists = doc.select("span.author_list");
ArrayList<SearchResult> searchResults = new ArrayList<SearchResult>();
for (int i = 0; i < songTitles.size(); i++) {
    SearchResult searchResult = new SearchResult();
    Elements urls = songTitles.get(i).getElementsByTag("a");
    searchResult.setUrl(urls.get(0).attr("href"));
    searchResult.setMusicName(urls.get(0).text());
    Elements artistElements = artists.get(i).getElementsByTag("a");
    searchResult.setArtist(artistElements.get(0).text());
    searchResult.setAlbum("The newest recommendation");
    searchResults.add(searchResult);
}


This part of the code is the most critical, in particular:
Elements songTitles = doc.select("span.song-title");
Elements artists = doc.select("span.author_list");
How do you arrive at span.song-title and span.author_list in these two lines? I did not understand it at first, so here is how to choose them. If you already understand, there is no need to read on. If not, please continue.
First, open the page http://music.baidu.com/top/new/?pst=shouyeTop and view its source in the browser:

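Looking at the source, we can see that the selectors chosen above, span.song-title and span.author_list, both appear in it: each song title sits in a span element with the class song-title, and each artist in a span element with the class author_list.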
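To make the link between markup and selector concrete, here is a minimal sketch that runs the same two selectors against an inline HTML string instead of the live page. The HTML fragment, the class name SelectorSketch and the extract helper are assumptions made up for illustration; only the class names song-title and author_list come from this post. It needs the jsoup library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorSketch {

    // Hypothetical fragment shaped like the markup described in the post.
    static final String HTML =
            "<ul>"
            + "<li><span class=\"song-title\"><a href=\"/song/1\">First Song</a></span>"
            + "<span class=\"author_list\"><a href=\"/artist/1\">Artist One</a></span></li>"
            + "<li><span class=\"song-title\"><a href=\"/song/2\">Second Song</a></span>"
            + "<span class=\"author_list\"><a href=\"/artist/2\">Artist Two</a></span></li>"
            + "</ul>";

    /** Returns one "title | artist | absolute URL" line per song found. */
    static List<String> extract(String html) {
        // The base URI lets jsoup resolve relative links such as /song/1.
        Document doc = Jsoup.parse(html, "http://music.baidu.com/");
        // "span.song-title" means: <span> elements whose class is song-title.
        Elements titles = doc.select("span.song-title");
        Elements artists = doc.select("span.author_list");

        List<String> rows = new ArrayList<>();
        // Guard against the two lists having different lengths.
        int n = Math.min(titles.size(), artists.size());
        for (int i = 0; i < n; i++) {
            Element link = titles.get(i).selectFirst("a");
            Element artist = artists.get(i).selectFirst("a");
            if (link == null || artist == null) {
                continue; // skip entries without the expected <a> element
            }
            rows.add(link.text() + " | " + artist.text() + " | " + link.absUrl("href"));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : extract(HTML)) {
            System.out.println(row);
        }
    }
}
```

The rule of thumb is to read the tag name and class attribute off the page source and join them with a dot: a span with class="song-title" becomes the selector span.song-title.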



Next, go to http://try.jsoup.org/ to try the parsing online:



Enter http://music.baidu.com/top/new/?pst=shouyeTop there to fetch the page and inspect the parsed data:






With the two steps above (viewing the page source and trying the selectors online), you should now understand how the selectors in the code were chosen. According to the official site, selectors are very powerful and can be combined, which greatly simplifies getting data out of a web page.
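As one illustration of combining selectors, the sketch below uses a descendant combinator and an attribute selector to reach the song links in a single select call. This is a variation, not the code used in this post: CombinedSelectorSketch, songLinks and the SAMPLE fragment are hypothetical, and the jsoup library must be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CombinedSelectorSketch {

    // Hypothetical fragment: one real song link, one <a> without href,
    // and one link outside any song-title span.
    static final String SAMPLE =
            "<span class=\"song-title\"><a href=\"/song/1\">First Song</a></span>"
            + "<span class=\"song-title\"><a>No link</a></span>"
            + "<span class=\"other\"><a href=\"/x\">Ignored</a></span>";

    /** Collects "text -> href" for every linked song title. */
    static List<String> songLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        // "span.song-title a[href]" matches <a> elements that have an href
        // attribute and are nested inside <span class="song-title">.
        for (Element a : doc.select("span.song-title a[href]")) {
            out.add(a.text() + " -> " + a.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(songLinks(SAMPLE));
    }
}
```

Because the selector already requires an a element with an href, there is no separate getElementsByTag("a") step and no get(0) call that could fail on an entry without a link.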



That covers what this post set out to explain. If there are any errors, please point them out. If you found it useful, please leave a comment or a like. Thank you!



Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.


