"Android My Blog app" 1. Crawl Blog Home page article list content--Web data crawl

Source: Internet
Author: User

I plan to build an app for my blog. The first requirement is to access the blog home page and fetch its article list. This first step, crawling the article list from the blog home page, is now implemented and running on a Xiaomi 2S.

The idea: write a utility class that requests the web page and returns its source code, extract the matching data with a regular expression, and display the processed results in a ListView.

The following points are explained briefly:
1. Using the Apache HttpClient library to implement GET requests.
2. Asynchronous request processing.
3. Using regular expressions to grab the required data.

Using the Apache HttpClient library to implement GET requests

Using Apache HttpClient takes three simple steps:

new DefaultHttpClient();      // create an HttpClient
new HttpGet("http://...");    // create a GET request
// send the GET request and read the response content
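The same three steps (create a client, create a GET request, send it and read the response) can be sketched without the Apache library. The following is a minimal sketch that swaps in `java.net.HttpURLConnection` from the Java standard library; it is not the article's code. The class name `SimpleHttpGet`, the 5-second timeouts, and the UTF-8 charset are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not from the article: a plain-JDK GET request.
final class SimpleHttpGet {

    // Read a whole stream into a String using the given charset.
    static String readAll(InputStream in, String charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toString(charset);
    }

    // Send a GET request and return the response body as a String.
    static String get(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in, "UTF-8"); // assumes the page is UTF-8 encoded
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android, a call such as `SimpleHttpGet.get(path)` has to run off the main thread, which is what the asynchronous processing described next is for.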

Asynchronous request processing

Implementing the asynchronous request is also simple: start a new thread to perform the request and, when the request completes, pass the data through a Handler so that it is processed on the main thread. See the code of the MainActivity.java class for details.
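The worker-thread-plus-Handler pattern can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from MainActivity.java: `AsyncRequest` and `Callback` are hypothetical names, and a `java.util.concurrent.Executor` stands in for the Android Handler. On Android, that executor could be something like `runnable -> handler.post(runnable)` with a Handler bound to the main looper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;

// Hypothetical sketch: run a blocking task on a new thread and
// deliver its result through an Executor that represents the main thread.
final class AsyncRequest {

    interface Callback<T> {
        void onResult(T data);
        void onError(Exception error);
    }

    static <T> Thread run(Callable<T> task, Executor mainThread, Callback<T> callback) {
        Thread worker = new Thread(() -> {
            T data;
            try {
                data = task.call(); // blocking work, e.g. the HTTP request
            } catch (Exception e) {
                mainThread.execute(() -> callback.onError(e));
                return;
            }
            final T result = data;
            // Hand the result over to the main thread for processing.
            mainThread.execute(() -> callback.onResult(result));
        });
        worker.start();
        return worker;
    }
}
```

The returned `Thread` is only there so that callers (or tests) can wait for completion; an app would normally ignore it and update the ListView inside `onResult`.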

Using regular expressions to grab the required data

Viewing the source code of my blog home page, it is easy to find the format of the article list entries we want to crawl:

<div class="postTitle">
    <a id="Homepage1_homepagedays_dayslist_ctl00_daylist_titleurl_0" class="postTitle2" href="http://www.cnblogs.com/yc-755909659/p/4187155.html">"The zero start of Android game Programming" 19. Basics of game development (game music and sound effects)</a>
</div>
<div class="postCon"><div class="c_b_p_desc">Summary: In a game, besides a gorgeous UI that directly attracts players, the background music and sound effects also matter; the right background music and good sound effects lift the whole game up a notch. In Android, the class commonly used to play a game's background music is MediaPlayer, while the SoundPool class is used for game sound effects. 1. MediaPlayer ...<a href="http://www.cnblogs.com/yc-755909659/p/4187155.html" class="c_b_p_desc_readmore">Read the full text</a></div></div>
<div class="clear"></div>
<div class="postDesc">posted @ 2014-12-30 12:16 y know from Read (45) Comments (0)<a href="http://i.cnblogs.com/EditPosts.aspx?postid=4187155" rel="nofollow">Edit</a></div>
<div class="clear"></div>

This leads to the following regular expression:

"class=\"postTitle2\" href=\"(.*?)\">(.*?)</a>.*?Summary:(.*?)<a.*?posted @(.*?)y know from Read(.*?) Comments(.*?)<a";

The page source is then matched against this regular expression to extract the required fields.
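To check what each capture group yields, an expression of this shape can be run against a single shortened entry. The sketch below is illustrative only: the class name `PostRegexDemo`, the `example.com` URLs, the id `t0`, and the sample text are made up, and the labels ("Summary:", "Read", "Comments") follow the English wording used in this article. Because the parentheses around the counts are not escaped in the pattern, groups 5 and 6 capture them as part of the value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: apply an article-style expression to one sample entry.
final class PostRegexDemo {

    static final String REGEX =
        "class=\"postTitle2\" href=\"(.*?)\">(.*?)</a>.*?Summary:(.*?)<a"
        + ".*?posted @(.*?)y know from Read(.*?) Comments(.*?)<a";

    // Made-up entry, already on a single line (no line breaks).
    static final String SAMPLE =
        "<div class=\"postTitle\"><a id=\"t0\" class=\"postTitle2\" "
        + "href=\"http://example.com/p/1.html\">First post</a></div>"
        + "<div class=\"postCon\"><div class=\"c_b_p_desc\">Summary: Hello ..."
        + "<a href=\"http://example.com/p/1.html\" class=\"c_b_p_desc_readmore\">"
        + "Read the full text</a></div></div>"
        + "<div class=\"postDesc\">posted @ 2014-12-30 12:16 y know from "
        + "Read (45) Comments (0)"
        + "<a href=\"http://example.com/edit\" rel=\"nofollow\">Edit</a></div>";

    // Return the trimmed capture groups of the first match, or null if none.
    static String[] firstMatch(String html) {
        Matcher m = Pattern.compile(REGEX).matcher(html);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i).trim();
        }
        return groups;
    }
}
```

The reluctant quantifier `.*?` keeps each match inside one entry; a greedy `.*` between the title and the summary would run on to the last summary in the page.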

/**
 * Fetch the data from the network.
 *
 * @return data
 */
public static List<BlogListInfo> getBlogNetDate(String path, String regex) {
    List<BlogListInfo> result = new ArrayList<BlogListInfo>();
    String blogString = removeRN(http_get(path));
    Pattern p = Pattern.compile(regex);
    // the source code string of my blog home page
    Matcher m = p.matcher(blogString);
    while (m.find()) { // loop to find each matching string
        MatchResult mr = m.toMatchResult();
        BlogListInfo info = new BlogListInfo();
        info.setBlogUrl(mr.group(1));
        info.setBlogTitle(mr.group(2));
        info.setBlogSummary(mr.group(3));
        info.setBlogTime(mr.group(4));
        info.setBlogReadNum(mr.group(5));
        info.setBlogReply(mr.group(6));
        result.add(info);
    }
    return result;
}
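The method above passes the page source through a helper called removeRN before matching, but the article does not show that helper. A plausible reason for it is that `.` in a Java regular expression does not match line terminators by default, so a pattern that has to span several lines of HTML would otherwise fail. The sketch below assumes that removeRN simply strips `\r` and `\n`; it also shows `Pattern.DOTALL` as an alternative. The class name `LineBreakDemo` and the `matches` helper are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of why line breaks are removed before matching.
final class LineBreakDemo {

    // Assumed behaviour of the article's removeRN helper: drop CR and LF.
    static String removeRN(String s) {
        return s.replace("\r", "").replace("\n", "");
    }

    // True if the regex, compiled with the given flags, is found in the text.
    static boolean matches(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }
}
```

With a two-line input such as `"<a>title</a>\r\n<div>Summary: x</div>"` and the pattern `"</a>.*?Summary:"`, the match fails on the raw text, succeeds after `removeRN`, and also succeeds on the raw text when `Pattern.DOTALL` is passed as the flag.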

I won't go over the rest here; for details, see the source code: getcsdnlistview.zip

Address of this article: http://www.cnblogs.com/yc-755909659/p/4195436.html

PS: This article is an original work by y know from. If you repost it, please credit the source. Thank you!
