I plan to build a blog app for cnblogs (博客园). To start, the app must be able to fetch the blog home page and extract its article list. This first step, crawling the article list from the blog home page, is now implemented and runs on a Xiaomi 2S.
The idea: write a utility class that requests the web page and returns its source code, use a regular expression to extract the matching data, then process the data and display it in a ListView.
I'll briefly explain the following points:
1. Using the Apache HttpClient library to implement GET requests.
2. Asynchronous request processing.
3. Using regular expressions to grab the needed data.
Use the Apache HttpClient library to implement GET requests
Using Apache HttpClient takes three simple steps:
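    HttpClient httpClient = new DefaultHttpClient();      // create an HttpClient
    HttpGet httpGet = new HttpGet("http://...");           // create a GET request
    HttpResponse response = httpClient.execute(httpGet);   // send the GET request and receive the response content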
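For readers who want to try the request step without the Apache library, here is a minimal sketch that uses the JDK's HttpURLConnection as a stand-in. It is not the code from this project: the class name HttpGetSketch, the method names, the 5-second timeouts, and the UTF-8 decoding are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, JDK only; a stand-in for the Apache HttpClient calls above.
final class HttpGetSketch {

    /** Reads the whole stream and decodes it as UTF-8 (encoding is an assumption). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Sends a GET request to the given URL and returns the response body. */
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeout, in milliseconds
        conn.setReadTimeout(5000);    // assumed timeout, in milliseconds
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling HttpGetSketch.httpGet with the blog home page URL would return the page source as one String, which is the input the regular expression step needs. On Android this call must not run on the main thread.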
Asynchronous request processing
Implementing the asynchronous request is also simple: start a new thread to perform the request, and when the request completes, pass the data through a Handler so that it is processed on the main thread. See MainActivity.java for the details.
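The pattern described above can be sketched in plain Java. This is not the code in MainActivity.java: Android's Handler is not available outside Android, so a single-threaded executor stands in for the main thread here, and the names AsyncFetchSketch, MAIN, and fetchAsync are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: worker thread does the request, "main" thread gets the result.
final class AsyncFetchSketch {

    // Stand-in for Android's main (UI) thread; an assumption for this sketch.
    static final ExecutorService MAIN = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "main-stand-in");
        t.setDaemon(true);
        return t;
    });

    /** Runs the request on a new thread, then hands the data to the stand-in main thread. */
    static void fetchAsync(Supplier<String> request, Consumer<String> onResult) {
        Thread worker = new Thread(() -> {
            String data = request.get();               // slow network work, off the main thread
            MAIN.execute(() -> onResult.accept(data)); // on Android: post to a Handler instead
        }, "worker");
        worker.setDaemon(true);
        worker.start();
    }
}
```

A call such as AsyncFetchSketch.fetchAsync(() -> "page source", data -> System.out.println(data)) returns immediately; the callback runs later on the stand-in main thread, which is where the ListView would be updated on Android.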
Use regular expressions to grab the needed data
Open my blog home page and view its source code. It is easy to see that each entry of the article list we want to crawl has the following format:
    <div class="postTitle">
    <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_0" class="postTitle2" href="http://www.cnblogs.com/yc-755909659/p/4187155.html">"The zero start of Android game Programming" 19. Basics of game development (game music and sound effects)</a>
    </div>
    <div class="postCon"><div class="c_b_p_desc">Summary: In a game, besides a gorgeous UI that directly attracts players, the background music and sound effects are also important; the right background music and well-matched sound effects lift the whole game up a notch. In Android, the class commonly used to play a game's background music is MediaPlayer, while the SoundPool class is used for game sound effects. 1. MediaPlayer ...<a href="http://www.cnblogs.com/yc-755909659/p/4187155.html" class="c_b_p_desc_readmore">Read the full text</a></div></div>
    <div class="clear"></div>
    <div class="postDesc">posted @ 2014-12-30 12:16 y know from Read(45) Comments(0) <a href="http://i.cnblogs.com/EditPosts.aspx?postid=4187155" rel="nofollow">Edit</a></div>
    <div class="clear"></div>
From this, the regular expression is as follows. It captures, in order, the article URL, title, summary, posting time, read count, and comment count:
    "class=\"postTitle2\" href=\"(.*?)\">(.*?)</a>.*?Summary:(.*?)<a.*?posted @(.*?)y know from Read(.*?) Comments(.*?)<a";
Then apply the regular expression to the page source and read the needed data from each match:
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.MatchResult;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Fetches the page at the given path and extracts the article list.
     *
     * @return the parsed article data
     */
    public static List<BlogListInfo> getBlogNetDate(String path, String regex) {
        List<BlogListInfo> result = new ArrayList<BlogListInfo>();
        // the source code string of my blog home page
        String blogString = removeRN(http_get(path));
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(blogString);
        while (m.find()) { // loop to find each matching string
            MatchResult mr = m.toMatchResult();
            BlogListInfo info = new BlogListInfo();
            info.setBlogUrl(mr.group(1));
            info.setBlogTitle(mr.group(2));
            info.setBlogSummary(mr.group(3));
            info.setBlogTime(mr.group(4));
            info.setBlogReadNum(mr.group(5));
            info.setBlogReply(mr.group(6));
            result.add(info);
        }
        return result;
    }
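To try the matching step in isolation, the following self-contained sketch runs a pattern of this shape against a shortened two-article sample. Several things here are assumptions rather than the project's code: the class BlogRegexSketch, the String[] rows used in place of BlogListInfo, the made-up sample articles and example.com URLs, the cleaned-up casing of postTitle2, and a removeRN that simply strips \r and \n. Stripping line breaks matters because "." does not match line terminators by default in java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, self-contained version of the extraction step.
final class BlogRegexSketch {

    // Six lazy groups: url, title, summary, time, read count, comment count.
    static final String REGEX =
            "class=\"postTitle2\" href=\"(.*?)\">(.*?)</a>.*?Summary:(.*?)<a"
            + ".*?posted @(.*?)y know from Read(.*?) Comments(.*?)<a";

    // Made-up sample data in the same shape as the markup shown earlier.
    static final String SAMPLE =
            "<div class=\"postTitle\">\n"
            + "<a id=\"t0\" class=\"postTitle2\" href=\"http://example.com/p/1.html\">First post</a>\n"
            + "</div>\n"
            + "<div class=\"postCon\"><div class=\"c_b_p_desc\">Summary: About music ..."
            + "<a href=\"http://example.com/p/1.html\">Read the full text</a></div></div>\n"
            + "<div class=\"postDesc\">posted @ 2014-12-30 12:16 y know from Read(45) Comments(0) "
            + "<a href=\"#\">Edit</a></div>\n"
            + "<div class=\"postTitle\">\n"
            + "<a id=\"t1\" class=\"postTitle2\" href=\"http://example.com/p/2.html\">Second post</a>\n"
            + "</div>\n"
            + "<div class=\"postCon\"><div class=\"c_b_p_desc\">Summary: About sound ..."
            + "<a href=\"http://example.com/p/2.html\">Read the full text</a></div></div>\n"
            + "<div class=\"postDesc\">posted @ 2014-12-29 09:30 y know from Read(12) Comments(3) "
            + "<a href=\"#\">Edit</a></div>\n";

    /** Removes CR and LF so that '.' can match across former line breaks. */
    static String removeRN(String s) {
        return s.replace("\r", "").replace("\n", "");
    }

    /** Returns one String[6] per article, each value trimmed. */
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX).matcher(removeRN(html));
        while (m.find()) {
            String[] row = new String[6];
            for (int i = 0; i < 6; i++) {
                row[i] = m.group(i + 1).trim();
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Calling BlogRegexSketch.extract(BlogRegexSketch.SAMPLE) yields two rows. Note that the read and comment groups keep their parentheses, for example "(45)", because the pattern treats them as ordinary captured text; the lazy ".*?" between groups keeps each match inside a single article.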
I won't go into the remaining parts here; for details, see the source code: getcsdnlistview.zip
Article URL: http://www.cnblogs.com/yc-755909659/p/4195436.html
PS: This article is original work by y know from. If you repost it, please credit the source. Thank you!
Android: My Blog App, 1. Crawling the article list from the blog home page (web data crawling)