Over the past couple of days I needed to scrape some information from someone else's web page, and in the end I used HTMLParser to parse the HTML.
Let's go straight to the code.
First, note that the classes used below come from the org.htmlparser package. The JDK also has a class named Parser (javax.swing.text.html.parser.Parser), so check which one your IDE imports.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.util.NodeList;

// htmlStr holds the page HTML fetched earlier; Mp3 is a bean from my own project
List<Mp3> mp3List = new ArrayList<Mp3>();
try {
    // Initialize the parser. The constructor has several overloads: here I pass in HTML text
    // that was fetched in advance, but you can also pass in a URL.
    Parser parser = new Parser(htmlStr);
    parser.setEncoding("UTF-8"); // set the character encoding
    // The filter finds the <div> whose id is songlistwrapper
    AndFilter filter = new AndFilter(
            new TagNameFilter("div"),
            new HasAttributeFilter("id", "songlistwrapper"));
    NodeList nodes = parser.parse(filter); // get the nodes that pass the filter
    Node node = nodes.elementAt(0);
    NodeList nodesChild = node.getChildren();
    Node[] nodesArr = nodesChild.toNodeArray();
    NodeList nodesChild2 = nodesArr[1].getChildren();
    Node[] nodesArr2 = nodesChild2.toNodeArray();
    Node nodeUl = nodesArr2[1];
    Node[] nodesLi = nodeUl.getChildren().toNodeArray(); // nodesLi holds the <li> nodes we want
    for (int i = 2; i < nodesLi.length; i++) {
        System.out.println(nodesLi[i].toHtml());
        Node tempNode = nodesLi[i];
        // Attributes are read through TagNode: a node has to be turned into a TagNode
        // before the attributes of its tag can be read
        TagNode tagNode = new TagNode();
        tagNode.setText(tempNode.toHtml());
        // claStr looks like: bb-dotimg clearfix song-item-hook { 'songItem': { 'sid': '113275822', 'sname': 'My request is not high', 'author': 'Huang Bo' } }
        String claStr = tagNode.getAttribute("class");
        if (claStr == null) {
            continue; // no class attribute, nothing to extract
        }
        claStr = claStr.replaceAll(" ", ""); // remove spaces so the pattern below need not allow for them
        if (claStr.indexOf("?") == -1) { // skip values that contain '?'
            Pattern pattern = Pattern.compile(
                    "[\\s\\wa-z\\-]+\\{'songItem':\\{'sid':'([\\d]+)','sname':'([\\s\\S]*)','author':'([\\s\\S]*)'\\}\\}");
            Matcher matcher = pattern.matcher(claStr);
            if (matcher.find()) {
                Mp3 mp3 = new Mp3();
                mp3.setSid(matcher.group(1));
                mp3.setSname(matcher.group(2));
                mp3.setAuthor(matcher.group(3));
                mp3List.add(mp3);
                // Debug output of every captured group:
                // for (int j = 1; j <= matcher.groupCount(); j++) {
                //     System.out.println(j + "---> " + matcher.group(j));
                // }
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
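The snippet stores each result in an Mp3 object, a class from the author's project that the post does not show. Below is a minimal sketch, assuming it is a plain JavaBean with three String properties named after the setters called above; the getters and toString are assumptions added for illustration.

```java
// Hypothetical reconstruction of the project's Mp3 bean: one song item per <li>
public class Mp3 {
    private String sid;    // song id, e.g. "113275822"
    private String sname;  // song name
    private String author; // artist name

    public String getSid() { return sid; }
    public void setSid(String sid) { this.sid = sid; }

    public String getSname() { return sname; }
    public void setSname(String sname) { this.sname = sname; }

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    @Override
    public String toString() {
        return "Mp3{sid=" + sid + ", sname=" + sname + ", author=" + author + "}";
    }
}
```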
That is how I do the parsing in my project; as you can see, HTMLParser is fairly simple to use.
claStr holds a value such as bb-dotimg clearfix song-item-hook { 'songItem': { 'sid': '113275822', 'sname': 'My request is not high', 'author': 'Huang Bo' } }, which is the class attribute read from the web page.
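The regular-expression step can be tried on its own, without HTMLParser. The sketch below applies the same space stripping, '?' check and pattern as the snippet to the sample value of claStr; the class name SongItemRegexDemo and its parse method are hypothetical names made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SongItemRegexDemo {

    // Same pattern as in the snippet: it expects spaces to have been removed first
    private static final Pattern SONG_ITEM = Pattern.compile(
            "[\\s\\wa-z\\-]+\\{'songItem':\\{'sid':'([\\d]+)','sname':'([\\s\\S]*)','author':'([\\s\\S]*)'\\}\\}");

    /** Returns {sid, sname, author}, or null if the class string is skipped or does not match. */
    static String[] parse(String claStr) {
        String compact = claStr.replaceAll(" ", ""); // strip every space, as the snippet does
        if (compact.indexOf("?") != -1) {
            return null; // mirror the snippet: values containing '?' are skipped
        }
        Matcher matcher = SONG_ITEM.matcher(compact);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String sample = "bb-dotimg clearfix song-item-hook { 'songItem': { 'sid': '113275822', "
                + "'sname': 'My request is not high', 'author': 'Huang Bo' } }";
        String[] fields = parse(sample);
        for (int j = 0; j < fields.length; j++) {
            System.out.println((j + 1) + "---> " + fields[j]);
        }
    }
}
```

Running main prints 1---> 113275822, 2---> Myrequestisnothigh and 3---> HuangBo. Because every space is stripped before matching, names that contain spaces come back without them; a pattern that allows optional whitespace between tokens (\\s*) would avoid that.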