Recently I have been using Android and Jsoup to scrape novel data (for Jsoup usage, see http://www.open-open.com/jsoup/). While fetching the table of contents of the novel Eternal Life (永生) from the Zongheng Chinese website, I ran into a problem.
The introduction page for Eternal Life is http://book.zongheng.com/book/48552.html. I want to grab the URL in the link <a class="button read" href="http://book.zongheng.com/showchapter/48552.html">Click to read</a>, then follow that URL to the index page and parse the chapter list and chapter links from it. With Jsoup, selecting by class can be done directly:
Document doc = Jsoup.connect("http://book.zongheng.com/book/48552.html").get();
doc.select(".button read"); but when I tried it, I found that the space in the class value prevents the link from being matched. A round of searching on Baidu turned up http://hi.baidu.com/chen88358323/item/459090031758c691a3df4389
That solution is not very good. Since Jsoup's selector mechanism is similar to jQuery's, I kept looking and found http://zhidao.baidu.com/question/311666643.html. That article was very enlightening.
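A likely reason the selector with the space fails: in CSS selector syntax, which Jsoup follows, a space is the descendant combinator. That means ".button read" asks for a <read> tag inside an element with class "button", not for an element whose class attribute is "button read". The sketch below illustrates this on an inline HTML string. The wrapping <div> and the class name SpaceInSelectorDemo are assumptions for illustration; only the <a> tag comes from the page described above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SpaceInSelectorDemo {
    // Inline stand-in for the book page; the <div> wrapper is hypothetical.
    static final String HTML =
        "<div><a class=\"button read\" "
        + "href=\"http://book.zongheng.com/showchapter/48552.html\">Click to read</a></div>";

    // Counts the elements that a CSS query matches in the sample HTML.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Descendant query: a <read> tag inside an element with class "button".
        System.out.println(count(".button read"));
        // Each class token on its own matches the link.
        System.out.println(count(".button"));
        System.out.println(count(".read"));
    }
}
```

The class attribute "button read" is two separate class tokens, "button" and "read", so each one can be selected on its own.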
In the end, testing showed that a class value containing a space can be handled with two chained select calls, written as:
Elements indexEs = doc.select(".button").select(".read");
With this, I successfully fetched all of the book's chapter entries and links.
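As a rough sketch of the chained approach, the example below runs it against an inline HTML string and reads the href of the matched link. It also shows the compound selector "a.button.read", which is an alternative not mentioned in the post: standard CSS syntax for an element carrying both classes, which Jsoup accepts. The second <a> tag, the class name ReadLinkDemo, and the method names are hypothetical, added only to show that a link with class "button" alone is not picked up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ReadLinkDemo {
    // First link is from the post; the second one is a hypothetical extra.
    static final String HTML =
        "<a class=\"button read\" "
        + "href=\"http://book.zongheng.com/showchapter/48552.html\">Click to read</a>"
        + "<a class=\"button\" href=\"#other\">Other</a>";

    // Chained form from the post: narrow to .button, then to .read.
    static String chained(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select(".button").select(".read");
        return links.attr("href"); // href of the first match, or "" if none
    }

    // Compound selector: an <a> element that has both class tokens.
    static String compound(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("a.button.read").attr("href");
    }

    public static void main(String[] args) {
        System.out.println(chained(HTML));
        System.out.println(compound(HTML));
    }
}
```

One difference worth keeping in mind: the chained form can also match a .read element nested inside a .button element, while the compound form requires both classes on the same element.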
Reply #1, 2013-08-27 17:22, Beyond-bit
It is not that the link cannot be matched; you used the wrong approach.
Use: Elements ele = doc.getElementsByClass("classValue");
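The reply does not say which value to pass for a class attribute like "button read". A minimal sketch, assuming a single class name is passed to getElementsByClass and the second class is checked with hasClass, might look like the following. The sample HTML, the class name ByClassDemo, and the method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ByClassDemo {
    // First link is from the post; the second one is a hypothetical extra.
    static final String HTML =
        "<a class=\"button read\" "
        + "href=\"http://book.zongheng.com/showchapter/48552.html\">Click to read</a>"
        + "<a class=\"button\" href=\"#other\">Other</a>";

    // Counts the elements that carry the given single class name.
    static int countByClass(String className) {
        return Jsoup.parse(HTML).getElementsByClass(className).size();
    }

    // Finds elements with class "read", keeps the one that also has "button".
    static String readLinkHref() {
        Document doc = Jsoup.parse(HTML);
        for (Element e : doc.getElementsByClass("read")) {
            if (e.hasClass("button")) {
                return e.attr("href");
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(countByClass("button"));
        System.out.println(countByClass("read"));
        System.out.println(readLinkHref());
    }
}
```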
Jsoup: handling a space in a style class when processing data