BeautifulSoup is good at parsing web pages, but the BeautifulSoup that ships with Python for Android has a bug:
text = h4.a.text always returns None, so I wrote a function, gettext(), to work around it.
Example: fetching the CSDN Geek headlines (soup.py)
import urllib2
import re
from BeautifulSoup import BeautifulSoup
import sys

# Python 2 only: make utf-8 the default encoding so printing the titles does not fail
reload(sys)
sys.setdefaultencoding('utf-8')

def gettext(text):
    # Return the text between the first '>' and the following '</a>', or None
    begin = text.find('>', 0)
    if begin > -1:
        begin += 1
        end = text.find('</a>', begin)
        if begin < end:
            return text[begin:end].strip()
        else:
            return None
    else:
        return None

page = urllib2.urlopen("http://geek.csdn.net/new")
soup = BeautifulSoup(page)
for h4 in soup.findAll('h4'):
    if h4.a is not None:
        href = h4.a.get('href')
        # h4.a.text returns None here, so parse the tag's string form instead
        text = gettext(str(h4.a))
        print text
        print href
page.close()
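To try the workaround without BeautifulSoup or a network connection, here is a minimal sketch that runs gettext() on a hard-coded anchor string. The sample markup and the gettext_plain() helper are assumptions made up for illustration and are not part of the original script. gettext() returns everything between the opening tag and </a>, so nested tags such as <b> stay in the result; gettext_plain() shows one possible way to strip them with the re module.

```python
import re

def gettext(text):
    """Return the text between the first '>' and the following '</a>', or None."""
    begin = text.find('>')
    if begin == -1:
        return None
    begin += 1
    end = text.find('</a>', begin)
    if begin < end:
        return text[begin:end].strip()
    return None

def gettext_plain(text):
    """Hypothetical helper: like gettext(), but also drops nested tags such as <b>."""
    inner = gettext(text)
    if inner is None:
        return None
    return re.sub(r'<[^>]+>', '', inner).strip()

# Sample anchor markup, invented for this example.
sample = '<a href="http://example.com/post/1"> <b>Hello</b> world </a>'
print(gettext(sample))
print(gettext_plain(sample))
```

Note that the simple regular expression only handles well-formed tags, and an attribute value containing '>' would confuse gettext() itself, so this is a stopgap for the headline links rather than a general HTML text extractor.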
For more details, see: http://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html