This article describes how to use BeautifulSoup to write a crawler that retrieves the title and URL of each Baidu search result. Compared with the jsoup package for Java, Python's BeautifulSoup library is similarly easy to use.
The code is as follows:
# coding: utf-8
import sys
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

question_word = "food programmers"
# Convert the query to GBK bytes, then percent-encode it for the URL
url = "http://www.baidu.com/s?wd=" + urllib.quote(question_word.decode(sys.stdin.encoding).encode('gbk'))
htmlpage = urllib2.urlopen(url).read()
soup = BeautifulSoup(htmlpage)
# Each search result sits in a <table class="result"> element
print len(soup.findAll("table", {"class": "result"}))
for result_table in soup.findAll("table", {"class": "result"}):
    # The first link in the table carries the title text and the target URL
    a_click = result_table.find("a")
    print "----- title ----\n" + a_click.renderContents()  # title
    print "---- link ----\n" + str(a_click.get("href"))  # link
    print "---- description ----\n" + result_table.find("p", {"class": "c-abstract"}).renderContents()  # description
    print
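
The query-encoding step on the url line is the part of the code most closely tied to Python 2. As a minimal sketch, assuming Python 3 and only the standard library, the same step might look like the following. The helper name build_search_url and its encoding parameter are hypothetical and are not part of the original code.

```python
from urllib.parse import quote


def build_search_url(question_word, encoding="gbk"):
    """Build a Baidu search URL for question_word (hypothetical helper)."""
    # Python 3 strings are already Unicode, so the decode step is not needed.
    # Encode the text to bytes in the target charset, then percent-encode them.
    return "http://www.baidu.com/s?wd=" + quote(question_word.encode(encoding))


print(build_search_url("food programmers"))
# prints http://www.baidu.com/s?wd=food%20programmers
```

Passing bytes to quote keeps the GBK byte values intact. If a str were passed instead, quote would percent-encode it as UTF-8 by default.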