Implementing a web crawler in Python with BeautifulSoup
An earlier article (http://www.bkjia.com/article/55789.htm) covered crawling web pages with PhantomJS, where elements were picked out with selectors.
With the BeautifulSoup Python module (documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/), you can just as easily extract content from a web page.
# coding=utf-8
from urllib.parse import urlencode
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://...'  # Baidu search URL (truncated in the source)
values = {'wd': 'tennis'}
encoded_param = urlencode(values)
full_url = url + '?' + encoded_param
response = urlopen(full_url)
soup = BeautifulSoup(response, 'html.parser')
alinks = soup.find_all('a')
The code above fetches Baidu's search results for the keyword "tennis" and collects every link (a tag) on the result page.
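The two steps above (building the query string, then collecting the links) can be tried offline with a minimal sketch like the one below. It assumes Python 3; the base URL http://example.com/s and the sample markup are placeholders chosen for illustration, not the article's actual target, and an inline HTML string stands in for the downloaded page.

```python
from urllib.parse import urlencode
from bs4 import BeautifulSoup

# Hypothetical base URL, used only to show how the query string is attached.
base_url = 'http://example.com/s'
full_url = base_url + '?' + urlencode({'wd': 'tennis'})

# Stand-in for the downloaded page (assumption: any HTML containing <a> tags).
sample_html = '<p><a href="/a">First</a> <a href="/b">Second</a> <a>No href</a></p>'
soup = BeautifulSoup(sample_html, 'html.parser')

# Keep only the links that actually carry an href attribute.
hrefs = [a['href'] for a in soup.find_all('a', href=True)]

print(full_url)
print(hrefs)
```

Filtering with href=True skips anchors that have no target, which avoids a KeyError when reading a['href'].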
BeautifulSoup has many useful built-in methods.
Some of the more useful features:
Constructing a node element (Tag)
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)
# <class 'bs4.element.Tag'>
A tag's attributes are available through .attrs, which returns a dictionary.
tag.attrs
# {'class': ['boldest']}
You can also read a single attribute directly by key, for example tag['class'].
Attributes can be modified, added, and deleted freely.
tag['class'] = 'verybold'
tag['id'] = 1
tag
# <b class="verybold" id="1">Extremely bold</b>
del tag['class']
del tag['id']
tag
# <b>Extremely bold</b>
tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
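Because indexing a missing attribute raises KeyError, reading attributes through .get() is often more convenient in a crawler. The following is a small sketch of that pattern; the markup and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one tag with two classes and an id.
soup = BeautifulSoup('<b class="boldest big" id="x1">Extremely bold</b>', 'html.parser')
tag = soup.b

# class is a multi-valued attribute, so bs4 returns it as a list.
classes = tag.get('class', [])
tag_id = tag.get('id')

# A missing attribute yields the supplied default instead of raising KeyError.
missing = tag.get('title', 'n/a')
has_id = tag.has_attr('id')

# After deletion, .get() returns None.
del tag['id']
after_delete = tag.get('id')

print(classes, tag_id, missing, has_id, after_delete)
```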
You can also look up DOM elements as needed, as in the following example:
1. Build a document
html_doc = """
2. Various operations
soup.head#
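As a rough illustration of these lookup operations, here is a sketch that runs against a small hypothetical document. The markup is invented for this example and is not the article's html_doc.

```python
from bs4 import BeautifulSoup

# Hypothetical document, used only for illustration.
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="intro">Hello <b>world</b></p>
<a id="link1" href="http://example.com/one">One</a>
<a id="link2" href="http://example.com/two">Two</a>
</body></html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Attribute-style access returns the first matching tag in the document.
head_markup = str(soup.head)
title_text = soup.title.string
first_bold = soup.b.string

# find_all returns every match; find returns only the first one.
link_ids = [a['id'] for a in soup.find_all('a')]
second_link = soup.find(id='link2')['href']

print(head_markup)
print(title_text, first_bold, link_ids, second_link)
```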
Problems parsing HTML with BeautifulSoup in Python
Pass both of these arguments, the tag name and an attribute dictionary: findAll('div', {'class': 'content'})
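A minimal sketch of that call on made-up markup follows. It uses find_all, the current bs4 spelling of findAll, and also shows the class_ keyword as an alternative form of the same filter.

```python
from bs4 import BeautifulSoup

# Illustrative markup: only the div elements carrying the class "content" should match.
html = '''
<div class="content">First block</div>
<div class="sidebar">Ignored</div>
<p class="content">Not a div</p>
<div class="content extra">Second block</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Tag name plus attribute dictionary, as in the answer above.
divs = soup.find_all('div', {'class': 'content'})
texts = [d.get_text() for d in divs]

# The same filter written with the class_ keyword argument.
texts_kw = [d.get_text() for d in soup.find_all('div', class_='content')]

print(texts)
```

A tag with several classes matches when any one of them equals the requested value, which is why the "content extra" div is included.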
Usage of BeautifulSoup in Python
Yes
soup = BeautifulSoup(data, 'html.parser', from_encoding='gbk')  # data: the fetched page bytes
items = soup.find_all(attrs={'class': 'small'})
for item in items:
    print(item)
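To see the effect of the encoding hint without fetching a page, the sketch below encodes a small sample string as GBK and parses the resulting bytes. The markup is invented for illustration; from_encoding only matters when the input is bytes rather than an already decoded string.

```python
from bs4 import BeautifulSoup

# Sample markup encoded as GBK bytes; '\u7f51\u7403' is the Chinese word for tennis.
raw = '<p class="small">\u7f51\u7403</p><p class="large">other</p>'.encode('gbk')

# Tell bs4 how the bytes are encoded instead of relying on auto-detection.
soup = BeautifulSoup(raw, 'html.parser', from_encoding='gbk')

# Select elements by attribute only, regardless of tag name.
items = soup.find_all(attrs={'class': 'small'})
texts = [item.get_text() for item in items]

for item in items:
    print(item)
```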