Python uses BeautifulSoup to implement a crawler


In an earlier article (http://www.jb51.net/article/55789.htm) I described using PhantomJS as a crawler that extracts web page content with selectors.

With the Python module BeautifulSoup (documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/), crawling web content is easy:

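# Python 3: urlencode lives in urllib.parse, urlopen in urllib.request
from urllib.parse import urlencode
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = 'http://www.baidu.com/s'
values = {'wd': 'tennis'}  # 'wd' carries the search keyword
encoded_param = urlencode(values)
full_url = url + '?' + encoded_param
response = urlopen(full_url)
soup = BeautifulSoup(response, 'html.parser')
alinks = soup.find_all('a')  # every <a> element on the results page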

The code above fetches Baidu's search results for "tennis" and collects all the links on the results page.
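Once alinks holds the anchor tags, their URLs and visible text can be read with a.get('href') and a.get_text(). The sketch below runs offline against a small hypothetical HTML snippet standing in for a results page (real Baidu markup is different and changes over time); extract_links is a hypothetical helper name, and it skips anchors that have no href.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded search-results page
html = """
<div>
  <a href="https://example.com/tennis-rules">Tennis rules</a>
  <a href="https://example.com/grand-slams">Grand Slam history</a>
  <a name="anchor-without-href">No link here</a>
</div>
"""

def extract_links(page_html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(page_html, "html.parser")
    links = []
    for a in soup.find_all("a"):
        href = a.get("href")  # None when the attribute is missing
        if href:
            links.append((a.get_text(strip=True), href))
    return links

for text, href in extract_links(html):
    print(text, "->", href)
```

Passing href=True, as in soup.find_all('a', href=True), performs the same "has an href" filtering inside BeautifulSoup itself.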

BeautifulSoup has a number of very useful methods built into it.

Some of the more useful features:

Construct a node element


from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)
# <class 'bs4.element.Tag'>
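Besides its type, a Tag exposes its element name through .name and its text content through .string, and assigning to .name renames the element. A short sketch reusing the same snippet:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">extremely bold</b>', "html.parser")
tag = soup.b

print(tag.name)    # b
print(tag.string)  # extremely bold

# Renaming the tag changes how it is serialized
tag.name = "blockquote"
print(tag)         # <blockquote class="boldest">extremely bold</blockquote>
```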

Attributes can be read through tag.attrs, which returns a dictionary:


tag.attrs
# {'class': ['boldest']}

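Alternatively, index the tag directly, e.g. tag['class'], to read a single attribute (tag.class is not valid Python, because class is a reserved keyword).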
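One detail worth knowing: BeautifulSoup treats class as a multi-valued attribute, so tag['class'] returns a list of class names, while ordinary attributes such as id come back as plain strings. The snippet below is a made-up example to show the difference, plus get() for attributes that may be missing:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout" id="intro">text</p>', "html.parser")
p = soup.p

print(p["class"])      # ['body', 'strikeout']  (class is split into a list)
print(p["id"])         # intro  (a single-valued attribute stays a string)
print(p.attrs)         # {'class': ['body', 'strikeout'], 'id': 'intro'}
print(p.get("title"))  # None  (get() avoids a KeyError for missing attributes)
```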

Attributes can also be added, modified, and deleted freely:

tag['class'] = 'verybold'
tag['id'] = 1
tag
# <b class="verybold" id="1">extremely bold</b>

del tag['class']
del tag['id']
tag
# <b>extremely bold</b>

tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
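The same item assignment and del operations work across a whole document. As an illustration, the sketch below removes every attribute except href from all tags, a common clean-up step before storing scraped HTML. The function strip_attributes and its keep parameter are hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def strip_attributes(html, keep=("href",)):
    """Delete every attribute not listed in `keep` from all tags (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(True):   # True matches every tag
        for name in list(tag.attrs):  # copy the keys: we delete while iterating
            if name not in keep:
                del tag[name]
    return str(soup)

cleaned = strip_attributes('<p class="x" style="color:red"><a href="/a" id="l1">link</a></p>')
print(cleaned)  # <p><a href="/a">link</a></p>
```

Iterating over list(tag.attrs) rather than tag.attrs itself matters here, because deleting keys from a dictionary while looping over it raises a RuntimeError.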

You can also search the document for DOM elements, as in the following example.

1. Build a document

html_doc = """..."""  # the HTML source to parse, as a string
soup = BeautifulSoup(html_doc, 'html.parser')

2. Various lookups

soup.head  # the document's <head> tag
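Because html_doc is not shown in full here, the sketch below assumes a small hypothetical document to demonstrate several common lookups: attribute-style access (soup.head, soup.title), find and find_all with keyword filters (class_ carries a trailing underscore because class is a Python keyword), and a CSS selector via select_one, which relies on the soupsieve package that recent bs4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document
html_doc = """
<html><head><title>Example page</title></head>
<body>
  <p class="title"><b>Example</b></p>
  <p class="links">
    <a href="http://example.com/one" class="item" id="link1">One</a>
    <a href="http://example.com/two" class="item" id="link2">Two</a>
  </p>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.head)                           # <head><title>Example page</title></head>
print(soup.title.string)                   # Example page
print(soup.find("a", id="link2").string)   # Two
print([a["href"] for a in soup.find_all("a", class_="item")])
print(soup.select_one("p.links a").get_text())  # One
```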