Pitfalls Encountered While Writing a Python Crawler

Source: Internet
Author: User

1. Environment

-Python
The Python that comes pre-installed on macOS

$ python -V
Python 2.7.10
$ which python
/usr/bin/python
$ ls /System/Library/Frameworks/Python.framework/Versions
2.3  2.5  2.6  2.7  Current
$ ls /Library/Frameworks/Python.framework/Versions    (directory for user-installed versions)
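The same information can be read from inside the interpreter, which helps confirm which Python a script actually runs under. A minimal sketch using only the standard library (the helper name describe_interpreter is made up for illustration):

```python
import sys


def describe_interpreter():
    """Return (version string, executable path) for the running Python."""
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return version, sys.executable


version, path = describe_interpreter()
print("Python %s at %s" % (version, path))
```

If the printed path is not the interpreter you expected, libraries installed with pip may have gone to a different Python.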

-IDE
PyCharm
-Auxiliary tools
Install pip

sudo easy_install pip

-Python libraries

sudo pip install requests (installs the default version)
sudo pip install BeautifulSoup (installs BeautifulSoup 3.2.1 by default)
sudo pip install lxml (installs lxml 3.7.3 by default)
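Since the problems in this post come down to which library version is installed, it can help to check from Python what is actually importable. A rough sketch, assuming each library exposes a `__version__` attribute (the helper installed_version is hypothetical, and "unknown" is returned when the attribute is missing):

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# "BeautifulSoup" is the version 3 module; "bs4" is the version 4 package.
for name in ("requests", "BeautifulSoup", "bs4", "lxml"):
    print("%-14s %s" % (name, installed_version(name) or "not installed"))
```

Seeing "BeautifulSoup" installed but "bs4" missing would indicate that only version 3 is available.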

2. Problems

-Problem 1

Code:
soup = BeautifulSoup(html, 'lxml')
Error:
Traceback (most recent call last):
  File "/Users/cuizhenyu/Documents/codes/python/downloadmeitu/libbeautifulsouptest.py", in <module>
    soup = BeautifulSoup(html)  # soup = BeautifulSoup(html, 'lxml') error
TypeError: 'module' object is not callable
Fix (import the class from the module, so that the name BeautifulSoup refers to the class rather than the module):
from BeautifulSoup import BeautifulSoup
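The error above can be reproduced without BeautifulSoup installed, because it is a general consequence of a module and a class sharing one name. The sketch below uses the standard library's datetime module as a stand-in; it is an analogy, not the original code:

```python
import datetime  # binds the name "datetime" to the module, not the class

try:
    datetime(2017, 1, 1)  # calling the module itself raises TypeError
except TypeError as exc:
    message = str(exc)
    print(message)

from datetime import datetime  # rebinds the name to the class inside the module

moment = datetime(2017, 1, 1)  # now the call succeeds
print(moment.year)
```

`import BeautifulSoup` followed by `BeautifulSoup(html)` fails for the same reason, and `from BeautifulSoup import BeautifulSoup` fixes it the same way.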

-Problem 2

Code:
soup = BeautifulSoup(html, 'lxml')
Error:
Traceback (most recent call last):
  File "/Users/cuizhenyu/Documents/codes/python/downloadmeitu/libbeautifulsouptest.py", in <module>
    soup = BeautifulSoup(html, 'lxml')  # soup = BeautifulSoup(html, 'lxml') error
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 138, in goahead
    k = self.parse_starttag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 296, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 338, in finish_starttag
    self.unknown_starttag(tag, attrs)
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1338, in unknown_starttag
    self.endData()
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
Solution:
The installed BeautifulSoup is version 3, which does not support lxml or other pluggable parsers. In version 3 the second positional argument is parseOnlyThese, so the string 'lxml' is treated as a parse filter rather than a parser name, which is why the traceback ends in self.parseOnlyThese.text. Version 4 is required to pass 'lxml'.
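For comparison, here is a minimal sketch of the version 4 call. It assumes the beautifulsoup4 package is installed (it is imported as bs4) and that lxml may or may not be present; the helper parse_html and the sample markup are illustrative and not from the original post:

```python
try:
    # Version 4 lives in the bs4 package, not in a module named BeautifulSoup.
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:
    BeautifulSoup = None  # beautifulsoup4 is not installed


def parse_html(html):
    """Parse with lxml if bs4 can find it, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is missing; use the parser from the standard library instead.
        return BeautifulSoup(html, "html.parser")


html = "<html><head><title>demo</title></head><body><p>hello</p></body></html>"
if BeautifulSoup is not None:
    soup = parse_html(html)
    print(soup.title.string)
```

Falling back to html.parser keeps the script usable on machines without lxml, at the cost of possibly different handling of malformed markup.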
