1. Environment
-Python
Python preinstalled with Mac OS:
$ python -V
Python 2.7.10
$ which python
/usr/bin/python
$ ls /System/Library/Frameworks/Python.framework/Versions
2.3 2.5 2.6 2.7 Current
$ ls /Library/Frameworks/Python.framework/Versions    # user-installed directory
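The same check can be made from inside Python, which helps when several interpreters are installed and it is unclear which one a script actually runs under. The following is a minimal sketch; the helper name describe_interpreter is made up for illustration.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version string and executable path."""
    version = "%d.%d.%d" % sys.version_info[:3]
    return version, sys.executable


if __name__ == "__main__":
    version, path = describe_interpreter()
    print("Python %s at %s" % (version, path))
```

Running it with the interpreter configured in the IDE shows whether the IDE uses the system Python or a user-installed one.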
-IDE
PyCharm
-Auxiliary
Install pip:
sudo easy_install pip
-Python Libraries
sudo pip install requests (default version)
sudo pip install BeautifulSoup (installs BeautifulSoup 3.2.1 by default)
sudo pip install lxml (installs lxml 3.7.3 by default)
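Because the package name decides which major version gets installed, it can be useful to confirm what is actually importable. The sketch below is one possible way to do that; the helper installed_version is a hypothetical name, and it assumes each library exposes a __version__ attribute (it reports "unknown" when one is missing).

```python
import importlib


def installed_version(module_name):
    """Return a module's __version__, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Import names: BeautifulSoup 3 is "BeautifulSoup", BeautifulSoup 4 is "bs4".
    for name in ("requests", "BeautifulSoup", "bs4", "lxml"):
        print("%s: %s" % (name, installed_version(name)))
```

A result of None for bs4 together with a version for BeautifulSoup would indicate that only version 3 is installed.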
2. Questions
-Question 1
Code:
soup = BeautifulSoup(html, 'lxml')
Error:
Traceback (most recent call last):
  File "/Users/cuizhenyu/Documents/codes/python/downloadmeitu/libbeautifulsouptest.py", line ..., in <module>
    soup = BeautifulSoup(html)  # soup = BeautifulSoup(html, 'lxml') error
TypeError: 'module' object is not callable
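Fix:
The name BeautifulSoup is bound to the module rather than the class, as happens after a plain "import BeautifulSoup". Import the class from the module:
from BeautifulSoup import BeautifulSoup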
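The same mistake can be reproduced with any module that shares its name with a class it contains. The sketch below uses the standard library's datetime as a stand-in for BeautifulSoup, so it runs without third-party packages; the function names call_module and call_class are made up for illustration.

```python
import datetime  # binds the name "datetime" to the module, not the class


def call_module():
    """Try to call the module object and return the resulting error message."""
    try:
        # Same mistake as BeautifulSoup(html) after "import BeautifulSoup".
        datetime(2017, 1, 1)
    except TypeError as exc:
        return str(exc)
    return None


def call_class():
    """Import the class from the module, then call it."""
    from datetime import datetime as datetime_class
    return datetime_class(2017, 1, 1)


if __name__ == "__main__":
    print(call_module())      # a "not callable" TypeError message
    print(call_class().year)
```

Calling the module raises TypeError, while calling the class imported from it works, which mirrors the fix above.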
-Question 2
Code:
soup = BeautifulSoup(html, 'lxml')
Error:
Traceback (most recent call last):
  File "/Users/cuizhenyu/Documents/codes/python/downloadmeitu/libbeautifulsouptest.py", line ..., in <module>
    soup = BeautifulSoup(html, 'lxml')  # soup = BeautifulSoup(html, 'lxml') error
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 138, in goahead
    k = self.parse_starttag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 296, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 338, in finish_starttag
    self.unknown_starttag(tag, attrs)
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1338, in unknown_starttag
    self.endData()
  File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
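Solution:
The installed BeautifulSoup is version 3, which does not support selecting a parser such as lxml. In version 3 the second positional argument is parseOnlyThese, so the string 'lxml' ends up there and causes the AttributeError above. Version 4 (the beautifulsoup4 package, imported as bs4) is required.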
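With version 4 installed (for example via "sudo pip install beautifulsoup4"), the original call works once the import is changed to bs4. The following is a minimal sketch rather than the author's code: the helper make_soup and the sample HTML are made up, and the fallback to the built-in html.parser when lxml is missing is an added assumption.

```python
# Sketch: needs BeautifulSoup 4 (package "beautifulsoup4", imported as bs4).
try:
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:  # bs4 is not installed
    BeautifulSoup = None
    FeatureNotFound = None


def make_soup(html, parser="lxml"):
    """Parse html with BeautifulSoup 4, falling back to the stdlib parser."""
    if BeautifulSoup is None:
        raise RuntimeError("BeautifulSoup 4 is not installed")
    try:
        return BeautifulSoup(html, parser)
    except FeatureNotFound:
        # The requested parser (e.g. lxml) is missing; use the built-in one.
        return BeautifulSoup(html, "html.parser")


if __name__ == "__main__":
    sample = "<html><body><p class='title'>Hello</p></body></html>"  # made-up input
    print(make_soup(sample).p.get_text())
```

In version 4 the second argument names the parser, so 'lxml' is interpreted as intended instead of being treated as parseOnlyThese.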
Pitfalls encountered when writing a Python crawler