It has been a long time since I wrote a crawler. Recently I used Python's BeautifulSoup4 and Scrapy to optimize spiders I had written earlier, and found that under Python 3.5 these libraries have changed a lot. I ran into quite a few problems, so here is a summary.
Switching environments:
I have both Python 2.7 and Python 3.5 installed on Windows. When switching environments in PyCharm, selecting the corresponding version as the interpreter in Settings is not enough: PyCharm still needs to be restarted for the change to take effect.
Alternatively, if you want to keep things simple, change the system PATH variable directly.
# For Python 3.5.x, change to
C:\Users\Administrator\AppData\Local\Programs\Python\Python35\Scripts\;C:\Users\Administrator\AppData\Local\Programs\Python\Python35\;C:\Users\Administrator\AppData\Roaming\npm
# For Python 2.7.x, change to
D:\Python27\Scripts\;D:\python27\;C:\Users\Administrator\AppData\Roaming\npm
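After changing PATH or the PyCharm interpreter, it helps to confirm which interpreter is actually running. The following is a minimal sketch; the helper name interpreter_summary is made up for illustration and is not part of any library.

```python
import sys


def interpreter_summary():
    """Return the running interpreter's (major, minor, micro) version and its executable path."""
    return tuple(sys.version_info[:3]), sys.executable


# Hypothetical helper, used here only to print what is currently active.
version, executable = interpreter_summary()
print("Python %d.%d.%d at %s" % (version + (executable,)))
```

If the printed path still points at the old installation, the switch has not taken effect yet (for example, PyCharm or the terminal has not been restarted).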
Also, if the environment variables were not switched completely, installing libraries with pip is very error-prone. In that case I recommend deleting the "\beautifulsoup4-4.5.1.dist-info" metadata directory under "\lib\site-packages" and then reinstalling with pip.
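To see which metadata directories are present before deleting anything, something like the sketch below may help. The function name find_dist_info is hypothetical, and the sketch assumes the package was installed into the interpreter's default site-packages directory as reported by sysconfig.

```python
import glob
import os
import sysconfig


def find_dist_info(package, site_dir=None):
    """List '<package>-*.dist-info' directories found in site-packages.

    Hypothetical helper: site_dir defaults to the current interpreter's
    'purelib' path, which is an assumption about where pip installed things.
    """
    if site_dir is None:
        site_dir = sysconfig.get_paths()["purelib"]
    pattern = os.path.join(site_dir, package + "-*.dist-info")
    return sorted(glob.glob(pattern))


# Print any leftover metadata directories for beautifulsoup4.
for path in find_dist_info("beautifulsoup4"):
    print(path)
```

Run it with the interpreter you intend to use; each Python installation has its own site-packages, so the result differs between Python 2.7 and Python 3.5.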
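Ignoring certificate trust errors:
The following code works around untrusted SSL certificate errors when accessing HTTPS sites. It disables certificate verification for the whole process, so use it only with sites you trust.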
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
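As a narrower alternative to the global override, verification can be turned off for individual requests only. This is a sketch under assumptions, not the approach used in the original crawler: it targets Python 3 (urllib.request), and the names unverified_context and fetch_insecure are made up for illustration.

```python
import ssl
import urllib.request


def unverified_context():
    """Build an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_insecure(url, timeout=10):
    """Fetch a URL without verifying its certificate (hypothetical helper)."""
    with urllib.request.urlopen(url, context=unverified_context(), timeout=timeout) as resp:
        return resp.read()
```

Passing the context per call leaves certificate checking intact for every other HTTPS request the program makes.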
I will keep optimizing my crawler over the coming period, and will update this summary if I run into other problems.