Installation of common Python libraries
- urllib and re are built-in Python libraries; they need no installation and can be imported directly with import.
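As a quick check that both built-in libraries import and work, here is a minimal sketch that needs no network access. The HTML fragment and the variable names are made up for illustration; only `re.findall`, `urljoin`, and `urlparse` come from the standard library.

```python
import re
from urllib.parse import urljoin, urlparse

# A small HTML fragment standing in for a downloaded page (made up for this sketch).
html = '<a href="/news">News</a> <a href="http://www.baidu.com/more">More</a>'

# re: pull every href value out of the fragment.
links = re.findall(r'href="([^"]+)"', html)

# urllib.parse: resolve relative links against the page's base URL.
base = "http://www.baidu.com"
absolute = [urljoin(base, link) for link in links]

# Show the host and path of each resolved link.
for url in absolute:
    parts = urlparse(url)
    print(parts.netloc, parts.path)
```

If both imports succeed and the links print, the built-in libraries are ready to use.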
- requests is an HTTP request library. It is installed with the pip3 executable, which is located under C:\Python36\Scripts; add this path to the PATH environment variable first. Then enter pip3 install requests on the command line. Verify once the installation is complete.
    >>> import requests
    >>> requests.get('http://www.baidu.com')
    <Response [200]>
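If the pip3 command is not recognized, the Scripts directory is probably not on PATH yet. The following is a rough sketch of how that could be checked from Python using only the standard library; the helper names `find_executable` and `directory_on_path` are hypothetical, and the directory is the one mentioned above.

```python
import os
import shutil

def find_executable(name):
    """Return the full path of `name` if it is reachable through PATH, else None."""
    return shutil.which(name)

def directory_on_path(directory):
    """Report whether `directory` is one of the entries of the PATH variable."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in entries
        if entry
    )

print("pip3 found at:", find_executable("pip3"))
print("Scripts on PATH:", directory_on_path(r"C:\Python36\Scripts"))
```

A result of None for pip3 suggests that the environment variable still needs to be set, or that the command prompt must be reopened to pick up the change.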
- Selenium is a library for driving a browser. When crawling, you may encounter web pages rendered with JavaScript; fetching them with requests may not return the content correctly, whereas Selenium can drive a browser to obtain the rendered page. It is also installed with pip3 install selenium. To verify:
    >>> from selenium import webdriver
    >>> driver = webdriver.Chrome()
    DevTools listening on ws://127.0.0.1:60980/devtools/browser/7c2cf211-1a8e-41ea-8e4a-c97356c98910
    >>> driver.get('http://www.baidu.com')
The commands above open the Chrome browser and load Baidu. Before they can work, however, you must install chromedriver and the Google Chrome browser; both can be downloaded from their official websites. If the browser window flashes and closes immediately when you run the test code after installing, the Chrome and chromedriver versions are incompatible: download a newer version of Chrome or an older version of chromedriver. As long as both are the latest version, there should be no problem.
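The version mismatch described above can be spotted before launching the browser. The sketch below assumes the versioning scheme used by recent ChromeDriver releases, where the driver's major version matches Chrome's major version; older 2.x drivers used separate numbering, so this check does not apply to them. The function names and the sample version strings are hypothetical.

```python
import re

def major_version(version_text):
    """Extract the leading major version number from a version string."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))

def same_major(chrome_version, driver_version):
    """Return True when both version strings share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)

# Hypothetical version strings, in the style printed by `chromedriver --version`.
print(same_major("Google Chrome 120.0.6099.109", "ChromeDriver 120.0.6099.71"))  # True
print(same_major("Google Chrome 120.0.6099.109", "ChromeDriver 114.0.5735.90"))  # False
```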
- PhantomJS is a headless browser (one with no interface) that runs in the background. You can download it from its website. You also need to add the directory containing phantomjs.exe to the PATH environment variable. Test code:
    >>> from selenium import webdriver
    >>> driver = webdriver.PhantomJS()
    >>> driver.get('http://www.baidu.com')
    >>> driver.page_source
- lxml is installed with pip3 install lxml.
- BeautifulSoup is a web page parsing library that relies on the lxml library. Install it with pip3. You must install the beautifulsoup4 package (pip3 install beautifulsoup4), because the older BeautifulSoup package is no longer maintained. To verify the installation:
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup('', 'lxml')
    >>>
- pyquery is also a web page parsing library. It is more convenient than bs4, and its syntax resembles jQuery. It is also installed with pip3.
    >>> from pyquery import PyQuery as pq  # rename it
    >>> doc = pq("<html></html>")
    >>> doc = pq("<html>hello World</html>")  # assumed markup, chosen to match the output below
    >>> result = doc("html").text()
    >>> result
    'hello World'
- PyMySQL is a library for working with MySQL databases. Install it with pip3.
    >>> import pymysql
    >>> conn = pymysql.connect(host='localhost', user='root', password='123456', port=3306, db='mysql')
    >>> cursor = conn.cursor()
    >>> cursor.execute('select * from db')
    0
- PyMongo is a library for working with the MongoDB database. The MongoDB service must be running; on Windows it can be found under Services in Computer Management. It is also installed with pip3.
    >>> import pymongo
    >>> client = pymongo.MongoClient('localhost')
    >>> db = client['Newtestdb']
    >>> db['Table'].insert({'name': 'Tom'})
    ObjectId('5b868ee4c4d17a0b2466f748')
    >>> db['Table'].find_one({'name': 'Tom'})
    {'_id': ObjectId('5b868ee4c4d17a0b2466f748'), 'name': 'Tom'}
    >>> # completed a single query of the data
- Redis is a non-relational database with high operational efficiency. Install the Python client with pip3 install redis.
    >>> import redis
    >>> r = redis.Redis("localhost", 6379)
    >>> r.set("name", "tom")
    True
    >>> r.get("name")
    b'tom'
    >>> # the returned value is of the bytes type
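Because the value comes back as bytes, it usually has to be decoded before being used as text. A minimal sketch, assuming the value was stored as UTF-8 text; the helper `to_text` is hypothetical, and a bytes literal stands in for the result of r.get so that no Redis server is needed to run it.

```python
def to_text(value, encoding="utf-8"):
    """Decode a bytes value into str; pass None through (key not found)."""
    if value is None:
        return None
    return value.decode(encoding)

# Stand-in for the value returned by r.get("name").
raw = b"tom"

name = to_text(raw)
print(type(raw).__name__, type(name).__name__, name)  # bytes str tom
```

Alternatively, the redis.Redis constructor accepts decode_responses=True, in which case the client returns str values directly.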
- Flask may be needed when setting up a proxy. Install it with pip3. Details can be found in the documentation on the Flask website.
- Django is a web server framework that provides a complete admin backend, engine, interfaces, and so on, which you can use to build a full website. The documentation is available on Django's website. Install it with pip3 install django.
- Jupyter can be thought of as a notebook that runs in the browser, where you can write, debug, and run code. It can be downloaded from the official website or installed with pip3; it pulls in a large number of related libraries, so the installation takes a while. After installation, you can run jupyter notebook directly on the command line, because the executable is in the Scripts directory.
    c:\users\dell>jupyter notebook
    Serving notebooks from local directory: c:\users\dell
You can create a new Python 3 file from the New menu and write code in it.
The default file name is Untitled; here it is changed to Testdemo. Press Ctrl + Enter to run a cell, and press B to insert a new cell below.
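As a quick check covering all the libraries listed in this article, the following sketch reports which of them can be imported in the current environment, without actually importing them. The list of import names is my own mapping from the packages above (for example, beautifulsoup4 is imported as bs4, and `jupyter` is assumed to be the importable name); `check_installed` is a hypothetical helper.

```python
import importlib.util

# Import names (not pip package names): beautifulsoup4 is imported as bs4.
LIBRARIES = [
    "requests", "selenium", "lxml", "bs4", "pyquery",
    "pymysql", "pymongo", "redis", "flask", "django", "jupyter",
]

def check_installed(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_installed(LIBRARIES).items():
    print(f"{name:10} {'installed' if found else 'missing'}")
```

Any library reported as missing can then be installed with pip3 as described above.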