1. Install the prerequisite libraries
requests: sends requests to web pages
urllib and re: bundled with the Python interpreter, so there is nothing to install
selenium: sends requests to web pages that are rendered with JavaScript
from selenium import webdriver
driver = webdriver.Chrome()  # creates a driver object and opens Google Chrome
driver.get('https://www.baidu.com')  # open the Baidu home page
driver.page_source  # view the page source; this is the source after rendering
Used this way, selenium has to open a browser window, which is inconvenient. PhantomJS avoids that:
from selenium import webdriver
driver = webdriver.PhantomJS()  # creates a driver object (PhantomJS must be installed; webdriver.PhantomJS was removed in Selenium 4)
driver.get('https://www.baidu.com')  # no browser window opens during this call
driver.page_source  # view the page source
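For the built-in urllib and re mentioned at the start of this step, here is a minimal sketch of how the two might be combined. The function names, the User-Agent header, and the sample HTML are assumptions made for illustration; fetch() is only defined here, not called, because it performs a real network request.

```python
import re
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a page and return its text (performs a real network request)."""
    # Some sites reject the default urllib User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Matches the href value of <a ...> tags that use double quotes.
LINK_PATTERN = re.compile(r'<a\s+[^>]*href="([^"]+)"', re.IGNORECASE)


def extract_links(html):
    """Return every href found in the given HTML string."""
    return LINK_PATTERN.findall(html)


# Hypothetical sample page, so the example runs without network access.
sample_html = (
    '<p><a href="https://www.baidu.com">Baidu</a> '
    '<a class="x" href="/news">News</a></p>'
)
links = extract_links(sample_html)
print(links)
```

To try it on a live page, call extract_links(fetch('https://www.baidu.com')). A regular expression is enough for quick extraction like this; the parsing libraries below handle messy HTML more reliably.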
2. lxml library
pip3 install lxml
You can also download the .whl file from the official Python website. Once you have the link to the file (it ends in .whl), run pip3 install followed by that link to install it directly.
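Once lxml is installed, a quick way to check that it works is to parse a small page and query it with XPath. The snippet below is a minimal sketch; the HTML string and variable names are made up for illustration.

```python
from lxml import etree

# Hypothetical sample page used only for this check.
html = """
<html><body>
  <a href="/page/1">First</a>
  <a href="/page/2">Second</a>
</body></html>
"""

tree = etree.HTML(html)              # parse the string into an element tree
links = tree.xpath('//a/@href')      # href attribute of every <a> tag
titles = tree.xpath('//a/text()')    # text inside every <a> tag
print(links, titles)
```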
3. beautifulsoup, another web page parsing library
It uses lxml as its parser here, so install lxml first
pip3 install beautifulsoup4  # installs version 4 of BeautifulSoup
>>> from bs4 import BeautifulSoup  # import BeautifulSoup
>>> soup = BeautifulSoup('<html></html>', 'lxml')
Why bs4? Because the authors named the package bs4 when they wrote the module, so that is the name you import. You can view the source code on the official website.
4. pyquery parsing library
pip3 install pyquery
>>> from pyquery import PyQuery as pq
>>> doc = pq('<html>Hello</html>')
>>> result = doc('html').text()  # get the text inside the tag
Summary: the libraries above are parsing libraries; the ones below are storage libraries
5. pymysql, a library for working with MySQL
pip3 install pymysql
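As a rough sketch of how a crawler might store a scraped row with pymysql: the helper names, the table name pages, and its columns are assumptions for illustration. save_row() expects a connection created with pymysql.connect(...) and is not called here, since that needs a running MySQL server.

```python
def build_insert(table, row):
    """Build a parameterized INSERT statement and its values from a dict."""
    columns = ", ".join(row)
    # pymysql uses %s placeholders and escapes the values itself.
    placeholders = ", ".join(["%s"] * len(row))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(row.values())


def save_row(connection, table, row):
    """Insert one row through a pymysql connection and commit it."""
    sql, params = build_insert(table, row)
    with connection.cursor() as cursor:
        cursor.execute(sql, params)
    connection.commit()


# Hypothetical row; only the SQL is built, nothing is sent to a database.
sql, params = build_insert("pages", {"url": "https://www.baidu.com", "title": "Baidu"})
print(sql)
print(params)
```

Only the values are passed as parameters; the table and column names are inserted into the SQL text directly, so they should come from your own code, never from scraped data.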
6. pymongo
pip3 install pymongo  # pymongo is for working with MongoDB databases
7. redis, used for the crawl queue of a distributed crawler
pip3 install redis
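The idea of a crawl queue could look roughly like the sketch below. The CrawlQueue class and the key names are hypothetical. To keep the example runnable without a Redis server, it uses a small in-memory stand-in that implements only the three commands needed (sadd, lpush, rpop); with the real library you would pass redis.Redis(host='localhost', port=6379, decode_responses=True) instead.

```python
class CrawlQueue:
    """FIFO queue of URLs with de-duplication, built on Redis-style commands."""

    def __init__(self, client, queue_key="crawler:queue", seen_key="crawler:seen"):
        self.client = client
        self.queue_key = queue_key
        self.seen_key = seen_key

    def push(self, url):
        # sadd returns 1 only if the URL was not already in the set.
        if self.client.sadd(self.seen_key, url):
            self.client.lpush(self.queue_key, url)
            return True
        return False

    def pop(self):
        # lpush + rpop gives first-in, first-out order; None when empty.
        return self.client.rpop(self.queue_key)


class InMemoryClient:
    """Stand-in for a Redis client (an assumption for this demo only)."""

    def __init__(self):
        self.sets = {}
        self.lists = {}

    def sadd(self, key, value):
        members = self.sets.setdefault(key, set())
        if value in members:
            return 0
        members.add(value)
        return 1

    def lpush(self, key, value):
        items = self.lists.setdefault(key, [])
        items.insert(0, value)
        return len(items)

    def rpop(self, key):
        items = self.lists.get(key)
        return items.pop() if items else None


queue = CrawlQueue(InMemoryClient())
queue.push("https://example.com/a")
queue.push("https://example.com/b")
duplicate_added = queue.push("https://example.com/a")  # already seen, ignored
first, second, third = queue.pop(), queue.pop(), queue.pop()
print(first, second, third)
```

Because every worker talks to the same Redis server, several crawler processes can share one queue and one "seen" set, which is what makes the crawler distributed.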
8. flask, a web library, used to provide the interface to the stored proxies
pip3 install flask
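A proxy interface of this kind might look like the following minimal sketch. The route names, the PROXY_POOL list, and its addresses are assumptions for illustration; in a real crawler the pool would be read from the storage library rather than hard-coded.

```python
import random

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical proxy pool; placeholder addresses only.
PROXY_POOL = ["127.0.0.1:8001", "127.0.0.1:8002"]


@app.route("/proxy")
def get_proxy():
    """Return one randomly chosen proxy as JSON."""
    return jsonify({"proxy": random.choice(PROXY_POOL)})


@app.route("/count")
def count_proxies():
    """Return how many proxies are currently in the pool."""
    return jsonify({"count": len(PROXY_POOL)})
```

Calling app.run() starts the development server, after which a crawler can request /proxy whenever it needs a proxy address.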
9. django
pip3 install django
10. jupyter
pip3 install jupyter
jupyter notebook  # type this directly on the command line. A browser window opens, listing the files in the directory where the command was run, and you can create new files there
You can then run code in the browser, using the Python interpreter
Crawlers from beginner to giving up: pure-beginner study notes on installing the basic crawler libraries