Crawlers from Beginner to Giving Up: A Pure Beginner's Notes on Installing the Basic Crawler Libraries

Source: Internet
Author: User

1. Request libraries: requests sends requests to web pages

urllib and re ship with the Python interpreter, so they need no installation.
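As a minimal sketch of what the two built-in modules can do together, the snippet below pulls link targets out of a page with re and resolves them with urllib.parse.urljoin. The HTML string, the base URL and the helper name extract_links are assumptions made for illustration; in a real crawler the page source would come from a network request.

```python
import re
from urllib.parse import urljoin

# Hypothetical page source; a real crawler would download this instead.
BASE_URL = "https://www.baidu.com/"
HTML = '<a href="/news">News</a> <a href="https://example.com/about">About</a>'


def extract_links(html, base_url):
    """Return an absolute URL for every href="..." attribute found in html."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    # urljoin leaves absolute URLs alone and resolves relative ones against base_url.
    return [urljoin(base_url, href) for href in hrefs]


links = extract_links(HTML, BASE_URL)
print(links)
```

A regular expression is enough for a quick look at simple markup; for anything irregular, a real parsing library is the safer choice.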

Selenium is used to request web pages that are rendered with JavaScript.

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()  # creates a driver object and opens Google Chrome
>>> driver.get('https://www.baidu.com')  # opens the Baidu home page
>>> driver.page_source  # the page source after rendering

Driving Chrome this way opens a browser window every time, which is inconvenient. PhantomJS runs without a window (note that recent Selenium releases no longer support PhantomJS):

>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()  # creates a driver object
>>> driver.get('https://www.baidu.com')  # no browser window opens during the operation
>>> driver.page_source  # the source code of the page

2. lxml library

pip3 install lxml

You can also download the .whl file from the official Python website and install the downloaded file, or pass a download link ending in .whl directly to pip3 install.

3. BeautifulSoup, also a web page parsing library

It uses lxml as its parser here, so install lxml first.

pip3 install beautifulsoup4  # installs version 4 of BeautifulSoup

>>> from bs4 import BeautifulSoup  # import BeautifulSoup
>>> soup = BeautifulSoup('<html></html>', 'lxml')

Why bs4? Because the author named the package bs4 when writing the module, and the BeautifulSoup class is exposed from that package. You can check the source code on the official website.

4. pyquery parsing library

pip3 install pyquery

>>> from pyquery import PyQuery as pq
>>> doc = pq('<html>Hello</html>')
>>> result = doc('html').text()  # the text inside the matched tag

Summary: the libraries above are parsing libraries; the ones below are storage libraries.

5. pymysql, a library for working with MySQL

pip3 install pymysql

6. pymongo

pip3 install pymongo  # pymongo is used to operate MongoDB databases

7. redis, used for the crawl queue of a distributed crawler

pip3 install redis
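To make the idea of a crawl queue concrete, here is a rough in-memory sketch that uses only the standard library, not the redis package. The class name CrawlQueue and its methods are hypothetical; in a distributed crawler the pending URLs and the set of seen URLs would normally be kept in Redis so that several workers can share them.

```python
from collections import deque


class CrawlQueue:
    """In-memory stand-in for a shared crawl queue (hypothetical helper)."""

    def __init__(self):
        self._pending = deque()  # URLs waiting to be crawled, in FIFO order
        self._seen = set()       # every URL ever queued, used for de-duplication

    def push(self, url):
        """Queue url unless it was queued before; return True if it was added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def pop(self):
        """Return the next URL to crawl, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


queue = CrawlQueue()
queue.push("https://www.baidu.com/")
queue.push("https://www.baidu.com/news")
queue.push("https://www.baidu.com/")  # duplicate, so it is ignored
```

Each worker would call pop() to take a URL, fetch it, and push() any newly discovered links back; the seen set keeps the same page from being crawled twice.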

8. flask, a web library, used for the proxy storage interface

pip3 install flask

9. django

pip3 install django

10. jupyter

pip3 install jupyter

jupyter notebook  # run this directly on the command line; a browser window pops up showing the files in the directory where it was started, and you can create new files

Code can be run right in the browser, using the Python interpreter.
