Installing Common Python Crawler Libraries and Configuring the Environment

Source: Internet
Author: User

Installing common Python libraries
  • urllib and re are built into Python (they are part of the standard library), so nothing needs to be installed; simply import them.
  • Requests is an HTTP request library. It is installed with pip3, which is located in C:\Python36\Scripts, so first add this directory to the PATH environment variable. Then run pip3 install requests on the command line, and verify the installation once it completes:
    >>> import requests
    >>> requests.get('http://www.baidu.com')
    <Response [200]>
  • Selenium is a library for automating a browser. A crawler may run into pages rendered with JavaScript; requesting such a page with Requests may not return the actual content, whereas Selenium can drive a real browser and obtain the rendered page. Install it with pip3 install selenium, then verify:
    >>> import selenium
    >>> from selenium import webdriver
    >>> driver = webdriver.Chrome()
    DevTools listening on ws://127.0.0.1:60980/devtools/browser/7c2cf211-1a8e-41ea-8e4a-c97356c98910
    >>> driver.get('http://www.baidu.com')

    These commands open Chrome and load Baidu. Before they will work, however, you must install ChromeDriver as well as the Google Chrome browser, both of which can be downloaded from their official websites. If the browser window opens and immediately closes when you run the test code, Chrome and ChromeDriver are incompatible versions: download a newer Chrome or an older ChromeDriver so that the two match. Using the latest release of both usually avoids the problem.

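  • PhantomJS is a headless browser (it has no user interface) that runs in the background. Download it from its website and add the directory containing phantomjs.exe to the PATH environment variable. Then test it; the last command should print the HTML source of the page:
    >>> from selenium import webdriver
    >>> driver = webdriver.PhantomJS()
    >>> driver.get('http://www.baidu.com')
    >>> driver.page_source
    Note that PhantomJS is no longer developed and Selenium 4 removed its PhantomJS driver, so this test only works with Selenium 3.x.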
  • lxml is an HTML/XML parsing library. Install it with pip3 install lxml.
  • Beautiful Soup is a web page parsing library that is commonly used with lxml as its parser. Install it with pip3 install beautifulsoup4; note the 4 in the package name, because the older BeautifulSoup package (version 3) is no longer maintained. Verify the installation:
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup('<html></html>', 'lxml')

  • PyQuery is another web page parsing library. It is more convenient than BS4, and its syntax is similar to jQuery's. Install it with pip3 as well.
    >>> from pyquery import PyQuery as pq  # import under a shorter name
    >>> doc = pq('<html>hello world</html>')
    >>> result = doc('html').text()
    >>> result
    'hello world'
  • PyMySQL is a library for working with MySQL databases. Install it with pip3.
    >>> import pymysql
    >>> conn = pymysql.connect(host='localhost', user='root', password='123456', port=3306, db='mysql')
    >>> cursor = conn.cursor()
    >>> cursor.execute('select * from db')
    0

  • PyMongo is a library for working with MongoDB. The MongoDB service must be running; on Windows you can start it under Services in Computer Management. Install PyMongo with pip3 as well.
    >>> import pymongo
    >>> client = pymongo.MongoClient('localhost')
    >>> db = client['Newtestdb']
    >>> result = db['Table'].insert_one({'name': 'Tom'})
    >>> result.inserted_id
    ObjectId('5b868ee4c4d17a0b2466f748')
    >>> db['Table'].find_one({'name': 'Tom'})
    {'_id': ObjectId('5b868ee4c4d17a0b2466f748'), 'name': 'Tom'}
    >>> # one document was inserted and then queried back
  • Redis is a high-performance non-relational (key-value) database. Install the Python client with pip3 install redis; a Redis server must also be running for the test below.
    >>> import redis
    >>> r = redis.Redis('localhost', 6379)
    >>> r.set('name', 'tom')
    True
    >>> r.get('name')
    b'tom'
    >>> # the value comes back as bytes
  • Flask is a lightweight web framework that may be needed later when setting up a proxy service. Install it with pip3; details are in the Flask documentation on the official Flask website.
  • Django is a full-featured web framework that provides a complete admin backend, a template engine, interfaces and more, so you can use it to build a complete website. Its documentation is on the Django website. Install it with pip3 install django.
  • Jupyter can be thought of as a notebook that runs in a web browser, in which you can write, debug and run code. It can be downloaded from the official website or installed with pip3 install jupyter; it depends on many libraries, so installation takes a while. After installation, you can run jupyter notebook directly on the command line, because the executable is in the Scripts directory:
    C:\Users\DELL>jupyter notebook
    Serving notebooks from local directory: C:\Users\DELL

    Choose New > Python 3 to create a new Python 3 notebook, where you can write code.

    The default file name is Untitled; here it is changed to Testdemo. Press Ctrl+Enter to run the current cell, and press B to insert a new cell below it.
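Because urllib and re ship with Python, a crawler can already extract links with nothing installed. The following is a minimal sketch, assuming the HTML has already been downloaded; the sample page and the function name extract_links are hypothetical. It uses re to find href attributes and urllib.parse.urljoin to turn relative links into absolute ones.

```python
import re
from urllib.parse import urljoin

# Sketch only: matches href="..." with double quotes, a deliberate simplification.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def extract_links(html, base_url):
    """Hypothetical helper: return absolute URLs for every href="..." in the HTML."""
    return [urljoin(base_url, href) for href in HREF_PATTERN.findall(html)]


# Hypothetical page content; a real crawler might obtain it with
# urllib.request.urlopen(url).read().decode("utf-8").
sample_html = '<a href="/news">News</a> <a href="http://example.com/a">A</a>'
links = extract_links(sample_html, "http://www.baidu.com/")
print(links)
```

A regular expression is fine for quick jobs, but the parsing libraries described above (lxml, Beautiful Soup, PyQuery) cope with real-world HTML far more robustly.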
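The Requests check in the list above only confirms that the package works. In a crawler you usually also send headers and query parameters. The sketch below builds a request without sending it, so it runs offline; the User-Agent string, the /s path and the wd parameter are illustrative assumptions, and build_request is a hypothetical helper name.

```python
import requests

# Illustrative header; some sites reject the default python-requests User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url, params):
    """Hypothetical helper: prepare (but do not send) a GET request."""
    return requests.Request("GET", url, params=params, headers=HEADERS).prepare()


prepared = build_request("http://www.baidu.com/s", {"wd": "python"})
print(prepared.url)  # the query parameters are URL-encoded into the final URL
print(prepared.headers["User-Agent"])

# Sending it requires network access:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
#     response.raise_for_status()
```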
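The Chrome/ChromeDriver mismatch described in the Selenium item can be checked before launching the browser. As a rough sketch, assuming the rule of thumb that ChromeDriver's major version should equal Chrome's (the exact compatibility policy is set by the ChromeDriver project, so treat this as a heuristic), the hypothetical helpers below compare version strings such as the one printed by chromedriver --version and the one shown on Chrome's chrome://version page.

```python
import re

# Matches a four-part version such as 114.0.5735.90 and captures the major number.
VERSION_PATTERN = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(text):
    """Hypothetical helper: extract the major version from a version string."""
    match = VERSION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1))


def versions_compatible(chrome_text, driver_text):
    """Heuristic (assumption): compatible when the major versions match."""
    return major_version(chrome_text) == major_version(driver_text)


print(versions_compatible("114.0.5735.199", "ChromeDriver 114.0.5735.90 (abc)"))
print(versions_compatible("116.0.5845.96", "ChromeDriver 114.0.5735.90 (abc)"))
```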
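The lxml item above covers only installation. As a brief illustration of what lxml is used for, the sketch below parses an HTML fragment (hypothetical sample data) with lxml.etree.HTML and selects elements with XPath.

```python
from lxml import etree

# Hypothetical sample markup.
html = """
<ul id="items">
  <li class="item"><a href="/a">First</a></li>
  <li class="item"><a href="/b">Second</a></li>
</ul>
"""

tree = etree.HTML(html)  # tolerant HTML parser; wraps the fragment in <html><body>
titles = tree.xpath('//li[@class="item"]/a/text()')  # link texts
links = tree.xpath('//li[@class="item"]/a/@href')  # href attribute values
print(titles)
print(links)
```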
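A slightly fuller Beautiful Soup sketch, using lxml as the parser as in the verification above; the markup is hypothetical sample data.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div class="news">
  <h2><a href="/n/1">Headline one</a></h2>
  <h2><a href="/n/2">Headline two</a></h2>
</div>
"""

soup = BeautifulSoup(html, "lxml")
news = soup.find("div", class_="news")  # class_ avoids clashing with the Python keyword
# get_text(strip=True) trims surrounding whitespace; a["href"] reads the attribute.
headlines = [(a.get_text(strip=True), a["href"]) for a in news.find_all("a")]
print(headlines)
```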
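When saving crawled items to MySQL with PyMySQL, it is safer to pass the values separately from the SQL string so that the driver escapes them. The sketch below is one way to do that, assuming each item is a dictionary; the table name students, the database name spiders and the helpers build_insert and save_item are hypothetical, and save_item needs a running MySQL server and an open pymysql connection.

```python
def build_insert(table, data):
    """Hypothetical helper: build a parameterized INSERT and its value tuple.

    Table and column names are interpolated directly, so they must come from
    trusted code, never from scraped content; only the values are escaped.
    """
    columns = ", ".join(data.keys())
    placeholders = ", ".join(["%s"] * len(data))  # PyMySQL uses %s placeholders
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(data.values())


def save_item(conn, table, data):
    """Hypothetical helper: insert one item through a pymysql connection."""
    sql, values = build_insert(table, data)
    try:
        with conn.cursor() as cursor:
            cursor.execute(sql, values)  # the driver escapes the values
        conn.commit()
    except Exception:
        conn.rollback()
        raise


sql, values = build_insert("students", {"id": "20120001", "name": "Bob", "age": 20})
print(sql)
print(values)
# import pymysql
# conn = pymysql.connect(host='localhost', user='root', password='123456', port=3306, db='spiders')
# save_item(conn, "students", {"id": "20120001", "name": "Bob", "age": 20})
```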
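Crawlers often fetch the same page more than once, so inserting every result into MongoDB creates duplicates. One common pattern, sketched here under the assumption that each item has a unique url field (a hypothetical schema), is an upsert with PyMongo's update_one(..., upsert=True). The helper names are hypothetical, and save_item needs a running MongoDB server.

```python
def build_upsert(item, key="url"):
    """Hypothetical helper: return (filter, update) for an upsert on one unique field."""
    return {key: item[key]}, {"$set": item}


def save_item(collection, item, key="url"):
    """Hypothetical helper: insert the item, or update the document with the same key."""
    query, update = build_upsert(item, key)
    return collection.update_one(query, update, upsert=True)


item = {"url": "http://www.baidu.com", "title": "Baidu"}
query, update = build_upsert(item)
print(query, update)
# Requires a running MongoDB server:
# import pymongo
# client = pymongo.MongoClient("localhost")
# save_item(client["Newtestdb"]["Table"], item)
```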
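To make the Flask item more concrete, here is a minimal sketch of the kind of small proxy-serving API a crawler might query. The /random route and the proxy addresses (taken from a documentation-only IP range) are hypothetical placeholders, not real proxies; Flask's built-in test client is used so the example runs without starting a server.

```python
import random

from flask import Flask

app = Flask(__name__)

# Hypothetical proxy list; a real proxy pool would load these from storage.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]


@app.route("/random")
def random_proxy():
    """Return one proxy address as plain text."""
    return random.choice(PROXIES)


# Exercise the route in-process; app.run(port=5000) would start a real (blocking) server.
with app.test_client() as client:
    response = client.get("/random")
    print(response.status_code, response.get_data(as_text=True))
```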
