Python library installation methods and common libraries

Source: Internet
Author: User
Tags: tesseract, ocr, nltk

Python library installation methods:

Method One: setup.py

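1. Download the archive, unzip it, and note the path of the extracted folder: *:/**/....../

2. Open cmd and switch to the *:/**/....../ directory.

3. Run python setup.py build, then python setup.py install (build only prepares the package; install copies it into your Python installation).

4. Start Python and import the module to verify that the installation was successful.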
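Step 4 can also be done from a script. The sketch below uses only the standard library to check whether a module can be found without actually importing it; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current sys.path.

    find_spec locates the module without executing it, so this is a cheap
    check after running setup.py (or pip).
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names such as "pkg.sub" when "pkg" is missing.
        return False


if __name__ == "__main__":
    # Replace "bs4" with the library you just installed.
    for name in ("os", "bs4"):
        print(name, "installed" if is_installed(name) else "NOT installed")
```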

Method Two: pip

1. Press Win + R to open the Run window, type cmd and press Enter.

2. Locate the pip installation path: X:\Python xx\Scripts

3. On the command line, switch to that directory: cd X:\Python xx\Scripts

4. Enter pip install *** (the library name).
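Changing into the Scripts folder is only needed so that the pip executable can be found. An alternative is to run pip as a module of the interpreter itself (python -m pip install ...), which installs into whichever Python is running. A minimal sketch of doing that from Python follows; the helper names pip_install_command and pip_install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package: str) -> list[str]:
    """Build the equivalent of `python -m pip install <package>`.

    Using sys.executable targets the same interpreter that runs this script,
    so there is no need to cd into the Scripts folder first.
    """
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package: str) -> int:
    """Run pip in a subprocess and return its exit code (0 means success)."""
    completed = subprocess.run(pip_install_command(package))
    return completed.returncode
```

Calling pip_install("requests") would then install the Requests library described below. Running pip in a subprocess, rather than importing pip's internals, is the approach pip's own documentation recommends.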

A roundup of common libraries:

1. os--access operating system functions from Python: create a new folder, build a path, etc.
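A short sketch of the os calls behind "create a new folder, build a path": os.path.join assembles a path with the correct separator for the current system, and os.makedirs creates the folder along with any missing parent folders. The function name make_project_dirs and the folder names are illustrative assumptions.

```python
import os
import tempfile


def make_project_dirs(base: str, *names: str) -> list[str]:
    """Create one sub-folder per name under `base` and return their paths."""
    created = []
    for name in names:
        # os.path.join inserts the right separator ("\\" on Windows, "/" elsewhere).
        path = os.path.join(base, name)
        # exist_ok=True means re-running the script does not raise an error.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Use a throwaway temporary folder so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_project_dirs(tmp, "pages", "images"):
            print(p, os.path.isdir(p))
```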

2. Crawling web pages

2.1 urllib--fetching web pages

urllib.request--the submodule that opens URLs and reads their responses
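A minimal sketch of fetching a page with urllib.request, assuming the response is text. The User-Agent value is an arbitrary placeholder and fetch_text is a hypothetical helper name.

```python
from urllib.request import Request, urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download `url` with urllib.request and return the body as a string."""
    # Some sites reject urllib's default User-Agent; this value is only a
    # placeholder, not something the library requires.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (example crawler)"})
    with urlopen(req, timeout=timeout) as resp:
        # Decode with the charset declared in the response, else assume UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, fetch_text("http://example.com") returns that page's HTML. urlopen also accepts data: URLs, which is a convenient way to try the function without network access.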

2.2 bs4 (BeautifulSoup)--extracts information from a page by its tags (must be installed separately)

2.3 re--regular expressions
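As a small illustration of re in a crawler, the sketch below pulls href values out of an HTML snippet. The pattern is a simplified assumption that only handles quoted attribute values; for messy real-world HTML, a parser such as bs4 is more robust.

```python
import re

# Matches href="..." or href='...' (any case) and captures the URL inside.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(html: str) -> list[str]:
    """Return every quoted href value found in `html`, in document order."""
    return HREF_RE.findall(html)


sample = '<a href="/a.html">A</a> <A HREF=\'http://example.com/b\'>B</a>'
print(extract_links(sample))  # ['/a.html', 'http://example.com/b']
```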

2.4 Requests library--a third-party Python library that specializes in handling complex HTTP requests, cookies, headers (both request and response headers), and so on (https://github.com/kennethreitz/requests/tarball/master)

2.5 smtplib--send mail over SMTP (receiving mail is handled by poplib or imaplib)
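A minimal sketch of sending a plain-text message with smtplib, using email.message.EmailMessage to assemble it. The server host, port 587 with STARTTLS, and the credentials are assumptions that depend on your mail provider, and the helper names are hypothetical.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg: EmailMessage, host: str, port: int = 587,
                 user: Optional[str] = None, password: Optional[str] = None) -> None:
    """Send `msg` through an SMTP server (host, port and login are assumptions)."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()  # upgrade the connection to TLS, typical on port 587
        if user is not None:
            server.login(user, password)
        server.send_message(msg)
```

Usage would look like send_message(build_message("me@example.com", "you@example.com", "Crawl finished", "Done."), "smtp.example.com", user="me", password="..."), with every address and host replaced by real values.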

2.6 Selenium--a library whose API drives WebDriver. WebDriver is a bit like a browser that can load a website, but it can also be used like a BeautifulSoup object to find page elements, interact with elements on the page (send text, click, etc.), and perform the other actions needed to run a web crawler.

2.7 collections--container data structures such as deque, Counter and defaultdict
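The collections module adds container types beyond the built-in list and dict. The sketch below shows three of them on a toy word list; the crawl queue of example.com URLs is purely illustrative.

```python
from collections import Counter, defaultdict, deque

words = "the cat and the hat and the bat".split()

# Counter: a frequency table in one line.
freq = Counter(words)
print(freq.most_common(2))  # [('the', 3), ('and', 2)]

# defaultdict: group words by first letter without checking for missing keys.
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)

# deque: a FIFO queue of URLs for a breadth-first crawler (placeholder URLs).
queue = deque(["http://example.com/"])
queue.append("http://example.com/about")
first = queue.popleft()  # O(1) removal from the left end, unlike list.pop(0)
```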

2.8 random--generating random numbers (import random)
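The article does not say how random is used. One common use in crawlers, shown here as an assumption, is waiting a random interval between requests and picking a User-Agent from a pool. The User-Agent strings and the helper name polite_pause are hypothetical.

```python
import random
import time

# Hypothetical pool of User-Agent strings to rotate between requests.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


def polite_pause(rng: random.Random, low: float = 1.0, high: float = 3.0) -> float:
    """Sleep for a random interval in [low, high] seconds and return it."""
    delay = rng.uniform(low, high)
    time.sleep(delay)
    return delay


rng = random.Random(42)  # seeded so test runs are repeatable
agent = rng.choice(USER_AGENTS)
```

Seeding a dedicated random.Random instance keeps test runs reproducible; in real crawling you would normally leave it unseeded.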

PhantomJS is a "headless" browser: it loads a site into memory and executes the JavaScript on the page, but it does not show the user a graphical interface. By combining Selenium with PhantomJS, you can run a very powerful web crawler that handles cookies, JavaScript, headers, and anything else you need.

3. Data storage

3.1 PyMySQL--stores data from Python in a MySQL database

3.2 xlrd, xlwt--reading data from (xlrd) and writing data to (xlwt) Excel .xls files

4. Reading files

4.1 pdfminer3k--extracts text from PDF files (https://pypi.python.org/pypi/pdfminer3k)

4.2 python-docx library--supports creating new documents and reading some basic file data, such as file size and document title, but does not support reading the body text.

5. Algorithms

5.1 nltk--natural language processing (www.nltk.org), covering statistical and lexical analysis. Recommended book: Natural Language Processing with Python.

5.2 Pillow and Tesseract--image processing and text recognition

(http://pillow.readthedocs.org/)

Tesseract is widely regarded as one of the best and most accurate open-source OCR systems. It is a command-line tool, not a Python library imported through an import statement: after installation, you run it outside of Python with the tesseract command. On Windows, download the convenient executable installer (https://code.google.com/p/tesseract-ocr/downloads/list) to install it. Tesseract's biggest weakness is handling images with gradient background colors.
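Because Tesseract runs outside Python, a Python program typically calls the command through subprocess. The sketch below assumes the basic invocation tesseract <image> <output_base>, which writes the recognized text to <output_base>.txt; the function names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: str, output_base: str) -> list[str]:
    """Build `tesseract <image> <output_base>`; the text lands in <output_base>.txt."""
    return ["tesseract", image, output_base]


def ocr_image(image: str, output_base: str = "ocr_output") -> str:
    """Run the Tesseract CLI on `image` and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH; install it first")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(tesseract_command(image, output_base), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

For example, ocr_image("captcha.png") would run Tesseract on that file and return whatever text it recognized.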

To create a character location (box) file, which records what each character is and where it sits in the image, use the online tool Tesseract OCR Chopper (http://pp19dd.com/tesseract-ocr-chopper/). It needs no installation and has no other dependencies, so it runs in any browser, and it is simple to use: upload an image, click the "Add" button to add a new rectangle, adjust the size of each rectangle as needed, and finally copy the generated location data into a new file.
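Each line of a Tesseract box file describes one character. Assuming the common layout <char> <left> <bottom> <right> <top> <page>, with pixel coordinates measured from the bottom-left corner of the image, a location file like the one described above can be read back for inspection as follows. The Box type, the parse_box_file helper and the sample data are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> list[Box]:
    """Parse lines of the form `<char> <left> <bottom> <right> <top> <page>`."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, *numbers = parts
        boxes.append(Box(char, *map(int, numbers)))
    return boxes


sample = "A 10 5 22 30 0\n7 25 5 36 30 0\n"
for box in parse_box_file(sample):
    print(box.char, "width =", box.right - box.left)
```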

If you are interested in Tesseract's other training methods, whether you plan to build your own CAPTCHA training library or want to share recognition data for a new font with Tesseract enthusiasts around the world, I recommend reading the Tesseract documentation carefully (https://github.com/tesseract-ocr/tesseract/wiki).

5.3 numpy--because an image can be represented mathematically as a large array of pixels, NumPy works well alongside Tesseract for this task.

6. JavaScript libraries

6.1 jQuery

6.2 Google Analytics

7. GUI libraries

7.1 Tkinter--Python's standard GUI toolkit (imported as tkinter in Python 3)

