Python library installation methods:
Method One: setup.py
1. Download the archive package, unzip it, and note the path it was extracted to: *:/**/....../
2. Run cmd and switch to the *:/**/....../ directory
3. Run python setup.py build, then python setup.py install
4. Start Python and import the module to verify that the installation was successful
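The verification in the last step can also be scripted. The following is a minimal sketch, not part of the original instructions: the helper name check_installed and the two module names are only illustrative, and it assumes the library exposes a __version__ attribute (many do, but not all).

```python
import importlib


def check_installed(module_name):
    """Return the module's version string, 'unknown' if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # __version__ is a common convention, not a guarantee.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # "json" ships with Python; the second name is a deliberately bogus example.
    for name in ("json", "surely_not_installed_pkg"):
        version = check_installed(name)
        status = "missing" if version is None else f"installed (version: {version})"
        print(f"{name}: {status}")
```

If the import fails, the install step most likely did not complete or ran under a different Python interpreter.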
Method Two: pip
1. Press Win + R to open the Run window, type cmd, and press Enter
2. Locate the pip installation path: X:\Python xx\Scripts
3. Switch to that directory on the command line: cd X:\Python xx\Scripts
4. Enter pip install *** (library name)
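As an alternative to changing into the Scripts directory, pip can be run as a module of the interpreter itself (python -m pip install ...). The sketch below only builds that command; the helper name pip_install_command and the package name "requests" are illustrative, and the actual install is left commented out because it needs network access.

```python
import sys


def pip_install_command(package):
    """Build a pip install command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    command = pip_install_command("requests")
    print(" ".join(command))
    # To actually install, uncomment the next two lines:
    # import subprocess
    # subprocess.check_call(command)
```

Using sys.executable ensures the library is installed for the same Python that will later import it, which matters when several versions are installed side by side.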
Common libraries:
1. os--operating system functions from Python: create a new folder, build a path, etc.
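A short sketch of the two os tasks mentioned here, building a path and creating a folder. The helper name make_folder and the folder name "downloads" are made up for the example, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile


def make_folder(parent, name):
    """Create parent/name if it does not exist yet and return its path."""
    path = os.path.join(parent, name)  # joins with the right separator for the OS
    os.makedirs(path, exist_ok=True)   # no error if the folder already exists
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        folder = make_folder(base, "downloads")
        print(os.path.isdir(folder))
        print(os.listdir(base))
```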
2. Crawling web pages
2.1 urllib--fetching web pages
urllib.request
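A minimal sketch of fetching a page with urllib.request. To keep it runnable without a network connection, the example uses a data: URL (which urlopen also accepts) as a stand-in for a real page; the helper name fetch_text and the UTF-8 default are assumptions of this example.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Open the URL and return the response body decoded as text."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


if __name__ == "__main__":
    # Stand-in for a real page; replace with an http(s) URL to crawl a site.
    url = "data:text/html," + quote("<h1>Hello</h1>")
    print(fetch_text(url))
```

For real sites the encoding may differ from UTF-8, so it is worth checking the response headers before decoding.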
2.2 bs4--extracts information from a page by tag (needs to be downloaded)
2.3 re--regular expressions
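As a small illustration of re in a crawling context, the sketch below pulls href values out of an HTML string. The pattern and the helper name extract_links are assumptions made for this example; a regular expression like this is fine for simple, regular markup, while a parser such as bs4 is more robust for real pages.

```python
import re

# Matches href="..." and captures the text between the double quotes.
LINK_PATTERN = re.compile(r'href="([^"]+)"')


def extract_links(html):
    """Return every double-quoted href value found in the HTML string."""
    return LINK_PATTERN.findall(html)


if __name__ == "__main__":
    sample = '<a href="https://www.python.org">Python</a> <a href="/docs">Docs</a>'
    print(extract_links(sample))
```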
2.4 requests--a third-party Python library that specializes in handling complex HTTP requests, cookies, headers (response headers and request headers), and so on (https://github.com/kennethreitz/requests/tarball/master)
2.5 smtplib--sending mail over SMTP (receiving mail is handled by poplib or imaplib)
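A minimal sketch of sending a notification mail with smtplib, for example when a crawl finishes. The addresses, the localhost server, and port 25 are placeholders assumed for the example; the send call is left commented out because it needs a reachable SMTP server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_message(message, host="localhost", port=25):
    """Send the message through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)


if __name__ == "__main__":
    msg = build_message("crawler@example.com", "me@example.com",
                        "Crawl finished", "All pages were fetched.")
    print(msg["Subject"])
    # send_message(msg)  # requires a reachable SMTP server
```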
2.6 selenium--an API that drives a WebDriver. A WebDriver is a bit like a browser in that it can load a website, but like a BeautifulSoup object it can also be used to find page elements, interact with them (send text, click, etc.), and perform other actions to run a web crawler.
2.7 collections--data structures
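A few of the collections data structures that tend to be useful in a crawler, shown as a rough sketch: Counter for word counts, defaultdict for grouping URLs, and deque as a queue of pages to visit. The helper names and sample URLs are invented for the example.

```python
from collections import Counter, defaultdict, deque


def count_words(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(text.lower().split())


def group_by_domain(urls):
    """Group absolute URLs (scheme://domain/...) by their domain part."""
    groups = defaultdict(list)
    for url in urls:
        domain = url.split("/")[2]  # "https://a.com/x" -> "a.com"
        groups[domain].append(url)
    return dict(groups)


if __name__ == "__main__":
    print(count_words("the cat and the hat").most_common(1))
    print(group_by_domain(["https://a.com/x", "https://b.com/y", "https://a.com/z"]))
    queue = deque(["https://a.com/x"])   # pages waiting to be crawled
    queue.append("https://a.com/z")
    print(queue.popleft())               # first in, first out
```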
2.8 random--random number generation
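One way random is often used in crawling is to vary the pause between requests. The sketch below is only an illustration under that assumption: the helper name pick_delay, the 1 to 3 second range, and the sample user-agent strings are all made up.

```python
import random


def pick_delay(rng, low=1.0, high=3.0):
    """Return a random pause length in seconds between low and high."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same sequence
    print(pick_delay(rng))
    print(rng.choice(["Mozilla/5.0", "curl/8.0"]))
```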
2.9 PhantomJS--a "headless" browser. It loads a site into memory and executes the JavaScript on the page, but does not show the user a graphical rendering of the page. By combining Selenium and PhantomJS, you can run a very powerful web crawler that can handle cookies, JavaScript, headers, and anything else you need.
3. Data storage
3.1 pymysql--storing data from Python in a MySQL database
3.2 xlrd, xlwt--reading data from (xlrd) and writing data to (xlwt) Excel files
4. Reading files
4.1 pdfminer3k--reading PDF files https://pypi.python.org/pypi/pdfminer3k
4.2 python-docx--supports creating new documents and reading some basic file data, such as file size and file title; it does not support reading the body text.
5. Algorithms
5.1 nltk--natural language processing (www.nltk.org): statistical analysis and lexical analysis. Book: Natural Language Processing with Python
5.2 Pillow and Tesseract--image processing and text recognition (OCR)
(http://pillow.readthedocs.org/)
Tesseract is widely regarded as the best and most accurate open source OCR system. Tesseract is a command-line tool, not a Python library loaded with an import statement. After installation, you run it with the tesseract command outside of Python. On Windows, download the executable installer (https://code.google.com/p/tesseract-ocr/downloads/list) to install it. Tesseract's biggest weakness is handling images with gradient backgrounds.
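Because Tesseract is a command-line program, one way to use it from a Python script is to launch it with subprocess. This is a minimal sketch assuming the tesseract executable is on PATH; the file name text.png and the helper names are hypothetical. Passing stdout as the output base asks Tesseract to print the recognized text instead of writing a file.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base="stdout"):
    """Build the command line: tesseract <image> <output base>."""
    return ["tesseract", image_path, output_base]


def run_ocr(image_path):
    """Return the recognized text, or None if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(tesseract_command(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # text.png is a placeholder; point this at a real image to try it.
    print(tesseract_command("text.png"))
```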
Creating a box (character location) file: this file records what each character is and where it is in the image. The online tool Tesseract OCR Chopper (http://pp19dd.com/tesseract-ocr-chopper/) is convenient because it needs no installation and has no other dependencies; it runs in any browser and is simple to use. Upload a picture, click the "Add" button if you want to add a new rectangle, adjust the size of the rectangles as necessary, and finally copy the newly generated box file contents into a new file.
If you are interested in Tesseract's other training methods, plan to build your own CAPTCHA training library, or want to share your recognition data for a new font with Tesseract enthusiasts around the world, I recommend that you read the Tesseract documentation carefully (https://github.com/tesseract-ocr/tesseract/wiki).
5.3 numpy--because an image can be represented mathematically as a large array of pixels, NumPy works seamlessly with Tesseract to complete the task.
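To make the "image as an array of pixels" idea concrete, here is a rough sketch that treats a tiny grayscale image as a 2-D NumPy array and turns it into pure black and white. Thresholding like this is one common cleanup step before OCR, but it is an assumption of this example, not something the original describes; the helper name threshold and the cutoff of 128 are arbitrary.

```python
import numpy as np


def threshold(pixels, cutoff=128):
    """Return a black-and-white copy: values >= cutoff become 255, the rest 0."""
    return np.where(pixels >= cutoff, 255, 0).astype(np.uint8)


if __name__ == "__main__":
    # A 2x3 grayscale "image": each number is one pixel's brightness (0-255).
    image = np.array([[10, 200, 130],
                      [90, 128, 255]], dtype=np.uint8)
    print(image.shape)
    print(threshold(image))
```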
6. JavaScript libraries
6.1 jQuery
6.2 Google Analytics
7. GUI libraries
7.1 tkinter--Python 3
This article is from "Little public who" blog, please make sure to keep this source http://xiaogongju.blog.51cto.com/12830710/1975872