Python Library Encyclopedia

Source: Internet
Author: User
Tags: xml, parser, nltk, web2py


Network

    • urllib– network library (stdlib).

    • requests– network library.

    • grab– network library (based on pycurl).

    • pycurl– network library (binding to libcurl).

    • urllib3– Python HTTP library with thread-safe connection pooling, file post support, and high availability.

    • httplib2– network library.

    • robobrowser– a simple, Pythonic library for browsing the web without a standalone browser.

    • MechanicalSoup– a Python library for automating interaction with websites.

    • mechanize– stateful, programmable web browsing library.

    • socket– the underlying network interface (stdlib).

    • Unirest for Python– Unirest is a set of lightweight HTTP libraries available in multiple languages.

    • hyper– HTTP/2 client for Python.

    • PySocks– an updated and actively maintained version of SocksiPy, including bug fixes and some extra features. Acts as a drop-in replacement for the socket module.
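As a rough illustration of the standard-library entry in the list above, the sketch below builds a GET request with urllib without sending it. The helper name `build_search_request`, the example.com URL, and the User-Agent string are all made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, params):
    """Return a GET Request whose query string encodes params (hypothetical helper)."""
    query = urlencode(params)  # spaces become '+', values are percent-encoded
    return Request(
        f"{base_url}?{query}",
        headers={"User-Agent": "example-client/0.1"},  # illustrative value
    )


req = build_search_request(
    "http://example.com/search", {"q": "python libraries", "page": 2}
)
print(req.full_url)
print(req.get_method())
```

Passing `req` to `urllib.request.urlopen` would actually send it; third-party libraries such as requests wrap the same steps in a shorter API.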

Web Crawler Frameworks

    • Full-Featured Crawlers

      • grab– web crawler framework (based on pycurl/multicurl).

      • scrapy– web crawler framework (based on Twisted). Current releases run on Python 3.

      • pyspider– a powerful spider (crawler) system.

      • cola– a distributed crawler framework.

    • Other

      • portia– visual crawler based on Scrapy.

      • restkit– HTTP resource kit for Python. It lets you easily access HTTP resources and build objects around them.

      • demiurge– crawler micro-framework based on PyQuery.

HTML/XML Parsers
    • General

      • lxml– efficient HTML/XML processing library written in C. Supports XPath.

      • cssselect– query the DOM tree with CSS selectors.

      • pyquery– query the DOM tree with jQuery-like selectors.

      • beautifulsoup– slower HTML/XML processing library, implemented in pure Python.

      • html5lib– builds the DOM of HTML/XML documents according to the WHATWG specification, which is used by all current browsers.

      • feedparser– parses RSS/Atom feeds.

      • MarkupSafe– provides safely escaped strings for XML/HTML/XHTML.

      • xmltodict– a Python module that makes working with XML feel like working with JSON.

      • xhtml2pdf– converts HTML/CSS to PDF.

      • untangle– easily converts XML files into Python objects.

    • Clean

      • bleach– cleans up HTML (requires html5lib).

      • sanitize– brings clarity to the chaotic world of data.
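To give a feel for the XPath-style querying mentioned for lxml, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in (it supports only a limited XPath subset; lxml.etree follows a largely compatible API and supports full XPath). The sample document and the `titles` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for the sketch.
DOC = (
    "<feed>"
    '<entry lang="en"><title>First</title></entry>'
    '<entry lang="zh"><title>Second</title></entry>'
    "</feed>"
)


def titles(xml_text, lang=None):
    """Return entry titles, optionally filtered by the lang attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree understands simple predicates such as [@attr='value'].
    # The value is interpolated directly, so this is only suitable for trusted input.
    path = ".//entry" if lang is None else f".//entry[@lang='{lang}']"
    return [entry.findtext("title") for entry in root.findall(path)]


print(titles(DOC))
print(titles(DOC, lang="zh"))
```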

Text Processing

Libraries for parsing and manipulating plain text.

    • General

      • difflib– (Python standard library) helps compute differences between sequences.

      • Levenshtein– quickly calculates Levenshtein distance and string similarity.

      • fuzzywuzzy– fuzzy string matching.

      • esmre– regular expression accelerator.

      • ftfy– automatically cleans up Unicode text so that it is less broken and more consistent.
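The string-comparison entries above can be tried with nothing but the standard library. The following is a small sketch using difflib; the helper names `similarity` and `closest` are made up here, and difflib's ratio is not the Levenshtein distance that the dedicated libraries compute.

```python
import difflib


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio between two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def closest(word, candidates, n=1):
    """Return up to n candidates that most resemble word."""
    return difflib.get_close_matches(word, candidates, n=n)


print(similarity("kitten", "sitting"))
print(closest("appel", ["ape", "apple", "peach", "puppy"]))
```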

Natural Language Processing

Libraries for working with human languages.

    • NLTK– a leading platform for building Python programs that work with human language data.

    • Pattern– a web mining module for Python. It includes tools for natural language processing, machine learning, and more.

    • TextBlob– provides a consistent API for common natural language processing tasks. Stands on the giant shoulders of NLTK and Pattern.

    • jieba– Chinese word segmentation tool.

    • snownlp– Chinese text processing library.

    • loso– another Chinese word segmentation library.

Browser Automation and Simulation

    • selenium– automates real browsers (Chrome, Mozilla Firefox, Opera, Internet Explorer).

    •– PyQt WebKit package (requires PyQt).

    • spynner– PyQt WebKit package (requires PyQt).

    • splinter– generic API for browser emulators (Selenium WebDriver, Django client, Zope).


Concurrency

    • threading– threads from the Python standard library. Works well for I/O-bound tasks, but is of little use for CPU-bound tasks because of the Python GIL.

    • multiprocessing– standard Python library for running multiple processes.

    • celery– asynchronous task queue/job queue based on distributed message passing.

    • concurrent.futures– the concurrent.futures module provides a high-level interface for asynchronously executing callables.
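The point about threads suiting I/O-bound work can be sketched with concurrent.futures. Here `fake_download` is a made-up stand-in that just sleeps to simulate waiting on the network; for CPU-bound work, swapping in `ProcessPoolExecutor` (same interface) is the usual way around the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id, delay=0.05):
    """Simulate an I/O-bound call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return item_id * 2


def run_in_threads(item_ids, max_workers=4):
    """Run fake_download over item_ids in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, item_ids))


results = run_in_threads(range(8))
print(results)
```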


Asynchronous Network Programming Library

    • asyncio– (Python standard library in Python 3.4+) asynchronous I/O, event loops, coroutines, and tasks.

    • twisted– an event-driven networking engine framework.

    • tornado– a web framework and asynchronous networking library.

    • pulsar– event-driven concurrency framework for Python.

    • diesel– greenlet-based event I/O framework for Python.

    • gevent– a greenlet-based Python networking library.

    • eventlet– an asynchronous framework with WSGI support.

    • tomorrow– magic decorator syntax for asynchronous code.
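For the asyncio entry, a minimal sketch of coroutines running concurrently on one event loop might look like this. The `fetch` coroutine is hypothetical and only sleeps; `asyncio.run` requires Python 3.7 or later.

```python
import asyncio


async def fetch(name, delay):
    """Pretend to do network I/O; awaiting sleep yields control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs both coroutines concurrently and returns results in argument order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```

Although "b" finishes first, `gather` reports results in the order the coroutines were passed in.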


Queue

    • celery– asynchronous task queue/job queue based on distributed message passing.

    • huey– small multithreaded task queue.

    • mrq– Mr. Queue, a distributed task queue for Python using Redis & gevent.

    • rq– a lightweight, Redis-based task queue manager.

    • simpleq– a simple, infinitely scalable, Amazon SQS-based queue.

    • python-gearman– Python API for Gearman.
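The libraries above distribute jobs across machines through a message broker. The core pattern (producers enqueue jobs, workers pull and run them) can be sketched in a single process with the standard library. This is only an in-process stand-in, not how any of the listed libraries is implemented, and `worker` and `run_jobs` are made-up names.

```python
import queue
import threading


def worker(tasks, results):
    """Pull (func, args) jobs until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        func, args = job
        results.put(func(*args))
        tasks.task_done()


def run_jobs(jobs, num_workers=2):
    """Run jobs on worker threads and return their results, sorted."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)  # completion order is not deterministic


totals = run_jobs([(pow, (2, n)) for n in range(5)])
print(totals)
```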

Cloud Computing

    • picloud– execute Python code in the cloud.

    •– execute R, Python, and MATLAB code in the cloud.

Page Content Extraction

Libraries for extracting the content of web pages.

    • Text and metadata for HTML pages

      • newspaper– news extraction, article extraction, and content curation in Python.

      • html2text– converts HTML to Markdown-formatted text.

      • python-goose– HTML content/article extractor.

      • lassie– user-friendly web content retrieval tool.


WebSocket

Libraries for working with WebSocket.

    • crossbar– open-source application messaging router (Python implementation of WebSocket and WAMP for Autobahn).

    • AutobahnPython– provides open-source Python implementations of the WebSocket and WAMP protocols.

    • WebSocket-for-Python– WebSocket client and server library for Python 2 and 3 as well as PyPy.

DNS Resolution

    • dnsyo– checks your DNS against more than 1,500 DNS servers worldwide.

    • pycares– interface to c-ares. c-ares is a C library for asynchronous DNS requests and name resolution.

Computer Vision

    • opencv– open-source computer vision library.

    • simplecv– a readable interface to cameras, image processing, feature extraction, and format conversion (based on OpenCV).

    • mahotas– fast computer vision and image processing algorithms (implemented entirely in C++) that operate on NumPy arrays.

Some frameworks for web development


Django is an open-source web application framework written in Python. It supports many database engines, makes web development fast and scalable, and is updated regularly to keep pace with new Python releases. If you are a novice programmer, this framework is a good place to start.


Flask is a lightweight web application framework written in Python, based on the Werkzeug WSGI toolkit and the Jinja2 template engine. It is released under a BSD license.

Flask is also called a "microframework" because it keeps a simple core and adds other features through extensions. Flask has no default database or form validation tool. It remains extensible, however: Flask extensions can add an ORM, form validation, file uploads, and various open authentication technologies.
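Since Flask sits on top of WSGI, it may help to see what a bare WSGI application looks like without any framework. The sketch below uses only plain Python and is not Flask's API; the `ROUTES` table and the `hello`, `app`, and `call_app` names are invented for illustration.

```python
def hello(environ):
    """Hypothetical handler: return a status line and a text body."""
    return "200 OK", "Hello, world!"


# A tiny routing table mapping paths to handlers (illustrative only).
ROUTES = {"/": hello}


def app(environ, start_response):
    """A WSGI application: a callable taking environ and start_response."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, body = "404 Not Found", "Not found"
    else:
        status, body = handler(environ)
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call_app(path):
    """Invoke the app directly with a minimal fake environ, as a server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body


print(call_app("/"))
print(call_app("/missing"))
```

To serve it over HTTP, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would do for local experiments. A microframework essentially replaces the hand-written routing table and response plumbing with decorators and helper objects.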


web2py is a free, open-source web framework written in Python, designed for agile development of fast, scalable, secure, and portable database-driven web applications. It is released under the LGPLv3 open-source license.

web2py provides a one-stop solution: the entire development process can be done in the browser, with web-based online development, HTML template authoring, static file upload, and database authoring. Other features include logging and an automated admin interface.


Tornado is a web server (not covered in detail in this article) and also a micro-framework. As a framework, Tornado draws its ideas mainly from another framework, whose home page carries a quote from Bret Taylor, one of Tornado's creators (the FriendFeed framework he mentions here can be read as Tornado):

"[ inspired the] web framework we use for FriendFeed [and] the webapp framework that ships with App Engine ..."


CherryPy is a simple and useful web framework for Python. Its main purpose is to connect a web server to Python code with as little effort as possible. Features include built-in profiling, a flexible plug-in system, and the ability to run multiple HTTP servers at once. It runs on recent versions of Python, Jython, and Android.
