Network

General
urllib - network library (standard library)
requests - network library
grab - network library (based on pycurl)
pycurl - network library (binding to libcurl)
urllib3 - Python HTTP library with a thread-safe connection pool and file POST support
httplib2 - network library
RoboBrowser - a simple, Pythonic library for browsing web pages without a standalone browser
MechanicalSoup - a Python library for automating interaction with websites
mechanize - stateful, programmable web browsing library
socket - low-level network interface (standard library)
Unirest for Python - one of a set of lightweight HTTP libraries available in multiple languages
hyper - HTTP/2 client for Python
PySocks - an updated and actively maintained version of SocksiPy, with bug fixes and some extra functionality; can be used as a drop-in replacement for the socket module

Asynchronous
treq - requests-like API (based on Twisted)
aiohttp - HTTP client/server for asyncio (PEP 3156)
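As a small illustration of the standard-library entry in the list above, here is a minimal sketch that builds a GET request with urllib without sending anything over the network. The helper name `build_request`, the `example.com` URL, and the `example-crawler/0.1` user agent string are all made up for the example.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request


def build_request(base_url, params, user_agent="example-crawler/0.1"):
    """Return a GET Request for base_url with params encoded as a query string.

    The helper name and the default user agent are illustrative assumptions.
    """
    query = urlencode(params)  # spaces become '+', values are percent-encoded
    url = f"{base_url}?{query}" if query else base_url
    return Request(url, headers={"User-Agent": user_agent})


# Build the request object only; nothing is sent at this point.
req = build_request("https://example.com/search", {"q": "python crawler", "page": 2})

print(req.full_url)                    # the final URL including the query string
print(req.get_method())                # GET, because no request body was given
print(urlsplit(req.full_url).netloc)   # the host part of the URL
```

To actually fetch the page, the request object would be passed to `urllib.request.urlopen(req)`. Third-party libraries such as requests wrap these same steps in a shorter API.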
Web crawler frameworks

General-purpose crawlers
grab - web crawler framework (based on pycurl/multicurl)
Scrapy - web crawler framework (based on Twisted)
pyspider - a powerful spider system
cola - a distributed crawler framework

Other
Portia - visual crawler based on Scrapy
restkit - HTTP resource kit for Python; lets you easily access HTTP resources and build objects around them
Demiurge - a micro crawler framework based on PyQuery
HTML/XML parsing

General
lxml - efficient HTML/XML processing library, written in C; supports XPath
cssselect - work with the DOM tree using CSS selectors
pyquery - work with the DOM tree using jQuery-like selectors
BeautifulSoup - slower HTML/XML processing library, written in pure Python
html5lib - builds the DOM of an HTML/XML document according to the WHATWG specification, which current browsers follow
feedparser - parses RSS/Atom feeds
MarkupSafe - provides safely escaped XML/HTML/XHTML strings for Python
xmltodict - lets you work with XML as if it were JSON
xhtml2pdf - HTML/CSS to PDF converter
untangle - converts XML documents into Python objects for easier processing
hodor - configuration-driven wrapper around lxml and cssselect

Cleaning
Bleach - cleans up HTML (requires html5lib)
sanitize - brings sanity to messy data
Text Processing
Libraries for parsing and manipulating text.

General
difflib - helpers for computing differences (Python standard library)
Levenshtein - fast computation of edit distance and string similarity
fuzzywuzzy - fuzzy string matching
esmre - regular expression accelerator
ftfy - makes Unicode text less broken

Conversion
unidecode - transliterates Unicode text into ASCII

Character encoding
uniout - prints readable characters instead of escaped strings
chardet - character encoding detector, compatible with Python 2 and 3
xpinyin - converts Chinese characters (hanzi) to pinyin
pangu.py - adjusts spacing between CJK and alphanumeric text

Slugify
awesome-slugify - Python slugify library that preserves Unicode
python-slugify - Python slugify library that converts Unicode to ASCII
unicode-slugify - tool for generating Unicode slugs
pytils - simple tools for handling Russian strings (includes pytils.translit.slugify)

General parsers
PLY - Python implementation of the lex and yacc parsing tools
pyparsing - general-purpose framework for generating parsers

Names
python-nameparser - parses human names into their components

Phone numbers
phonenumbers - parses, formats, stores, and validates international phone numbers

User agent strings
python-user-agents - browser user agent parser
HTTP Agent Parser - Python HTTP agent parser
fake-useragent - user agent spoofing for Python based on global browser statistics
user_agent - user agent data generator

Special format processing
Libraries for parsing and processing specific file formats.

General
tablib - handles tabular data in XLS, CSV, JSON, YAML, and other formats
textract - extracts text from any document; supports Word, PowerPoint, PDF, etc.
messytables - parses messy tabular data
rows - a common, clean interface to tabular data in multiple formats (currently CSV, HTML, XLS, TXT; more planned)

Office
python-docx - reads, queries, and modifies Microsoft Word 2007/2008 docx files
xlwt / xlrd - write and read data and formatting information in Excel files
XlsxWriter - Python module for creating Excel .xlsx files
xlwings - a BSD-licensed library that makes it easier for Excel and Python to call each other
openpyxl - library for reading and editing Excel 2010 xlsx/xlsm/xltx/xltm files
Marmir - takes Python data structures and turns them into spreadsheets

PDF
PDFMiner - a tool for extracting information from PDF documents
PyPDF2 - a library that splits, merges, and transforms PDF files
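
Of the text-processing libraries listed earlier, difflib ships with Python, so it can be tried without installing anything. The following is a minimal sketch of string similarity, close-match lookup, and a line diff. The helper name `similarity` and the sample strings are made up for the example.

```python
import difflib


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Find the closest known name to a misspelled one.
candidates = ["requests", "urllib3", "httplib2", "pycurl"]
closest = difflib.get_close_matches("reqests", candidates, n=1)
print(closest)

# Compare two versions of a short list, keeping only the changed lines.
old_lines = ["grab", "scrapy", "cola"]
new_lines = ["grab", "scrapy", "pyspider"]
changed = [
    line
    for line in difflib.unified_diff(old_lines, new_lines, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changed)

print(similarity("scrapy", "scrappy"))
```

For large inputs or many comparisons, the Levenshtein and fuzzywuzzy entries in the same list are the faster third-party alternatives.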