Over years of programming in Python and browsing GitHub, I have found some excellent Python libraries that greatly simplify development. This article recommends them to you.
Note that I have excluded libraries such as SQLAlchemy and Flask, because they are too well known to need mentioning.
Here is the list:
1. PyQuery (with lxml)
Installation Method: pip install pyquery
Beautiful Soup is the most frequently recommended tool for parsing HTML in Python, and it does the job well. It provides a pleasant, Pythonic API, and documentation is easy to find online. However, when you need to parse a large number of documents in a short time, you run into performance problems: it is simple, but it is really slow.
Here is a performance comparison chart from 2008:
The chart shows that lxml performs very well, but it has little documentation and is clumsy to use. So do you choose a library that is easy to use but slow, or one that is fast but complicated?
Who says you have to choose? What we need is an XML/HTML parsing library that is both convenient and fast!
PyQuery meets both demands: ease of use and parsing speed.
Let's look at the following lines of code:
- from pyquery import PyQuery
- page = PyQuery(some_html)
-
- last_red_anchor = page('#container > a.red:last')
It's as easy as jQuery, but it's Python.
However, there are some shortcomings. When iterating, you need to re-wrap each element in PyQuery:
- for paragraph in page('#container > p'):
-     paragraph = PyQuery(paragraph)
-     text = paragraph.text()
2. dateutil
Installation Method: pip install python-dateutil
Handling dates is painful. Thankfully, there is dateutil:
- from dateutil.parser import parse
-
- >>> parse('Mon, 11 Jul 2011 10:01:56 +0200 (CEST)')
- datetime.datetime(2011, 7, 11, 10, 1, 56, tzinfo=tzlocal())
-
- # fuzzy ignores unknown tokens
-
- >>> s = """Today is 25 of September of 2003, exactly
- ... at 10:49:41 with timezone -03:00."""
- >>> parse(s, fuzzy=True)
- datetime.datetime(2003, 9, 25, 10, 49, 41,
-                   tzinfo=tzoffset(None, -10800))
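To illustrate why this saves effort, here is a small sketch in which several spellings of the same moment are parsed without any format string. The sample strings are made up for illustration.

```python
from datetime import datetime, timedelta

from dateutil.parser import parse

expected = datetime(2011, 7, 11, 10, 1, 56)

# Three spellings of the same moment; none of them needs a format string.
samples = [
    '2011-07-11 10:01:56',
    '11 Jul 2011 10:01:56',
    'July 11, 2011 10:01:56 AM',
]
parsed = [parse(s) for s in samples]
all_equal = all(p == expected for p in parsed)

# An explicit numeric offset yields a timezone-aware datetime.
aware = parse('Mon, 11 Jul 2011 10:01:56 +0200')
offset = aware.utcoffset()
print(all_equal, offset)
```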
3. fuzzywuzzy
Installation Method: pip install fuzzywuzzy
fuzzywuzzy lets you perform fuzzy comparisons between strings. This is useful when you need to process human-generated data. The following code uses the Levenshtein distance to match user input against a list of possible options.
- from Levenshtein import distance
-
- countries = ['Canada', 'Antarctica', 'Togo', ...]
-
- def choose_least_distant(element, choices):
-     'Return the one element of choices that is most similar to element'
-     return min(choices, key=lambda s: distance(element, s))
-
- user_input = 'canaderp'
- choose_least_distant(user_input, countries)
- >>> 'Canada'
This is good, but it can be better:
- from fuzzywuzzy import process
-
- process.extractOne("canaderp", countries)
- >>> ("Canada", 97)
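To make the Levenshtein distance from the first snippet concrete, here is a pure-Python sketch of it. This is an illustrative stand-in, not the implementation used by the Levenshtein package or by fuzzywuzzy, and the countries list is limited to the three names given above.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, and substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the prefix of a handled so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def choose_least_distant(element, choices):
    """Return the choice with the smallest edit distance to element."""
    return min(choices, key=lambda s: levenshtein(element, s))


countries = ['Canada', 'Antarctica', 'Togo']
best = choose_least_distant('canaderp', countries)
print(best)
```

The comparison is case-sensitive, so 'canaderp' is four edits away from 'Canada': one for the capital C, one for the e, and two for the trailing letters.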
4. watchdog
Installation Method: pip install watchdog
Watchdog is a Python API and shell utility used to monitor file system events.
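A minimal sketch of the Python API might look like the following. The CreationLogger class and the watch helper are hypothetical names introduced here for illustration; Observer and FileSystemEventHandler come from watchdog itself.

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class CreationLogger(FileSystemEventHandler):
    """Hypothetical handler that remembers the path of every created file."""

    def __init__(self):
        super().__init__()
        self.created = []

    def on_created(self, event):
        # watchdog calls this when a file or directory appears.
        if not event.is_directory:
            self.created.append(event.src_path)


def watch(directory, handler, seconds=1.0):
    """Watch directory recursively for the given time, then stop the observer."""
    observer = Observer()
    observer.schedule(handler, directory, recursive=True)
    observer.start()
    try:
        time.sleep(seconds)
    finally:
        observer.stop()
        observer.join()
```

Calling watch('.', CreationLogger(), seconds=10) would collect the paths of files created under the current directory during those ten seconds. A long-running program would usually keep the observer alive instead of sleeping for a fixed time.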
5. sh
Installation Method: pip install sh
sh lets you call any program as if it were a function:
- from sh import git, ls, wc
-
- # check out the master branch
- git.checkout("master")
-
- # print the contents of this directory
- print(ls("-l"))
-
- # get the length of the longest line of this file (-L is a GNU wc option)
- longest_line = wc(__file__, "-L")
6. pattern
Installation Method: pip install pattern
Pattern is a Python Web data mining module. It can be used for data mining, natural language processing, machine learning, and network analysis.
7. path.py
Installation Method: pip install path.py
When I started learning Python, os.path was my least favorite part of the stdlib. Even something as simple as listing the files in a directory takes some effort:
- import os
-
- some_dir = '/some_dir'
- files = []
-
- for f in os.listdir(some_dir):
-     files.append(os.path.join(some_dir, f))
However, listdir lives in os rather than os.path.
With path.py, working with file paths becomes simple:
- from path import path
-
- some_dir = path('/some_dir')
-
- files = some_dir.files()
Other usage:
- >>> path('/').owner
- 'root'
-
- >>> path('a/b/c').splitall()
- [path(''), 'a', 'b', 'c']
-
- # overriding __div__
- >>> path('a') / 'b' / 'c'
- path('a/b/c')
-
- >>> path('ab/c').relpathto('ab/d/f')
- path('../d/f')
Isn't that much better?