Seven development libraries that Python developers should know

Source: Internet
Author: User

Over my years of programming in Python and exploring GitHub, I have come across some very good Python libraries that greatly simplify development. This article recommends them to you.

Note that I have excluded libraries such as SQLAlchemy and Flask, because they are good enough to need no introduction.

Here they are:

1. PyQuery (with lxml)

Installation Method: pip install pyquery

Beautiful Soup is the most frequently recommended way to parse HTML in Python, and it does the job well. It provides a nice Pythonic API, and documentation is easy to find online. However, when you need to parse a large number of documents in a short time, you run into performance problems: it is simple to use, but really slow.

[Figure: parser performance comparison chart from 2008]

The chart shows that lxml performs very well, but its documentation is sparse and it is clumsy to use! Do you choose a library that is easy to use but slow, or one that is fast but complex?

Who says you have to choose? What we need is an XML/HTML parsing library that is both convenient and fast!

PyQuery meets both demands: ease of use and parsing speed.

Let's look at the following lines of code:

 
 
    from pyquery import PyQuery

    # some_html is a string containing the HTML to parse
    page = PyQuery(some_html)

    last_red_anchor = page('#container > a.red:last')

It's easy, just like jQuery, but in Python.

However, there are also some shortcomings. Iterating over a selection yields raw lxml elements, so you need to wrap each one in PyQuery again:

 
 
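    for paragraph in page('#container > p'):
        # each item is a plain lxml element, so wrap it again
        # (PyQuery's items() method yields PyQuery objects directly)
        paragraph = PyQuery(paragraph)
        text = paragraph.text()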

2. dateutil

Installation Method: pip install python-dateutil

Processing dates is painful. Thankfully, dateutil makes it much easier:

 
 
    >>> from dateutil.parser import parse

    >>> parse('Mon, 11 Jul 2011 10:01:56 +0200 (CEST)')
    datetime.datetime(2011, 7, 11, 10, 1, 56, tzinfo=tzlocal())

    >>> # fuzzy=True ignores unknown tokens
    >>> s = """Today is 25 of September of 2003, exactly
    ...        at 10:49:41 with timezone -03:00."""
    >>> parse(s, fuzzy=True)
    datetime.datetime(2003, 9, 25, 10, 49, 41,
                      tzinfo=tzoffset(None, -10800))
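For contrast, here is a rough sketch of parsing similar strings with only the standard library, where the layout of each input has to be known in advance. The ISO-like string in the second call is an example made up for this sketch; it is not taken from the dateutil documentation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# RFC 2822 style dates (as used in e-mail headers) have a dedicated parser.
mail_date = parsedate_to_datetime('Mon, 11 Jul 2011 10:01:56 +0200 (CEST)')

# Any other layout needs an explicit format string for strptime.
iso_like = datetime.strptime('2003-09-25 10:49:41 -0300',
                             '%Y-%m-%d %H:%M:%S %z')

print(mail_date.isoformat())
print(iso_like.utcoffset())
```

dateutil's parse removes the need for that second argument by working out the layout itself.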

3. fuzzywuzzy

Installation Method: pip install fuzzywuzzy

Fuzzywuzzy lets you perform fuzzy comparisons between two strings, which is useful when you need to process human-generated data. The following code uses the Levenshtein distance to match user input against a list of possible options.

 
 
    from Levenshtein import distance

    countries = ['Canada', 'Antarctica', 'Togo', ...]

    def choose_least_distant(element, choices):
        """Return the one element of choices that is most similar to element."""
        return min(choices, key=lambda s: distance(element, s))

    user_input = 'canaderp'
    choose_least_distant(user_input, countries)
    # 'Canada'

This is good, but it can be better:

 
 
    from fuzzywuzzy import process

    process.extractOne("canaderp", countries)
    # ("Canada", 97)
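To make the idea of edit distance concrete, here is a minimal pure-Python sketch of the Levenshtein distance: the smallest number of single-character insertions, deletions, or substitutions needed to turn one string into another. The function names `levenshtein` and `closest` are illustrative, and this is not how fuzzywuzzy or the Levenshtein package implement it. The sketch also lowercases both strings before comparing, which is an assumption of this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def closest(query, choices):
    """Pick the choice with the smallest case-insensitive edit distance."""
    return min(choices, key=lambda c: levenshtein(query.lower(), c.lower()))


countries = ["Canada", "Antarctica", "Togo"]
best = closest("canaderp", countries)
print(best)
```

A raw distance like this only ranks the candidates. fuzzywuzzy additionally reports a similarity score, which makes it easier to reject poor matches.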

4. watchdog

Installation Method: pip install watchdog

Watchdog is a Python API and shell utility used to monitor file system events.
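To illustrate what "file system events" means here, the following sketch detects created, deleted, and modified files by comparing two directory snapshots, using only the standard library. It does not use watchdog's API and is not how watchdog works internally (watchdog relies on the operating system's native notification mechanisms where available). The helper names `snapshot` and `diff` and the file names are made up for this example.

```python
import os
import tempfile


def snapshot(directory):
    """Map every file path under directory to its (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            info = os.stat(path)
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff(before, after):
    """Return (created, deleted, modified) file names between two snapshots."""
    created = sorted(os.path.basename(p) for p in after.keys() - before.keys())
    deleted = sorted(os.path.basename(p) for p in before.keys() - after.keys())
    modified = sorted(os.path.basename(p)
                      for p in before.keys() & after.keys()
                      if before[p] != after[p])
    return created, deleted, modified


with tempfile.TemporaryDirectory() as watched:
    for name in ("kept.txt", "gone.txt"):
        open(os.path.join(watched, name), "w").close()
    before = snapshot(watched)

    # Simulate activity in the directory between two polls.
    os.remove(os.path.join(watched, "gone.txt"))
    with open(os.path.join(watched, "kept.txt"), "w") as handle:
        handle.write("changed")
    open(os.path.join(watched, "new.txt"), "w").close()

    created, deleted, modified = diff(before, snapshot(watched))

print(created, deleted, modified)
```

Polling like this is wasteful and misses short-lived changes, which is the kind of work a dedicated monitoring library takes off your hands.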

5. sh

Installation Method: pip install sh

sh lets you call any program as if it were a function:

 
 
    from sh import git, ls, wc

    # check out the master branch
    git.checkout("master")

    # print the contents of this directory
    print(ls("-l"))

    # get the length of the longest line of this file (GNU wc)
    longest_line = wc(__file__, "-L")
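The idea of turning a program into a callable can be sketched with the standard library's subprocess module. The helpers `build_argv` and `command` below are hypothetical names for this illustration, loosely modelled on the convention of turning keyword arguments into command-line options. They are a simplified sketch, not sh's implementation.

```python
import subprocess
import sys


def build_argv(program, *args, **options):
    """Turn a Python call into an argument list (simplified sketch)."""
    argv = [program, *map(str, args)]
    for key, value in options.items():
        if value is False:
            continue                              # switch turned off
        flag = ("-" if len(key) == 1 else "--") + key.replace("_", "-")
        if value is True:
            argv.append(flag)                     # boolean switch: --oneline
        elif len(key) == 1:
            argv.extend([flag, str(value)])       # short option: -n 3
        else:
            argv.append(f"{flag}={value}")        # long option: --color=never
    return argv


def command(program):
    """Return a function that runs program and returns its standard output."""
    def run(*args, **options):
        argv = build_argv(program, *args, **options)
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout
    return run


# Use the running interpreter as the child program so the example is portable.
python = command(sys.executable)
output = python("-c", "print('hello from a child process')")
print(output.strip())
```

Under a mapping like this, a keyword argument always becomes an option (for example `--color=never`), so subcommands such as `checkout` have to be passed positionally.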

6. pattern

Installation Method: pip install pattern

Pattern is a Python Web data mining module. It can be used for data mining, natural language processing, machine learning, and network analysis.

7. path.py

Installation Method: pip install path.py

When I started learning Python, os.path was my least favorite part of the stdlib. Even something as simple as building a list of the files in a directory takes some effort:

 
 
    import os

    some_dir = '/some_dir'
    files = []

    # build the full path of every entry in the directory
    for f in os.listdir(some_dir):
        files.append(os.path.join(some_dir, f))

Note also that listdir lives in os rather than in os.path.

With path.py, working with file paths becomes simple:

 
 
    from path import path  # newer releases name this class Path

    some_dir = path('/some_dir')

    files = some_dir.files()

Other usage:

 
 
    >>> path('/').owner
    'root'

    >>> path('a/b/c').splitall()
    [path(''), 'a', 'b', 'c']

    # the / operator is overloaded (__div__, or __truediv__ on Python 3)
    >>> path('a') / 'b' / 'c'
    path('a/b/c')

    >>> path('ab/c').relpathto('ab/d/f')
    path('../d/f')

Much better, isn't it?
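For comparison, since Python 3.4 the standard library's pathlib module offers a similar object-oriented interface. The sketch below lists the files in a directory and joins path segments with the / operator. The temporary directory and the file names are made up so that the example can run anywhere.

```python
import tempfile
from pathlib import Path, PurePosixPath

with tempfile.TemporaryDirectory() as tmp:
    some_dir = Path(tmp)

    # Create two files and one subdirectory to have something to list.
    for name in ("a.txt", "b.txt"):
        (some_dir / name).touch()
    (some_dir / "sub").mkdir()

    # Keep only regular files, much like some_dir.files() in path.py.
    files = sorted(p for p in some_dir.iterdir() if p.is_file())
    file_names = [p.name for p in files]

# Pure paths do not touch the file system, so these work on any platform.
joined = PurePosixPath("a") / "b" / "c"
parts = PurePosixPath("a/b/c").parts

print(file_names, joined, parts)
```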
