Python Libraries You Might Not Know
by Greg | January 20, 2015
There are tons of Python packages out there. So many that no one man or woman could possibly catch them all. PyPI alone has over 47,000 packages listed!
Recently, with so many data scientists making the switch to Python, I couldn't help but think that while they're getting some of the great benefits of pandas, scikit-learn, and NumPy, they're missing out on some older yet equally helpful Python libraries.
In this post, I'm going to highlight some lesser-known libraries. Even experienced Pythonistas should take a look; there might be one or two in there you've never seen!
1) delorean
Delorean is a really cool date/time library. Apart from having a sweet name, it's one of the more natural-feeling date/time munging libraries I've used in Python. It's sort of like moment in JavaScript, except I laugh every time I import it. The docs are also good: in addition to being technically helpful, they make countless Back to the Future references.
    from delorean import Delorean
    EST = "US/Eastern"
    d = Delorean(timezone=EST)
2) prettytable
There's a chance you haven't heard of prettytable because it's listed on Google Code, which is basically the coding equivalent of Siberia.
Despite being exiled to a cold, snowy, and desolate place, prettytable is great for constructing output that looks good in the terminal or in the browser. So if you're working on a new plug-in for the IPython Notebook, check out prettytable for your HTML __repr__.
    from prettytable import PrettyTable
    table = PrettyTable(["Animal", "Ferocity"])
    table.add_row(["Wolverine", 100])
    table.add_row(["Grizzly", 87])
    table.add_row(["Rabbit of Caerbannog", 110])
    table.add_row(["Cat", -1])
    table.add_row(["Platypus", 23])
    table.add_row(["Dolphin", 63])
    table.add_row(["Albatross", 44])
    # sort on the "Ferocity" column, highest first
    table.sortby = "Ferocity"
    table.reversesort = True
    print(table)
    +----------------------+----------+
    |        Animal        | Ferocity |
    +----------------------+----------+
    | Rabbit of Caerbannog |   110    |
    |      Wolverine       |   100    |
    |       Grizzly        |    87    |
    |       Dolphin        |    63    |
    |      Albatross       |    44    |
    |       Platypus       |    23    |
    |         Cat          |    -1    |
    +----------------------+----------+
3) snowballstemmer
OK, so the first time I installed snowballstemmer, it was because I thought the name was cool. But it's actually a pretty slick little library. snowballstemmer will stem words in several different languages and also comes with a Porter stemmer to boot.
    from snowballstemmer import EnglishStemmer, SpanishStemmer
    EnglishStemmer().stemWord("Gregory")
    # Gregori
    SpanishStemmer().stemWord("amarillo")
    # amarill
4) wget
Remember every time you wrote that web crawler for some specific purpose? Turns out somebody built it... and it's called wget. Recursively download a website? Grab every image from a page? Sidestep cookie traces? Done, done, and done.
Movie Mark Zuckerberg even says it himself:
"First up is Kirkland. They keep everything open and allow indexes in their Apache configuration, so a little wget magic is enough to download the entire Kirkland facebook. Kid stuff!"
The Python version comes with just about every feature you could ask for and is easy to use.
    import wget
    wget.download("http://www.cnn.com/")
    # 100% [............................................................................] 280385 / 280385
Note that another option for Linux and OS X users would be to do: from sh import wget. However, the Python wget module does have better argument handling.
5) PyMC
I'm not sure how PyMC gets left out of the mix so often. scikit-learn seems to be everyone's darling (as it should be, it's fantastic), but in my opinion, not enough love is given to PyMC.
    from pymc.examples import disaster_model
    from pymc import MCMC
    M = MCMC(disaster_model)
    M.sample(iter=10000, burn=1000, thin=10)
    [-----------------100%-----------------] 10000 of 10000 complete in 1.4 sec
If you don't already know it, PyMC is a library for doing Bayesian analysis. It's featured heavily in Cam Davidson-Pilon's Bayesian Methods for Hackers and has made cameos on a lot of popular data science/Python blogs, but it has never received the cult following akin to scikit-learn.
6) sh
I can't risk you leaving this page and not knowing about sh. sh lets you import shell commands into Python as functions. It's super useful for doing things that are easy in bash but that you can't remember how to do in Python (e.g. recursively searching for files).
    from sh import find
    find("/tmp")
    /tmp/foo
    /tmp/foo/file1.json
    /tmp/foo/file2.json
    /tmp/foo/file3.json
    /tmp/foo/bar/file3.json
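For comparison, here is roughly what the same recursive search looks like with only the standard library. This is a minimal sketch, not a replacement for sh: the helper name find_files and the throwaway directory it searches are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_files(root, pattern="*"):
    """Return every path under root that matches pattern, as sorted strings."""
    return sorted(str(p) for p in Path(root).rglob(pattern))


# Build a small throwaway tree (hypothetical names) so there is something to search.
base = Path(tempfile.mkdtemp())
(base / "foo" / "bar").mkdir(parents=True)
(base / "foo" / "file1.json").write_text("{}")
(base / "foo" / "bar" / "file3.json").write_text("{}")

# Recursively list every .json file below base.
for path in find_files(base, "*.json"):
    print(path)
```

It works, but it is also exactly the kind of thing that is easier to remember as a one-line find call.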
7) fuzzywuzzy
Ranking in the top ten of simplest libraries I've ever used (if you have 2-3 minutes, you can read through the source), fuzzywuzzy is a fuzzy string matching library built by the fine people at SeatGeek.
fuzzywuzzy implements things like string comparison ratios, token ratios, and plenty of other matching metrics. It's great for creating feature vectors or matching up records in different databases.
    from fuzzywuzzy import fuzz
    fuzz.ratio("Hit me with your best shot", "Hit me with your pet shark")
    # 85
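To get a feel for what a string comparison ratio and a token ratio actually measure, here is a rough standard-library sketch built on difflib.SequenceMatcher. It is not fuzzywuzzy's code, and its scores may differ from fuzzywuzzy's; the function names simple_ratio and sorted_token_ratio are defined here purely for illustration.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def sorted_token_ratio(a, b):
    """Same ratio, computed after lowercasing and sorting the words,
    so that word order and case no longer matter."""
    def normalize(s):
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(normalize(a), normalize(b))


# Character-level comparison of two similar sentences.
print(simple_ratio("Hit me with your best shot", "Hit me with your pet shark"))

# Same words in a different order: the token-based score ignores the shuffle.
print(sorted_token_ratio("shot best your with me hit", "Hit me with your best shot"))
```

Scores like these are what end up as columns in a feature vector or as the match threshold when lining up records from two databases.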
8) progressbar
You know those scripts you have where you do a print "still going..." in that giant mess of a for loop you call your __main__? Yeah, well, instead of doing that, why don't you step up your game and start using progressbar?
progressbar does pretty much exactly what you think it does... makes progress bars. And while this isn't exactly a data science specific activity, it does put a nice touch on those extra long running scripts.
Alas, as another Google Code outcast, it's not getting much love (the docs have 2 spaces for indents... 2!!!). Do what's right and give it a good ole pip install.
    from progressbar import ProgressBar
    import time
    # start() initializes the bar; some versions require it before update()
    pbar = ProgressBar(maxval=10).start()
    for i in range(1, 11):
        pbar.update(i)
        time.sleep(1)
    pbar.finish()
    # 60% |########################################################                |
9) colorama
So while you're making your logs have nice progress bars, why not also make them colorful! It can actually be helpful for reminding yourself when things are going horribly wrong.
colorama is super easy to use. Just pop it into your scripts and add it to any text you want to print in color.
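A minimal sketch of what that can look like follows. The colorize helper and the log messages are made up for illustration, and the except branch is an assumption added so the snippet still runs when colorama isn't installed: it falls back to the raw ANSI escape codes that colorama's Fore.RED, Fore.GREEN, and Style.RESET_ALL stand for.

```python
try:
    from colorama import Fore, Style, init
    init()  # lets ANSI color codes work on Windows terminals too
    RED, GREEN, RESET = Fore.RED, Fore.GREEN, Style.RESET_ALL
except ImportError:
    # Fallback (assumption): plain ANSI escape codes, no Windows support.
    RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def colorize(text, color):
    """Wrap text in a color code and reset the terminal style afterwards."""
    return f"{color}{text}{RESET}"


print(colorize("all good", GREEN))
print(colorize("things are going horribly wrong", RED))
```

Resetting after every message keeps one red line from bleeding into the rest of the log.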
10) uuid
I'm of the mind that there are really only a few tools one needs in programming: hashing, key/value stores, and universally unique IDs. uuid is the built-in Python UUID library. It implements versions 1, 3, 4, and 5 of the UUID standards and is really handy for doing things like... err... ensuring uniqueness.
That might sound silly, but how many times have you had records for a marketing campaign or an e-mail drop and wanted to make sure everyone gets their own promo code or ID number?
And if you're worried about running out of IDs, then fear not! A version 4 UUID carries 122 random bits, so there are roughly 5.3 x 10^36 of them to go around.
    import uuid
    print(uuid.uuid4())
    # e7bafa3d-274e-4b0a-b9cc-d898957b4b61
Well, if you were a uuid you probably would be.
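Here is a small sketch of the promo code scenario. The recipient addresses are made up, and pairing uuid4 with uuid5 is just one way to illustrate the difference between the random and the name-based versions.

```python
import uuid

# Hypothetical recipient list for an e-mail drop.
recipients = ["ann@example.com", "bob@example.com", "cat@example.com"]

# Version 4 is random: every call yields a fresh ID, good for one-off promo codes.
promo_codes = {email: str(uuid.uuid4()) for email in recipients}

# Version 5 is deterministic (namespace + name, hashed with SHA-1),
# so the same address always maps to the same ID.
stable_ids = {
    email: str(uuid.uuid5(uuid.NAMESPACE_URL, "mailto:" + email))
    for email in recipients
}

for email in recipients:
    print(email, promo_codes[email], stable_ids[email])
```

Rerunning the script changes every promo code but none of the version 5 IDs, which is handy when you need to recognize the same record across runs.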
11) bashplotlib
Shameless self-promotion: bashplotlib is one of my creations. It lets you plot histograms and scatterplots using stdin. So while you might not find it replacing ggplot or matplotlib as your everyday plotting library, the novelty value is quite high. At the very least, use it as a way to spruce up your logs a bit.
    $ pip install bashplotlib
    $ scatter --file data/texas.txt --pch x