Python Tools for machine learning

Python is one of the best programming languages out there, with extensive coverage in scientific computing: computer vision, artificial intelligence, mathematics and astronomy, to name a few. Unsurprisingly, this holds true for machine learning as well.

Of course, it has some disadvantages too, one of which is that the tools and libraries for Python are scattered. If you are a Unix-minded person, this works quite conveniently, as every tool does one thing and does it well. However, this also requires you to know different libraries and tools, including their advantages and disadvantages, to be able to make a sound decision for the systems you are building. Tools by themselves do not make a system or product better, but with the right tools we can work much more efficiently and be more productive. Therefore, knowing the right tools for your work domain is crucially important.

This post aims to list and describe the most useful machine learning tools and libraries that are available for Python. To make this list, we did not require a library to be written in Python; it was sufficient for it to have a Python interface. We also have a small section on deep learning at the end, as it has received a fair amount of attention recently.

We do not aim to list all the machine learning libraries available in Python (the Python Package Index returns 139 results for "machine learning"), but rather the ones that we found useful and well-maintained to the best of our knowledge. Moreover, although some modules could be used for various machine learning tasks, we only included libraries whose main focus is machine learning. For example, although SciPy has some clustering algorithms, the main focus of this module is not machine learning but rather being a comprehensive set of tools for scientific computing. Therefore, we excluded libraries like SciPy from our list (though we use it too!).

Another thing worth mentioning is that we also evaluated each library based on how well it integrates with other scientific computing libraries, because machine learning (either supervised or unsupervised) is part of a larger data processing system. If the library you are using does not fit with the rest of your data processing system, then you may find yourself spending a tremendous amount of time creating intermediate layers between different libraries. It's important to have a great library in your toolset, but it's also important for that library to integrate well with other libraries.

If you are great in another language but want to use Python packages, we also briefly go into how you could integrate with Python to use the libraries listed in this post.

Scikit-learn

Scikit-learn is our machine learning tool of choice at CB Insights. We use it for classification, feature selection, feature extraction and clustering. What we like most about it is that it has a consistent API that is easy to use, while also providing a lot of evaluation, diagnostic and cross-validation methods out of the box (sound familiar? Python has a batteries-included approach as well). The icing on the cake is that it uses SciPy data structures under the hood and fits quite well with the rest of scientific computing in Python, including the SciPy, NumPy, pandas and matplotlib packages. Therefore, if you want to visualize the performance of your classifiers (say, using a precision-recall graph or a receiver operating characteristic (ROC) curve), you can do so quickly with the help of matplotlib. Considering how much time is spent on cleaning and structuring the data, this tight integration with other scientific computing packages makes the library very convenient to use.
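
The consistent API mentioned above means that every estimator exposes the same `fit`/`predict` methods, so generic tools such as cross-validation work with any model. The sketch below is library-free and only illustrative. It shows what k-fold cross-validation does under the hood. `NearestCentroid` and `k_fold_accuracy` are hypothetical helpers written for this example, not scikit-learn classes.

```python
from statistics import mean


class NearestCentroid:
    """Toy classifier following the fit/predict convention of scikit-learn estimators."""

    def fit(self, X, y):
        # One centroid per class: the mean of that class's feature vectors
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = [mean(col) for col in zip(*rows)]
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        # Label each sample with the class whose centroid is closest (squared Euclidean distance)
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [min(self.centroids_, key=lambda lbl: sq_dist(x, self.centroids_[lbl]))
                for x in X]


def k_fold_accuracy(make_model, X, y, k=3):
    """Train on k-1 contiguous folds, score accuracy on the held-out fold, repeat k times."""
    n = len(X)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        held_out = range(start, start + size)
        train = [i for i in range(n) if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in held_out])
        scores.append(sum(p == y[i] for p, i in zip(preds, held_out)) / size)
        start += size
    return scores


# Classes are interleaved so that every contiguous fold contains both labels
X = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9], [0.1, 0.3], [4.8, 5.0]]
y = [0, 1, 0, 1, 0, 1]
print(k_fold_accuracy(NearestCentroid, X, y, k=3))  # [1.0, 1.0, 1.0]
```

Scikit-learn's own `cross_val_score(estimator, X, y, cv=3)` in `sklearn.model_selection` does this for any estimator. For classifiers, an integer `cv` uses stratified folds, so the manual interleaving above is not needed there.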

Moreover, it also has limited natural language processing feature extraction capabilities, such as bag of words, TF-IDF and preprocessing (stop words, custom preprocessing, analyzer). In addition, if you want to quickly run benchmarks on toy datasets, it has a datasets module that provides common and useful datasets. You could also build toy datasets from these for your own purposes, to see whether your model performs well before applying it to a real-world dataset. For parameter optimization and tuning, it also provides grid search and random search. These features could not have been accomplished without great community support and careful maintenance. We look forward to its first stable release.
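
To make the bag-of-words and TF-IDF features above concrete, here is a minimal, library-free sketch. It assumes raw term counts, a smoothed IDF of ln((1 + n) / (1 + df)) + 1 and L2 normalization. We understand these to be the defaults of scikit-learn's `TfidfVectorizer`, but check its documentation before relying on exact numbers. The whitespace tokenizer and the tiny stop-word list are simplified stand-ins.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (scikit-learn ships a much larger English list)
STOP_WORDS = {"the", "a", "an", "is", "and", "of"}


def tokenize(doc):
    # Lowercase, split on whitespace and drop stop words
    return [w for w in doc.lower().split() if w not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, each scaled to unit L2 norm."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag of words: raw term counts
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


vecs = tfidf(["the cat sat", "the dog sat", "the cat ran"])
print(vecs[1])  # "dog" (in 1 document) outweighs "sat" (in 2 documents)
```

In scikit-learn, `TfidfVectorizer(stop_words="english")` from `sklearn.feature_extraction.text` covers the same steps with a regex tokenizer and returns a sparse matrix rather than dicts.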

Statsmodels

Statsmodels is another great library, which focuses on statistical models and is used mainly for predictive and exploratory analysis. If you want to fit linear models, do statistical analysis and maybe a bit of predictive modeling, then Statsmodels is a great fit. The statistical tests it provides are quite comprehensive and cover validation tasks for most cases. If you are an R or S user, it also accepts R syntax for some of its statistical models. It also accepts NumPy arrays as well as pandas DataFrames for its models, making the creation of intermediate data structures a thing of the past!
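
As a reminder of what fitting the simplest linear model involves, the library-free sketch below computes the ordinary least-squares line y = intercept + slope * x and its R². The helper names and the data are made up for illustration. Statsmodels reports these quantities plus standard errors and tests.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y, intercept, slope):
    # Share of the variance in y explained by the fitted line
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_simple_ols(x, y)
print(round(intercept, 2), round(slope, 2), round(r_squared(x, y, intercept, slope), 3))
# 0.05 1.99 0.997
```

With Statsmodels and a pandas DataFrame `df` holding columns `x` and `y`, the R-style equivalent is `smf.ols("y ~ x", data=df).fit()` after `import statsmodels.formula.api as smf`. Its `params` should match these two coefficients.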

PyMC

PyMC is the tool of choice for Bayesians. It includes Bayesian models, statistical distributions and diagnostic tools for the convergence of models. It includes some hierarchical models as well. If you want to do Bayesian analysis, you should check it out.
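
Bayesian tools like PyMC draw samples from a posterior distribution with Markov chain Monte Carlo. PyMC provides such samplers (and more efficient ones) along with convergence diagnostics. The hand-written random-walk Metropolis sampler below is only a library-free illustration of the idea and is not PyMC's API. The coin-flip model, the flat prior and the tuning values (`step`, `burn_in`) are assumptions chosen for the example.

```python
import math
import random


def log_posterior(theta, heads, tails):
    # Flat Beta(1, 1) prior, so the log posterior equals the Bernoulli log-likelihood (up to a constant)
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the bias of a coin."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior(proposal) / posterior(theta))
        if candidate >= current or rng.random() < math.exp(candidate - current):
            theta, current = proposal, candidate
        if i >= burn_in:  # discard the burn-in draws
            samples.append(theta)
    return samples


samples = metropolis(heads=7, tails=3)
print(sum(samples) / len(samples))  # close to the exact posterior mean 8 / 12, about 0.667
```

Because a flat prior with 7 heads and 3 tails gives an exact Beta(8, 4) posterior, this toy case lets you check the sampler against the known mean. Real models rarely allow that, which is where diagnostics like those in PyMC matter.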

Shogun

Shogun is a machine learning toolbox with a focus on support vector machines (SVM) that is written in C++. It is actively developed and maintained and provides a Python interface, which is mostly well documented. However, we've found its API hard to use compared to scikit-learn. Also, it does not provide many diagnostics or evaluation algorithms out of the box. However, its speed is a great advantage.

Gensim

Gensim is defined as "topic modeling for humans". As its homepage describes, its main focus is latent Dirichlet allocation (LDA) and its variants. Unlike other packages, it has support for natural language processing, which makes it easier to combine an NLP pipeline with other machine learning algorithms. If your domain is NLP and you want to do clustering and basic classification, you may want to check it out. Recently, they added Google's word2vec, a text representation learned with a shallow neural network, to their API as well. This library is written in Python.
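
Word2vec's skip-gram variant learns word vectors by training a shallow network to predict the words surrounding each word. The library-free sketch below shows only the first step, turning a sentence into (center, context) training pairs within a window. The network training that Gensim performs is not shown. Here `window` plays the same role as the `window` argument of Gensim's `Word2Vec`. The helper name is hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions of each center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs("topic modeling for humans".split(), window=1)
print(pairs)
# [('topic', 'modeling'), ('modeling', 'topic'), ('modeling', 'for'),
#  ('for', 'modeling'), ('for', 'humans'), ('humans', 'for')]
```

A larger window gives each word more distant context words, which tends to capture broader topical similarity rather than syntactic similarity.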

Orange

Orange is the only library among those listed in this post with a graphical user interface (GUI). It is also quite comprehensive in terms of classification, clustering and feature selection methods, and it has some cross-validation methods. It is better than scikit-learn in some aspects (classification methods, some preprocessing capabilities) as well, but it does not fit with the rest of the scientific computing ecosystem (NumPy, SciPy, matplotlib, pandas) as nicely as scikit-learn does.

Having a GUI is an important advantage over other libraries, however. You can visualize cross-validation results, models and feature selection methods (you need to install Graphviz separately for some of these capabilities). Orange has its own data structures for most of its algorithms, so you need to wrap your data into Orange-compatible data structures, which makes the learning curve steeper.

PyMVPA

PyMVPA is another statistical learning library, similar to scikit-learn in terms of its API. It has cross-validation and diagnostic tools as well, but they are not as comprehensive as scikit-learn's.

Deep learning

Even though deep learning is a subsection of machine learning, we created a separate section for this field, as it has received tremendous attention recently with various acqui-hires by Google and Facebook.

Theano

Theano is the most mature of the deep learning libraries. It provides nice data structures (tensors) to represent layers of neural networks, and they are efficient in terms of linear algebra, similar to NumPy arrays. One caution is that its API may not be very intuitive, which steepens the learning curve for users. There are many libraries that build on top of Theano and exploit its data structures. It supports GPU programming out of the box as well.
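
Representing a layer as a tensor expression boils down to a matrix-vector product plus a bias, followed by a nonlinearity. Theano compiles such expressions into efficient (optionally GPU) code. The plain-Python sketch below only shows the arithmetic a two-layer feed-forward network performs, using nested lists instead of tensors. The weights and the 2 -> 3 -> 1 shape are hypothetical values picked to make the result easy to check by hand.

```python
import math


def dense(x, W, b, activation=math.tanh):
    """One fully connected layer: activation(W @ x + b), with W given as a list of rows."""
    return [activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def relu(z):
    return max(0.0, z)


# Hypothetical 2 -> 3 -> 1 network with hand-picked weights
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, 0.0, -1.0]
W2 = [[1.0, 1.0, 1.0]]
b2 = [0.0]

hidden = dense([1.0, 2.0], W1, b1, activation=relu)     # [0.0, 1.5, 2.0]
output = dense(hidden, W2, b2, activation=lambda z: z)  # [3.5], identity output layer
print(hidden, output)
```

Libraries like Theano express the same computation over whole batches at once (a matrix of inputs instead of one vector), which is where optimized linear algebra and GPUs pay off.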

Pylearn2

There is another library built on top of Theano, called Pylearn2, which brings modularity and configurability to Theano. You can create your neural network through different configuration files, which makes it easier to experiment with different parameters. Arguably, it provides more modularity by separating the parameters and properties of the neural network into the configuration file.
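
The benefit of configuration files is that you can change the architecture by editing data rather than code. The sketch below illustrates that pattern only. Pylearn2 itself uses YAML with its own schema. JSON is used here just because it ships with Python, and the keys (`layers`, `units`, `n_inputs`, `seed`) and the `build_network` helper are hypothetical.

```python
import json
import random

# Hypothetical config; editing "units" changes the network without touching the code below
CONFIG = """
{
  "n_inputs": 3,
  "seed": 42,
  "layers": [
    {"name": "hidden", "units": 4},
    {"name": "output", "units": 2}
  ]
}
"""


def build_network(config_text):
    """Create small random weights and zero biases for each layer described in the config."""
    cfg = json.loads(config_text)
    rng = random.Random(cfg["seed"])  # fixed seed makes experiments reproducible
    layers, n_in = [], cfg["n_inputs"]
    for spec in cfg["layers"]:
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(spec["units"])]
        layers.append({"name": spec["name"], "W": W, "b": [0.0] * spec["units"]})
        n_in = spec["units"]  # the next layer's input size is this layer's output size
    return layers


net = build_network(CONFIG)
print([(layer["name"], len(layer["W"]), len(layer["W"][0])) for layer in net])
# [('hidden', 4, 3), ('output', 2, 4)]
```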

Decaf

Decaf is a recently released deep learning library from UC Berkeley with state-of-the-art neural network implementations that were tested on the ImageNet classification competition.

Nolearn

If you want to use the excellent scikit-learn API for deep learning as well, Nolearn wraps Decaf to make life easier for you. It's a wrapper on top of Decaf and it's (mostly) compatible with scikit-learn, which makes Decaf even more awesome.

OverFeat

OverFeat is a recent winner of the Dogs vs. Cats Kaggle competition. It is written in C++, but it comes with a Python wrapper as well (along with Matlab and Lua wrappers). It uses the GPU through the Torch library, so it's quite fast. It also won the detection and localization competition in ImageNet. If your main domain is computer vision, you may want to check it out.

Hebel

Hebel is another neural network library that comes with GPU support out of the box. You can define the properties of your neural networks through YAML files (similar to Pylearn2), which provides a nice way to separate your neural network from the code and quickly run your models. Since it was developed recently, its documentation is lacking in depth and breadth. It's also limited in terms of neural network models, as it only has one type of neural network model (feed-forward). However, it's written in pure Python, and it will be a nice library, as it has a lot of utility functions such as schedulers and monitors, functionality we have not seen in any other library.
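
To show what training utilities like schedulers and monitors do, here is a small sketch. It has a learning-rate schedule that decays exponentially per epoch and a monitor that remembers the best validation loss. Both classes and the loss values are hypothetical illustrations of the concept, not Hebel's API.

```python
class ExponentialDecay:
    """Hypothetical learning-rate schedule: lr(epoch) = initial * decay ** epoch."""

    def __init__(self, initial, decay):
        self.initial, self.decay = initial, decay

    def __call__(self, epoch):
        return self.initial * self.decay ** epoch


class ProgressMonitor:
    """Hypothetical monitor that tracks the best validation loss and the epoch it occurred."""

    def __init__(self):
        self.best_loss, self.best_epoch = float("inf"), None

    def update(self, epoch, val_loss):
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
        return self.best_epoch


schedule = ExponentialDecay(initial=0.1, decay=0.5)
monitor = ProgressMonitor()
for epoch, val_loss in enumerate([0.9, 0.6, 0.65, 0.5, 0.55]):  # made-up validation losses
    lr = schedule(epoch)  # a real training loop would use lr in its weight update here
    monitor.update(epoch, val_loss)
print(schedule(3), monitor.best_epoch)  # 0.0125 3
```

Keeping such utilities separate from the model code means the same schedule or monitor can be reused across experiments, which is the appeal of having them built into a library.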

Neurolab

Neurolab is another neural network library, with a nice API (similar to Matlab's API, if you're familiar with it). Unlike other libraries, it has different variants of recurrent neural network (RNN) implementations. If you want to use RNNs, this library might be one of the best choices, thanks to its simple API.
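
What distinguishes a recurrent network is that its hidden state feeds back into the next step. Neurolab's recurrent variants include an Elman network. The scalar, library-free sketch below shows an Elman-style recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias). The weights are arbitrary illustrative values, and `elman_forward` is a hypothetical helper, not Neurolab's API.

```python
import math


def elman_forward(inputs, w_in, w_rec, bias=0.0, h0=0.0):
    """Scalar Elman-style RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)."""
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)  # new state mixes the input and the previous state
        states.append(h)
    return states


# A single pulse followed by zeros: the recurrent weight keeps a fading memory of it
with_memory = elman_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5)
no_memory = elman_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.0)
print([round(h, 3) for h in with_memory])  # [0.762, 0.363, 0.18]
print(no_memory[1:])  # [0.0, 0.0]
```

With `w_rec=0.0` the network forgets the pulse immediately, which is exactly what a feed-forward network would do. The recurrent weight is what lets an RNN model sequences.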

Integration with other languages

Don't know any Python but great in another language? Do not despair! One of the strengths of Python (among many others) is that it is a perfect glue language: you can use your programming language of choice with these libraries through access from Python. The following packages can be used to combine Python with the respective programming languages:

    • R: rPython
    • Matlab: Matpython
    • Java: Jython
    • Lua: Lunatic Python
    • Julia: PyCall.jl

Inactive Libraries

These are libraries that have not released any updates for more than one year. We are listing them because some may find them useful, but it is unlikely that these libraries will be maintained with bug fixes, and especially enhancements, in the future.

    • MDP
    • mlpy
    • ffnet
    • PyBrain

If we are missing one of your favorite Python packages for machine learning, feel free to let us know in the comments. We'll gladly add that library to our blog post as well.