Some Useful Machine Learning Libraries

Tags: svm, theano, statsmodels

From: http://www.erogol.com/broad-view-machine-learning-libraries/

http://www.slideshare.net/VincenzoLomonaco/deep-learning-libraries-and-rst-experiments-with-theano

February 6, by EREN

Especially with the advent of many different and intricate machine learning algorithms, it is very hard to come up with your own code for every problem. Therefore, choosing a library is an imperative decision before you start a project. However, the many available libraries have different quirks and rigs in different languages, sometimes even in multiple languages, so the choice is not as straightforward as it seems.

Before you start, I strongly recommend experimenting with the library of your interest, so as not to regret it at the end. As a simple guide, I'll point out some possible libraries and mark some of them as my choices, with the reasoning behind them.

My Simple Bundle for Small Projects

I basically use Python for my problems. Here are my most frequently used libraries.

  • scikit-learn - a very broad and well-established library. It has different functionalities that meet your requirements throughout your workflow. If you don't need some peculiar algorithm, scikit-learn is just enough for everything. It is built on NumPy and SciPy in Python. It also makes it very easy to parallelize your code.
  • pandas - other than being a machine learning library, pandas is a "data analysis library". It gives very handy features for making some observations on the data before you design your workflow. It supports both in-memory and on-storage operations. Hence, it is especially useful if your data grow to a scale that is not easy to handle with simple methods or cannot fit into memory as a whole.
  • Theano - yet another Python library, but a nonesuch one. Simply put, it interfaces your Python code to low-level languages. As you write NumPy-like Python code, it converts your code to its low-level counterparts and then compiles them at that level. It gives very significant performance gains, particularly for large matrix operations. It is also able to utilize the GPU after a simple configuration of the library, without any further code changes. One caveat is that it is not easy to debug, because of that compilation layer.
  • NLTK - a natural language processing tool with very unique and salient features. It also includes some basic classifiers like Naive Bayes. If your work is about text processing, this is the right tool to process the data.
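A minimal sketch of how this bundle fits together: pandas for a first look at the data, scikit-learn for the model. The dataset and column names below are made up purely for illustration (recent pandas and scikit-learn versions assumed):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy dataset; in practice you would pd.read_csv(...) your own file.
df = pd.DataFrame({
    "height": [1.2, 0.4, 1.1, 0.5, 1.3, 0.45],
    "weight": [10.0, 2.0, 9.5, 2.2, 11.0, 2.1],
    "label":  ["dog", "cat", "dog", "cat", "dog", "cat"],
})

# pandas: quick summary statistics before designing the workflow.
print(df.describe())

# scikit-learn: fit a classifier; everything shares NumPy arrays underneath.
X = df[["height", "weight"]].values
y = df["label"].values
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict([[1.25, 10.5]]))
```

The same `fit`/`predict` interface carries over to almost every scikit-learn estimator, which is a big part of why it is enough for most small projects.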
Other Libraries (this list is constantly updated)

Deep Learning Libraries
  • Pylearn2 - "a machine learning library". It is widely used, especially among deep learning researchers. It also includes some other features, such as latent Dirichlet allocation based on Gibbs sampling.
  • theanets (new) - yet another neural network library based on Theano. It is very simple to use and, I think, one of the best libraries for quickly prototyping new ideas.
  • Hebel - another young alternative for deep learning. "Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA."
  • Caffe - a convolutional neural network library for large-scale tasks. It differs by having its own low-level C++ implementation of CNNs instead of the well-known ImageNet implementation by Alex Krizhevsky, and it claims to be a faster alternative to Alex's code. It also provides MATLAB and Python interfaces.
  • cxxnet - very similar to Caffe. It supports multi-GPU training as well. I've not used it extensively, but it seems promising after my small experiments with the MNIST dataset. It also serves a very modular and easy development interface for new ideas. It has Python and MATLAB interfaces as well.
  • MXNet - a library from the same developers as cxxnet. It adds features based on the experience gathered from cxxnet and other backend libraries. Unlike cxxnet, it has a good Python interface, which provides exclusive development features for deep learning and even for general-purpose algorithms requiring GPU parallelism.
  • PyBrain - "PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library."
  • Brainstorm - a Python-based, GPU-capable deep learning library released by the IDSIA lab. It is at a very early stage of development but is still eye-catching. At least for now, it targets recurrent networks and 2D convolution layers.
Linear Model and SVM Libraries
    • LIBLINEAR - a library for large-scale linear classification. It is also wrapped by scikit-learn.
    • LIBSVM - a state-of-the-art SVM library with kernel support. It also has third-party plug-ins if its built-in capabilities are not enough for you.
    • Vowpal Wabbit - I hear the name very often but haven't used it so far. However, it seems a decent library for fast machine learning.
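For reference, the first two are reachable from Python without touching any C code, since scikit-learn wraps LIBLINEAR (`LinearSVC`) and LIBSVM (`SVC`). A minimal sketch on a synthetic toy problem:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic two-class data, just for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

linear = LinearSVC(C=1.0, max_iter=10000).fit(X, y)  # LIBLINEAR under the hood
kernel = SVC(kernel="rbf", C=1.0).fit(X, y)          # LIBSVM under the hood

# Training accuracy of each model.
print(linear.score(X, y), kernel.score(X, y))
```

The linear solver scales to much larger datasets; the LIBSVM-backed `SVC` buys you kernels at the cost of quadratic-ish training time.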
General Purpose Libraries
  • Shogun - a general-purpose ML library, similar to scikit-learn. It supports many different programming languages.
  • MLPACK - "a scalable C++ machine learning library".
  • Orange - another general-purpose ML library. "Open source data visualization and analysis for novice and experts". It has a self-organizing maps implementation (which I am currently studying) that sets it apart from the others.
  • Milk - "SVMs (based on LIBSVM), k-NN, random forests, decision trees. It also performs feature selection. These classifiers can be joined in many ways to form different classification systems."
  • Weka - a very common learning tool with GUI support. If you don't want to code, you can feed the data to Weka, select your algorithm from a drop-down menu, set the parameters, and go. Moreover, you can call its functions from your Java code. It supports some other languages as well.
  • KNIME - albeit I am not a big fan of this kind of tool, KNIME is another example of a GUI-based framework. You just define your pipeline visually: drag some process boxes onto the workspace, connect them as you want, set the parameters, and run.
  • RapidMiner - yet another GUI-based tool. It is very similar to KNIME and, in my experience, it has wider capabilities suited to different domains of expertise.
Others
  • Monte (python) - Monte is a Python framework for building gradient-based learning machines such as neural networks, conditional random fields, and logistic regression. Monte contains modules (that hold parameters, a cost function and a gradient function) and trainers (that can adapt a module's parameters by minimizing its cost function on training data).
  • Modular Toolkit for Data Processing (MDP) - from the user's perspective, MDP is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures.
  • statsmodels - another great library, which focuses on statistical models and is used mainly for predictive and exploratory analysis. If you want to fit linear models, do statistical analysis, and maybe a bit of predictive modeling, then statsmodels is a great fit.
  • PyMVPA - another statistical learning library, similar to scikit-learn in terms of its API. It has cross-validation and diagnostic tools as well, but it is not as comprehensive as scikit-learn.
  • PyMC - the tool of choice for Bayesians. It includes Bayesian models, statistical distributions and diagnostic tools for the convergence of models. It includes some hierarchical models as well. If you want to do Bayesian analysis, you should check it out.
  • gensim - a topic modelling tool centered on the latent Dirichlet allocation model. It also provides some degree of NLP functionality.
  • Pattern - Pattern is a web mining module for Python.
  • Mirador - a data visualization tool for complicated datasets, supporting Mac and Windows.
  • XGBoost (new) - if you like gradient boosting models and want to run them faster and stronger, this is a very useful library, with a C++ backend and Python and R wrappers. I should say that it is far faster than sklearn's implementation.
My Computation Stack

After the libraries, I feel the need to say something about the computation environment that I use.

    • NumPy, SciPy, IPython, IPython Notebook, Spyder - after wasting some time with MATLAB, I discovered these tools, which empower scientific computing with sufficient results. NumPy and SciPy are the very well-known scientific computing libraries. IPython is an alternative to the native Python interpreter with very useful features. IPython Notebook is a very peculiar editor that runs in a web browser, so it is especially good if you are working on a remote machine. Spyder is a Python IDE with very useful capabilities that make your experience very similar to MATLAB. Last but not least, all of them are free. I really suggest looking at these before you select a framework for your scientific work.
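The practical reason this stack replaces MATLAB well is vectorization: array expressions dispatch to optimized C/BLAS code instead of Python loops. A minimal illustration:

```python
import numpy as np

# A large matrix-vector product, the kind of operation this stack is built for.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 1000))
x = rng.normal(size=1000)

# Vectorized: one call into optimized native code.
y_fast = A @ x

# The equivalent pure-Python double loop, orders of magnitude slower.
y_slow = np.array([sum(A[i, j] * x[j] for j in range(1000))
                   for i in range(1000)])

print(np.allclose(y_fast, y_slow))  # True: same result, very different speed
```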

Finally, at the risk of self-promotion, I list my own ML codes

    • klp_kmeans - a very fast clustering procedure underpinned by Kohonen's learning procedure. It includes two alternatives: a basic NumPy implementation and a Theano implementation that is faster on large data.
    • Random forests - a MATLAB code based on a C++ back-end.
    • Dominant set clustering - a MATLAB code implementing very fast graph-based clustering, formulated as a replicator dynamics optimization.
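The Kohonen-style learning rule behind the first item can be sketched in plain NumPy. This is not the klp_kmeans code itself, just an illustrative online (winner-take-all) k-means update; the function name, learning rate and toy data are my own choices:

```python
import numpy as np

def online_kmeans(X, k=2, lr=0.1, epochs=20, seed=0):
    """Winner-take-all updates: move the closest centroid toward each sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(centers - x, axis=1))
            centers[winner] += lr * (x - centers[winner])  # Kohonen-style step
    return centers

# Two well-separated blobs; the centroids should land near (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])
centers = online_kmeans(X, k=2)
print(np.round(centers, 1))
```

Because each sample touches only one centroid, the update is embarrassingly batchable, which is what makes a Theano/GPU version of the same rule fast on large data.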

