Reposted from InfoQ; author: Zhang Tianrei
Machine learning is a hot topic in data analysis, and a variety of machine learning algorithms turn up in everyday study and work. Many of these algorithms have already been implemented many times over in Python, Java, and other languages. Implementations are easy to find online, but much of that open source code is messy or of poor quality.
Against this backdrop, InfoWorld recently published a list of 11 of the most popular open source projects in machine learning, most of which relate to spam filtering, face recognition, and recommendation engines. Most are built on today's most popular languages and platforms, and they popularize and extend many of the key algorithms in machine learning. Users can find not only LDA models in them but also hidden Markov models (HMMs) and others. These models are heavily used in applications and are what researchers need most.
- Scikit-learn
Scikit-learn is a very powerful Python machine learning toolkit. It builds on existing Python packages such as NumPy and matplotlib to provide convenient mathematical tools. The toolkit includes a number of simple and efficient tools that are well suited to data mining and data analysis.
On the home page you can find the User Guide, which serves as an index to the whole of machine learning and introduces a variety of effective methods. In the Reference section, users can find usage details for each class.
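As a rough illustration of the kind of workflow the User Guide describes, the sketch below fits a classifier and scores it on held-out data. It assumes scikit-learn is installed; the choice of the bundled iris dataset, the `SVC` classifier, and the helper name `train_and_score` are illustrative assumptions, not something the original article prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_and_score(seed=0):
    """Fit a support vector classifier on iris and return its test accuracy."""
    X, y = load_iris(return_X_y=True)
    # Hold out 30% of the samples; stratify keeps class proportions equal.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = SVC()  # default settings use an RBF kernel
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)  # mean accuracy on the held-out set


if __name__ == "__main__":
    print(f"test accuracy: {train_and_score():.2f}")
```

Most scikit-learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping `SVC` for another classifier usually changes only one line.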
- Shogun
Shogun is one of the oldest open source machine learning libraries. It is written in C++ and was created in 1999. Because it exposes its interfaces through SWIG, Shogun can easily be used from Java, Python, C#, and other mainstream languages. Its focus is on large-scale kernel methods, especially support vector machines (SVMs). It also includes a large number of linear methods, such as LDA and LPM, as well as HMMs.
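To make "kernel method" concrete, here is a minimal pure-Python sketch of a Gaussian (RBF) kernel and the pairwise kernel matrix that a kernel SVM is trained on. It does not use Shogun's API; the function names `rbf_kernel` and `gram_matrix` and the `gamma` value are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, kernel=rbf_kernel):
    """Pairwise kernel values for every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(points)

if __name__ == "__main__":
    for row in K:
        print(["%.4f" % value for value in row])
```

Nearby points get kernel values close to 1 and distant points get values close to 0, which is how a kernel method measures similarity without building explicit feature vectors. The matrix grows quadratically with the number of samples, which is why libraries that target large-scale kernel learning put so much effort into efficiency.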
- Accord Framework/AForge.NET
Accord is an extension of AForge.NET and is a .NET-based machine learning and signal processing framework. It includes a series of machine learning algorithms for image and audio processing, such as face detection and SIFT-based image stitching. Accord also supports real-time tracking of moving objects. Its machine learning library ranges from neural networks to decision tree systems.
- Mahout
Mahout is a well-known open source project from the Apache Software Foundation. It provides implementations of classic machine learning algorithms, designed to help developers create intelligent applications more quickly and easily. Mahout covers classic techniques such as clustering, classification, and recommendation, and offers convenient interfaces for cloud services.
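Recommendation is the easiest of these techniques to show in a few lines. The following is a small pure-Python sketch of item-based collaborative filtering with cosine similarity. It is not Mahout's API or its implementation; the `ratings` data and the function names are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts keyed by user)."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_n=1):
    """Rank the items `user` has not rated by similarity-weighted score."""
    # Invert user -> item -> rating into item -> user -> rating.
    by_item = {}
    for name, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[name] = rating
    seen = ratings[user]
    scores = {}
    for candidate in by_item:
        if candidate in seen:
            continue
        # Weight each already-rated item by how similar it is to the candidate.
        scores[candidate] = sum(
            cosine(by_item[candidate], by_item[item]) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"matrix": 5, "inception": 4},
    "bob": {"matrix": 5, "inception": 5, "interstellar": 5},
    "cat": {"titanic": 5, "notebook": 4},
    "dan": {"titanic": 4, "notebook": 5, "matrix": 1},
}

if __name__ == "__main__":
    print(recommend(ratings, "ann"))  # ['interstellar']
```

Here "ann" is recommended "interstellar" because the one user who rated it also rated highly the two films she liked. A production system would distribute the similarity computation across a cluster, which is the part a project like Mahout takes care of.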
- MLlib
MLlib is Apache's own machine learning library for Spark and Hadoop. It is designed to run the most common machine learning algorithms at large scale and high speed. MLlib runs on the JVM (Spark itself is written largely in Scala), can be used from Java, and interfaces easily with Python and other languages. Users can also write their own code against MLlib, which makes it highly customizable.
- H2O
H2O is the flagship product of 0xdata and is a core data analysis platform. It is written mainly in Java and provides interfaces for R and Python. Users can install the H2O R package and then run it within an R environment. H2O's algorithms are geared toward business use cases such as fraud and trend prediction, and the company is currently in a new round of financing.
- Cloudera Oryx
Oryx is another open source machine learning project designed for Hadoop, provided by the creators of the Cloudera Hadoop distribution. Oryx lets machine learning models be applied to real-time data streams, for tasks such as spam filtering.
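The idea of updating a model as messages stream in can be sketched with a tiny naive Bayes spam filter whose counts change one message at a time. This is a pure-Python illustration under simple assumptions (whitespace tokenization, Laplace smoothing), not Oryx's architecture or API; the class name `StreamingSpamFilter` and the sample messages are hypothetical.

```python
import math
from collections import Counter


class StreamingSpamFilter:
    """Multinomial naive Bayes whose counts are updated one message at a time."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()
        self.vocabulary = set()

    def update(self, text, label):
        """Fold one labeled message into the model."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.message_counts[label] += 1
        self.vocabulary.update(words)

    def _log_score(self, words, label):
        total_messages = sum(self.message_counts.values())
        # Laplace smoothing (+1) avoids log(0) for unseen labels and words.
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary) + 1  # +1 reserves mass for unseen words
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def predict(self, text):
        """Return the label with the higher log-probability score."""
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


# Hypothetical labeled messages arriving one by one.
stream = [
    ("win cash prize now", "spam"),
    ("lunch meeting at noon", "ham"),
    ("claim your free prize", "spam"),
    ("project meeting notes attached", "ham"),
]
spam_filter = StreamingSpamFilter()
for text, label in stream:
    spam_filter.update(text, label)

if __name__ == "__main__":
    print(spam_filter.predict("free cash prize"))  # spam
    print(spam_filter.predict("meeting at noon"))  # ham
```

Because `update` only increments counters, the model can serve predictions at any point while it keeps learning from new messages.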
- GoLearn
GoLearn is an all-in-one machine learning library for Go, the language built by Google, with the goal of being simple and customizable. Go is one of Google's flagship languages and is used more and more widely. GoLearn's simplicity lies in the way data is loaded and handled in the library, and its customizability lies in the way its data structures can be extended in source code.
- Weka
Weka is an open source data mining project developed in Java. As an open data mining workbench, Weka brings together a large number of machine learning algorithms for data mining tasks, including preprocessing, classification, regression, and clustering. Weka also provides data visualization and lets users interact with the program through a new Java-based interface.
- cuda-convnet
CUDA is Nvidia's well-known GPU acceleration toolkit. cuda-convnet is a GPU-accelerated machine learning library for neural network applications. It is written in C++ and uses Nvidia's CUDA GPU processing technology.
The project has since been reorganized into cuda-convnet2, which supports multiple GPUs and Kepler-generation GPUs. The Vulpes project is similar, except that it is written in F# and targets the .NET platform.
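For readers new to convolutional networks, the core operation that such libraries accelerate on the GPU is a 2D convolution followed by a nonlinearity. The sketch below shows it in plain Python on a tiny image. It says nothing about how cuda-convnet implements it; the function names and the edge-detecting kernel are assumptions for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Clamp negative activations to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# A 3x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A kernel that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d_valid(image, vertical_edge))

if __name__ == "__main__":
    print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

Every output cell is computed independently of the others, which is why the operation maps so well onto the many parallel cores of a GPU.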
- ConvNetJS
ConvNetJS is a JavaScript-based deep learning library that supports training deep learning models online, in the browser. Through a set of simple demos, it helps deep learning beginners understand the algorithms faster and more intuitively.
11 Open source projects for machine learning