GitHub's 28 most popular open-source machine learning projects

Source: http://ml.ailab.cn/article-76485.html

Machine learning has become a hot field. After more than 20 years of development, it is now widely applied in areas such as data mining, computer vision, natural language processing, biometric identification, search engines, medical diagnosis, DNA sequencing, speech and handwriting recognition, strategy games, and robotics.

The cloud community has translated descriptions of 28 of the most popular open-source machine learning projects on GitHub for developers' reference.

1. TensorFlow

TensorFlow is Google's second-generation machine learning system. According to Google, on some benchmarks TensorFlow is more than twice as fast as its first-generation system, DistBelief.

Specifically, TensorFlow is an open-source software library that uses data flow graphs for numerical computation: the nodes in the graph represent mathematical operations, while the edges represent the multidimensional arrays (tensors) that flow between them. This flexible architecture lets users deploy computation to one or more CPUs or GPUs on desktops, servers, or mobile devices without rewriting code. TensorFlow also provides automatic differentiation (auto-differentiation), so any gradient-based machine learning algorithm can take advantage of it, and its flexible Python interface makes it easy to express ideas.
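
To make the data-flow-graph idea concrete, here is a minimal sketch in Python using the TensorFlow 1.x graph API (the API generation current when this article was written); the shapes and values are arbitrary illustrations:

    import tensorflow as tf  # TensorFlow 1.x graph-style API

    # Nodes are operations; the edges between them carry tensors.
    a = tf.placeholder(tf.float32, shape=[None, 3], name="a")
    W = tf.Variable(tf.random_normal([3, 1]), name="W")
    y = tf.matmul(a, W)                   # a matrix-multiplication node
    loss = tf.reduce_mean(tf.square(y))   # a scalar loss node

    # Automatic differentiation: gradients are added to the graph as new nodes.
    grads = tf.gradients(loss, [W])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        loss_val, grad_val = sess.run([loss, grads], feed_dict={a: [[1.0, 2.0, 3.0]]})
        print(loss_val, grad_val)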

TensorFlow was originally developed by researchers and engineers on the Google Brain team, part of Google's Machine Intelligence research organization, for research on machine learning and deep neural networks. However, the system is general enough to be applied to many other computing domains as well.

Google already uses AI extensively in its products: speech recognition in the Google app, Gmail's auto-reply feature, and image search in Google Photos all use TensorFlow.

Development language: C++

License Agreement: Apache License 2.0

GitHub Project Address: https://github.com/tensorflow/tensorflow

2. Scikit-learn

Scikit-learn is a Python module for machine learning built on top of SciPy. The project was started by David Cournapeau in 2007 as a Google Summer of Code project, and many volunteers have contributed to it since.

Main Features:

    • Simple and efficient tools for data mining and data analysis
    • Accessible to everybody, and reusable in various contexts
    • Based on NumPy, SciPy and Matplotlib

Scikit-learn's basic functionality is divided into six parts: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing; see the official documentation for details. Scikit-learn is tested on Python 2.6, Python 2.7, and Python 3.5, and should also run on Python 3.3 and Python 3.4.
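
As a minimal sketch of the classification workflow (load data, split, fit, evaluate), assuming a reasonably recent scikit-learn (0.18 or later, which provides model_selection):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Load a small built-in dataset and hold out a test split.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Fit a classifier and evaluate it on the held-out data.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))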

Note: Scikit-learn was formerly known as Scikits.learn.

Development language: Python

License Agreement: 3-clause BSD License

GitHub Project Address: https://github.com/scikit-learn/scikit-learn

3. Caffe

Caffe is a deep learning framework built with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) together with community contributors, forming a fairly loose, Berkeley-led community that collaborates through GitHub and the caffe-users mailing list.

Caffe is written in C++/CUDA and lets developers organize networks freely; it currently supports convolutional neural networks and fully connected (artificial) neural networks. On Linux it can be driven from the command line through its C++ interface, and dedicated MATLAB and Python interfaces are also provided; computation can switch seamlessly between CPU and GPU.

Features of Caffe:
    • Ease of use: models and the corresponding optimization settings are defined in text form rather than code; Caffe provides model definitions, optimization settings, and pre-trained weights, so it is quick and convenient to get started;
    • Speed: able to run state-of-the-art models on massive amounts of data;
    • Used together with cuDNN, Caffe can run the AlexNet model and process an image in only 1.17 ms on a K40 GPU;
    • Modularity: easy to extend to new tasks and settings;
    • Users can define their own models using the layer types Caffe provides.

In practice, working with Caffe mainly involves preparing the data, designing the network structure, training the model, and then using an existing trained model directly for recognition, as sketched below.
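
As a hedged sketch of that last step using pycaffe, Caffe's Python interface (deploy.prototxt and model.caffemodel below are placeholder file names for a model definition and pre-trained weights you would supply yourself):

    import numpy as np
    import caffe  # pycaffe, the Python interface to Caffe

    caffe.set_mode_cpu()  # or caffe.set_mode_gpu() to switch to the GPU

    # Load a trained network in test (inference) mode.
    net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

    # Fill the input blob (commonly named 'data', as defined in the prototxt)
    # with an appropriately shaped array and run a forward pass.
    net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
    out = net.forward()
    print(out)  # maps output blob names to arrays, e.g. class probabilities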

Development language: C++

License Agreement: BSD 2-Clause License

GitHub Project Address: https://github.com/BVLC/caffe

4. PredictionIO

PredictionIO is an open-source machine learning server for developers and data scientists. It supports event collection, algorithm deployment, evaluation, and querying of prediction results via REST APIs. With PredictionIO, users can build predictive features such as personalized recommendations and content discovery. It provides 20 preset algorithm templates that developers can run directly on their own data, and almost any application that integrates PredictionIO can become "smarter". Its main features are as follows:

    • Predict user behavior based on existing data;
    • Let users choose their own machine learning algorithms;
    • Scale well, so there is no need to worry about scalability.

PredictionIO exposes a REST API, and it also provides SDKs (software development kits) for languages such as Ruby, Python, Scala, and Java. It is written in Scala, uses MongoDB as its database, and builds its computation on the Hadoop architecture.
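
As a hedged sketch of querying a deployed engine over that REST interface from Python (this assumes an engine server listening on localhost:8000; the query fields "user" and "num" are illustrative and depend on the engine template being used):

    import json
    import requests  # third-party HTTP client: pip install requests

    # Deployed PredictionIO engines answer queries at /queries.json.
    query = {"user": "u1", "num": 4}  # illustrative fields defined by the engine template
    resp = requests.post("http://localhost:8000/queries.json",
                         data=json.dumps(query),
                         headers={"Content-Type": "application/json"})
    print(resp.json())  # e.g. a ranked list of recommended items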

Development language: Scala

License Agreement: Apache License 2.0

GitHub Project Address: https://github.com/PredictionIO/PredictionIO

5. Brain

Brain is a neural network library written in JavaScript. The following example shows how to train Brain to approximate the XOR function:

    var net = new brain.NeuralNetwork();
    net.train([{input: [0, 0], output: [0]},
               {input: [0, 1], output: [1]},
               {input: [1, 0], output: [1]},
               {input: [1, 1], output: [0]}]);
    var output = net.run([1, 0]);  // [0.987]

To use Brain in Node.js, install it with npm:

    npm install brain

To use Brain in the browser, download the latest brain.js file. Training is computationally expensive, so you should train the network offline (or in a Web Worker) and use the toFunction() or toJSON() methods to embed the pre-trained network in your website.

Development language: JavaScript

GitHub Project Address: https://github.com/harthur/brain

6. Keras

Keras is an extremely compact, highly modular neural network library that runs on top of TensorFlow or Theano and supports both GPU and CPU computation. Keras can be thought of as a Python counterpart to Torch7: it makes it very convenient to build CNN models quickly, includes some recent techniques such as batch normalization, and has very complete documentation and tutorials; the examples on the official site are easy to follow. Keras also supports saving trained weights and loading them later to continue training.

Keras focuses on enabling rapid experimentation: getting from idea to result with the least possible delay is key to doing good research.

Consider using Keras if you need a deep learning library that:

    • Allows simple and fast prototyping (through modularity, minimalism, and extensibility);
    • Supports both convolutional networks and recurrent networks, as well as combinations of the two;
    • Supports arbitrary connectivity schemes (including multi-input and multi-output training);
    • Works seamlessly on both the CPU and GPU.

Keras currently supports Python 2.7-3.5.
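
As a minimal sketch of the Keras workflow (define, compile, fit, save), assuming the Sequential API of the Keras versions from that period, plus NumPy and h5py; the layer sizes and toy data are arbitrary:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Toy data: 100 samples with 20 features and binary labels.
    X = np.random.rand(100, 20)
    y = np.random.randint(2, size=(100, 1))

    # Define a small fully connected network.
    model = Sequential()
    model.add(Dense(64, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    # Compile with a loss and optimizer, then train.
    model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
    model.fit(X, y, batch_size=16)  # the epoch argument is nb_epoch or epochs, depending on version

    # Save trained weights (requires h5py) and reload them to continue training later.
    model.save_weights('weights.h5')
    model.load_weights('weights.h5')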

Development language: Python

GitHub Project Address: https://github.com/fchollet/keras

7. CNTK

CNTK (Computational Network Toolkit) is a unified deep learning toolkit that describes a neural network as a series of computational steps in a directed graph. In this graph, leaf nodes represent input values or network parameters, while the other nodes represent matrix operations applied to their inputs.

CNTK makes it easy to implement and combine popular model types such as feedforward deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD with error backpropagation) learning, with automatic differentiation and parallelization across multiple GPUs and servers.

Microsoft compared CNTK's processing speed (frames processed per second) with four other well-known toolkits, using a four-layer fully connected neural network (see the benchmark scripts) and an effective mini-batch size of 8192, on the same hardware and the latest publicly available version of each toolkit as of December 3, 2015.

CNTK has been open source since April 2015.

Development language: C++

GitHub Project Address: https://github.com/Microsoft/CNTK

8. ConvNetJS

ConvNetJS is a JavaScript implementation of neural networks, and it comes with very good browser-based demos. Its most important use is to help deep learning beginners understand the algorithms faster and more intuitively.

It currently supports:

    • Common neural network modules (fully connected layers, nonlinearities);
    • Classification (SVM/softmax) and regression (L2) cost functions;
    • Specifying and training convolutional networks for image processing;
    • An experimental reinforcement learning module based on Deep Q-Learning.
Some online examples:
    • Convolutional neural network on MNIST digits
    • Convolutional neural network on CIFAR-10
    • Toy 2D data
    • Toy 1D regression
    • Training an autoencoder on MNIST digits
    • Deep Q-Learning reinforcement learning demo
    • Image regression ("Painting")
    • Comparison of SGD/AdaGrad/AdaDelta on MNIST

Development language: JavaScript

License Agreement: MIT License

GitHub Project Address: https://github.com/karpathy/convnetjs

9. Pattern

Pattern is a web mining module for Python. It provides tools for:

    • Data mining: web services (Google, Twitter, Wikipedia), a web crawler, and an HTML DOM parser;
    • Natural language processing: a part-of-speech tagger, n-gram search, sentiment analysis, and WordNet;
    • Machine learning: vector space model, clustering, and classification (kNN, SVM, perceptron);
    • Network analysis: graph centrality and visualization.

It is well documented, with more than 50 examples and more than 350 unit tests. Pattern currently supports only Python 2.5+ (not yet Python 3). It has no external dependencies, except that using LSA in the pattern.vector module requires NumPy (installed by default only on Mac OS X).
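
As a small, hedged sketch of the natural language processing side, using the pattern.en submodule (under Python 2, which Pattern requires):

    # -*- coding: utf-8 -*-
    from pattern.en import parse, sentiment

    # Part-of-speech tagging: returns the sentence annotated with word/POS/chunk tags.
    print(parse("The quick brown fox jumps over the lazy dog."))

    # Sentiment analysis: returns a (polarity, subjectivity) pair.
    print(sentiment("This library is surprisingly pleasant to use."))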

Development language: Python

License Agreement: BSD license

GitHub Project Address: https://github.com/clips/pattern

10. NuPIC

NuPIC is a machine intelligence platform that implements the HTM (Hierarchical Temporal Memory) learning algorithm. HTM is a detailed computational theory of the neocortex. At its core is a time-based continuous learning algorithm that can store and recall both spatial and temporal patterns. NuPIC is suited to a variety of problems, especially anomaly detection and prediction on streaming data sources.

NuPIC binaries are currently available for:

    • Linux x86 64bit
    • OS X 10.9
    • OS X 10.10
    • Windows 64bit

NuPIC has a unique advantage: many machine learning algorithms cannot adapt to new patterns, whereas NuPIC, working more like the human brain, forgets old patterns and remembers new ones when the patterns change.

Development language: Python

GitHub Project Address: https://github.com/numenta/nupic

11. Theano

Theano is a Python library that allows the user to efficiently define, optimize, and evaluate mathematical expressions involving multidimensional arrays while supporting GPUs and efficient symbolic differentiation operations. Theano has the following characteristics:

    • Tight integration with NumPy: numpy.ndarray is used in Theano-compiled functions;
    • Transparent use of the GPU: data-intensive computations can run up to 140x faster than on the CPU (for float32);
    • Efficient symbolic differentiation: Theano computes derivatives of functions with one or many inputs (see the sketch after this list);
    • Speed and stability optimizations: for example, log(1+x) gives the correct result even when x is very small;
    • Dynamic C code generation: expressions are evaluated faster;
    • Extensive unit testing and self-verification: many kinds of errors are detected and diagnosed.
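
As a minimal sketch of defining, differentiating, and compiling a symbolic expression in Theano (assuming Theano and NumPy are installed; the expression is arbitrary):

    import theano
    import theano.tensor as T

    # Define a symbolic expression: y = x^2 + 3x.
    x = T.dscalar('x')
    y = x ** 2 + 3 * x

    # Symbolic differentiation: dy/dx = 2x + 3, derived automatically.
    dy = T.grad(y, x)

    # Compile the expressions into callable functions (C code may be generated behind the scenes).
    f = theano.function([x], y)
    df = theano.function([x], dy)

    print(f(2.0))   # 10.0
    print(df(2.0))  # 7.0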

Since 2007, Theano has powered large-scale, computationally intensive scientific research, and it is also widely used in the classroom (for example, in the deep learning/machine learning courses at the University of Montreal).

Development language: Python

GitHub Project Address: https://github.com/Theano/Theano

12. MXNet

MXNet is a deep learning framework that combines efficiency and flexibility. It lets users mix symbolic programming with imperative programming to maximize efficiency and productivity. At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly, and a graph optimization layer on top of it makes symbolic execution faster and more memory-efficient. The library is lightweight and portable, and it scales to multiple GPUs and multiple machines; a small sketch of the two programming styles follows.
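
As a brief, hedged sketch of those two styles in MXNet's Python API (assuming the mx.nd/mx.sym interfaces of that era; shapes are arbitrary, and the symbolic network's parameters are left uninitialized for brevity):

    import mxnet as mx

    # Imperative style: NDArray operations execute immediately, much like NumPy.
    a = mx.nd.ones((2, 3))
    b = a * 2 + 1
    print(b.asnumpy())

    # Symbolic style: build a computation graph first, then bind and execute it.
    data = mx.sym.Variable('data')
    net = mx.sym.FullyConnected(data=data, num_hidden=4, name='fc1')
    net = mx.sym.Activation(data=net, act_type='relu')

    executor = net.simple_bind(ctx=mx.cpu(), data=(2, 3))
    executor.forward(data=mx.nd.ones((2, 3)))  # fc1's weights are left uninitialized here
    print(executor.outputs[0].asnumpy())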

Main Features:

    • Its design notes provide useful insights that can be reused in other deep learning projects;
    • Flexible configuration of arbitrary computation graphs;
    • Integrates the advantages of multiple programming approaches to maximize flexibility and efficiency;
    • Lightweight, memory-efficient, and portable to smart devices;
    • Scales to multiple GPUs and distributed settings with automatic parallelization;
    • Supports Python, R, C++, and Julia;
    • Cloud-friendly: directly compatible with S3, HDFS, and Azure.

MXNet is more than a deep learning project: it is also a blueprint and set of guidelines for building deep learning systems, combined with the contributors' own insights into how such systems should be designed.

Development language: Jupyter Notebook (as classified by GitHub; the core framework is written in C++)

License Agreement: Apache License 2.0

GitHub Project Address: https://github.com/dmlc/mxnet

13. Vowpal Wabbit

Vowpal Wabbit is a machine learning system that pushes the frontier of techniques such as online learning, hashing, allreduce, and Learning2Search. Training is very fast: with 2 billion training samples, each containing roughly 100 non-zero features, training takes about 20 minutes when the total number of features is 10,000 and about 2 hours when it is 10 million. Vowpal Wabbit supports classification, regression, matrix factorization, and LDA.
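
As a hedged illustration, the snippet below writes a few examples in VW's plain-text input format (label, then "|", then feature:value pairs) from Python; the commented commands show a typical train/predict invocation of the vw binary, with file names chosen here only as placeholders:

    # Write a tiny training file in Vowpal Wabbit's input format.
    examples = [
        "1 | price:0.23 sqft:0.25 age:0.05",
        "-1 | price:0.18 sqft:0.15 age:0.35",
        "1 | price:0.53 sqft:0.32 age:0.87",
    ]
    with open("train.vw", "w") as f:
        f.write("\n".join(examples) + "\n")

    # Then, from the shell (placeholder file names):
    #   vw train.vw -f model.vw                     # train and save a model
    #   vw train.vw -t -i model.vw -p predictions   # load the model and write predictions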

When Vowpal Wabbit is run on Hadoop, it uses the following optimizations:

    • Lazy initialization: all data can be loaded into memory and cached before the allreduce step. Even if a node fails, training can continue by replaying that node's data (obtained from the cache) on another node.
    • Speculative execution: in a large cluster, one or two slow mappers can drag down the performance of the whole job. The idea of speculative execution is that once most tasks have finished, Hadoop duplicates the remaining tasks on other nodes and uses whichever copy finishes first.

Development language: C++

GitHub Project Address: https://github.com/JohnLangford/vowpal_wabbit

14. Ruby Warrior

Ruby Warrior is a game designed to make learning the Ruby language and artificial intelligence more fun and interactive.

The player controls a warrior climbing a tall tower, trying to reach the top floor and claim the precious ruby. On each level, you write a Ruby script that guides the warrior to defeat enemies, rescue captives, and reach the stairs. You are given some information about each level, but you never know exactly what will happen there, so you must give the warrior enough artificial intelligence to work out how to respond on his own.

The warrior's action API:

    • warrior.walk: moves the warrior; the default direction is forward;
    • warrior.feel: senses what lies ahead of the warrior, such as an empty space or a monster;
    • warrior.attack: makes the warrior attack a monster;
    • warrior.health: returns the warrior's current health;
    • warrior.rest: makes the warrior rest for a turn, restoring 10% of his maximum health.

The warrior's perception API:
    • space.empty: whether the square ahead is empty;
    • space.stairs: whether the square ahead is a staircase;
    • space.enemy: whether there is a monster ahead;
    • space.captive: whether there is a captive ahead;
    • space.wall: whether the square ahead is a wall.

Development language: Ruby

GitHub Project Address: https://github.com/ryanb/ruby-warrior

The above covers the first 14 of GitHub's 28 most popular open-source machine learning projects; the remaining projects are described in the second part of this series, "GitHub's 28 most popular open-source machine learning projects (II)".

Compiled from: https://github.com/showcases/machine-learning

Translator: Liu Chongxin; Proofreader: Wang Jianjin

