Open source is at the heart of today's rapid progress in innovation and technology. This article is based on KDnuggets' annual roundup of the top 20 Python machine learning open source projects for 2016; it also offers some analysis of the projects and their development trends. As it did last year, KDnuggets surveyed the latest top 20 Python machine learning open source projects on GitHub. Surprisingly, some of last year's most active projects have stalled, some have dropped out of the top 20 (measured by contributors and commits), and 13 new projects have entered the top 20.
Top 20 Python Machine Learning Open Source Projects of 2016
1. scikit-learn is a simple and efficient tool for data mining and data analysis, built on NumPy, SciPy and Matplotlib. It is open source, accessible to everyone, reusable in a variety of contexts, and released under a BSD license that permits commercial use.
Commits: 21486, Contributors: 736
Links: http://scikit-learn.org/
2. TensorFlow was originally developed by researchers and engineers on the Google Brain team within Google's Machine Intelligence research organization. The system is designed to facilitate machine learning research and to make the transition from research prototype to production system quick and easy.
Commits: 10466, Contributors: 493
Links: https://www.tensorflow.org/
3. Theano lets you efficiently define, optimize, and evaluate mathematical expressions involving multidimensional arrays.
Commits: 24108, Contributors: 263
Links: http://deeplearning.net/software/theano/
4. Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. It is designed with expressiveness, speed, and modularity in mind.
Commits: 3801, Contributors: 215
Links: http://caffe.berkeleyvision.org/
5. Gensim is a free Python library for scalable statistical semantics. It can analyze plain-text documents for their semantic structure and retrieve semantically similar documents.
Commits: 2702, Contributors: 145
Links: https://radimrehurek.com/gensim/
6. Pylearn2 is a machine learning library. Most of its functionality is built on top of Theano. This means you can write Pylearn2 plug-ins (new models, algorithms, etc.) using mathematical expressions, and Theano will optimize and stabilize those expressions for you and compile them for the backend of your choice (CPU or GPU).
Commits: 7100, Contributors: 115
Links: http://github.com/lisa-lab/pylearn2
7. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and for each estimator.
Commits: 8664, Contributors: 108
Links: https://github.com/statsmodels/statsmodels/
8. Shogun is a machine learning toolbox that provides a wide range of unified and efficient machine learning (ML) methods. It makes it easy to combine multiple data representations, algorithm classes, and general-purpose tools.
Commits: 15172, Contributors: 105
Links: https://github.com/shogun-toolbox/shogun
9. Chainer is a Python-based, standalone open source framework for deep learning models. Chainer provides a flexible, intuitive, and high-performance way to implement a full range of deep learning models, including state-of-the-art models such as recurrent neural networks and variational autoencoders.
Commits: 6298, Contributors: 84
Links: https://github.com/pfnet/chainer
10. NuPIC is an open source project based on a theory of the neocortex called Hierarchical Temporal Memory (HTM). Parts of HTM theory have been implemented, tested, and used in applications, while other parts are still under development.
Commits: 6088, Contributors: 76
Links: http://github.com/numenta/nupic
11. Neon is the Python-based deep learning library from Nervana (http://nervanasys.com/). It aims for ease of use while delivering the highest performance.
Commits: 875, Contributors: 47
Links: http://neon.nervanasys.com/
12. Nilearn is a Python module for fast and easy statistical learning on neuroimaging data. It leverages the scikit-learn Python toolbox for multivariate statistics, with applications such as predictive modeling, classification, decoding, or connectivity analysis.
Commits: 5254, Contributors: 46
Links: http://github.com/nilearn/nilearn
13. Orange3 is an open source machine learning and data visualization tool for novices and experts alike. It offers interactive data analysis workflows backed by a large toolbox.
Commits: 6356, Contributors: 40
Links: https://github.com/biolab/orange3
14. PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a wide range of problems.
Commits: 2701, Contributors: 37
Links: https://github.com/pymc-devs/pymc
15. PyBrain is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks, along with a variety of predefined environments in which to test and compare your algorithms.
Commits: 984, Contributors: 31
Links: http://github.com/pybrain/pybrain
16. Fuel is a data pipeline framework that provides your machine learning models with the data they need. It is intended for use by both the Blocks and Pylearn2 neural network libraries.
Commits: 1053, Contributors: 29
Links: http://github.com/mila-udem/fuel
17. PyMVPA is a Python package intended to ease statistical learning analyses of large datasets. It offers an extensible framework with a high-level interface to a broad range of algorithms for classification, regression, feature selection, and data import and export.
Commits: 9258, Contributors: 26
Links: https://github.com/PyMVPA/PyMVPA
18. Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings that searches for points in space that are close to a given query point. It also creates large, read-only, file-based data structures that are mapped into memory so that many processes can share the same data.
Commits: 365, Contributors: 24
Links: https://github.com/spotify/annoy
19. DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works well with parallelization mechanisms such as multiprocessing and SCOOP.
Commits: 1854, Contributors: 21
Links: https://github.com/deap/deap
20. Pattern is a web mining module for the Python programming language. It bundles tools for data mining (Google, Twitter and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech tagging, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes, k-NN and SVM classifiers), and network analysis (graph centrality and visualization).
Commits: 943, Contributors: 20
Links: https://pypi.python.org/pypi/Pattern
We can tell from the chart below that PyMVPA has the highest contribution rate of all the projects. Surprisingly, although scikit-learn has the most contributors, its contribution rate is comparatively low. One possible reason is that PyMVPA is a newer project still going through its early stages of development, where new ideas and features, bug fixes, and refactoring generate many commits. scikit-learn, by contrast, is an older and relatively stable project, so it sees fewer commits for improvements or bug fixes.
We also compared the projects that appear in both the 2015 and 2016 top 20 lists. Pattern, PyBrain and Pylearn2 show no significant change in contribution rate and have gained no new contributors. In addition, we can see a clear correlation between the number of contributors and the number of commits. More contributors can lead to more commits, which I think is the magic of open source projects and communities: it brings people together to brainstorm, generate new ideas, and build better software tools.
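The two observations above can be roughly checked against the figures quoted in the list. The sketch below is only an illustration: it assumes that "contribution rate" means commits per contributor (the article does not define the term), and it uses a plain Pearson coefficient as one possible measure of the correlation. The names `PROJECTS`, `contribution_rate` and `pearson` are made up for this example.

```python
import math

# (project, commits, contributors), copied from the list in this article.
PROJECTS = [
    ("scikit-learn", 21486, 736),
    ("TensorFlow", 10466, 493),
    ("Theano", 24108, 263),
    ("Caffe", 3801, 215),
    ("Gensim", 2702, 145),
    ("Pylearn2", 7100, 115),
    ("Statsmodels", 8664, 108),
    ("Shogun", 15172, 105),
    ("Chainer", 6298, 84),
    ("NuPIC", 6088, 76),
    ("Neon", 875, 47),
    ("Nilearn", 5254, 46),
    ("Orange3", 6356, 40),
    ("PyMC", 2701, 37),
    ("PyBrain", 984, 31),
    ("Fuel", 1053, 29),
    ("PyMVPA", 9258, 26),
    ("Annoy", 365, 24),
    ("DEAP", 1854, 21),
    ("Pattern", 943, 20),
]


def contribution_rate(commits, contributors):
    """Commits per contributor (an assumed definition, not from the source)."""
    return commits / contributors


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Contribution rate per project, and the project with the highest rate.
rates = {name: contribution_rate(c, k) for name, c, k in PROJECTS}
top = max(rates, key=rates.get)

# Correlation between contributor counts and commit counts.
r = pearson([k for _, _, k in PROJECTS], [c for _, c, _ in PROJECTS])

print(f"Highest commits per contributor: {top} ({rates[top]:.1f})")
print(f"scikit-learn commits per contributor: {rates['scikit-learn']:.1f}")
print(f"Pearson r (contributors vs. commits): {r:.2f}")
```

Under this assumed definition, PyMVPA comes out on top and scikit-learn sits well below it, which matches the description above. With only 20 data points, the correlation coefficient should be read as a rough indication rather than a firm result.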
This concludes the KDnuggets team's analysis of the 2016 top 20 Python machine learning open source projects, based on the number of contributors and the number of commits.
Open source and knowledge sharing are a joy!
Original link: http://www.kdnuggets.com/2016/11/top-20-python-machine-learning-open-source-updated.html