How do I choose an open-source machine learning framework?

Source: Internet
Author: User
Tags: theano, keras

Although machine learning is still at an early stage of development, its integration into industry applications holds immense promise, and that potential value all but guarantees that machine learning will become a core enterprise technology. This article looks at how teams in different industries can choose the right open-source framework. We hope it helps.

Why choose a machine learning framework?

The benefits of using open-source tools go beyond their availability. Projects of this scale attract large numbers of data engineers and data scientists who are willing to share datasets and pre-trained models. For example, you can start from a classification model trained on ImageNet instead of building image recognition from scratch. Open-source machine learning tools also enable transfer learning, meaning you can solve a new machine learning problem by reusing knowledge gained on a related one: a model that has learned to recognize cars can be adapted to help with other tasks.

Depending on the problem you need to solve, pre-trained models and open datasets may not be as accurate as fully customized ones, but an open-source framework spares you from collecting a dataset yourself, which saves a great deal of time and effort. According to Andrew Ng, former Baidu chief scientist and Stanford University professor, using open-source models and datasets will be the second leading driver of business success, after supervised learning.

From the many open-source tools available, some very active and some less well known, we have chosen five for an in-depth discussion to help you find the right tool and start your own data science exploration. Now, let's cut to the chase.

1.TensorFlow

TensorFlow was originally intended for internal use at Google and was later released under the Apache 2.0 open-source license. Google's reputation and the framework's model-building workflow have attracted a large community of TensorFlow advocates.

TensorFlow is a solid Python tool for deep neural network research and complex mathematical computation, and it can even support reinforcement learning. TensorFlow's uniqueness also lies in its dataflow graphs, structures made up of nodes (mathematical operations) and edges (multidimensional numerical arrays, or tensors).
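
The sketch below is a rough illustration of that graph idea, assuming the TensorFlow 2.x eager API (newer than the release this article was written against): the tf.matmul and tf.reduce_sum operations act as nodes, and the tensors flowing between them are the edges.

    import tensorflow as tf

    # Nodes are operations; edges carry tensors (multidimensional arrays).
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # a 2x2 input tensor
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

    c = tf.matmul(a, b)    # matrix-multiplication node
    d = tf.reduce_sum(c)   # reduction node consuming c's output edge

    print(c.numpy())       # [[1. 3.] [3. 7.]]
    print(d.numpy())       # 14.0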

1.1 Datasets and models

TensorFlow's flexibility shows in the fact that you can use it for research or for repeatable machine learning tasks. For full control, there is a low-level API called TensorFlow Core, which lets you define models in detail and train them on your own datasets. There are also public pre-trained models and higher-level APIs built on top of TensorFlow Core. The most popular datasets to start with are MNIST, the classic dataset for recognizing handwritten digits in images, and the Medicare dataset from Google, which is used to predict charges for medical services.
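
If you start from the higher-level APIs rather than TensorFlow Core, a minimal sketch along these lines (assuming a recent TensorFlow install with the bundled tf.keras module) loads MNIST and trains a small classifier:

    import tensorflow as tf

    # MNIST ships with TensorFlow's Keras API; no manual download is needed.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))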

1.2 Audience and learning curve

For those exploring machine learning for the first time, the sheer range of TensorFlow's features can be daunting. Some even argue that the library does not flatten the learning curve of machine learning so much as make it steeper. TensorFlow is a relatively low-level library: it requires a lot of code, and a solid grasp of data science details, to use it well in a project. So if your data science team is IT-centric, it may not be your best choice; we will discuss simpler alternatives below.

1.3 Use case

Given TensorFlow's complexity, its use cases are found mainly at large companies with in-house machine learning experts. For example, the UK online supermarket Ocado uses TensorFlow to prioritize requests in its contact centres and to improve demand forecasting. Meanwhile, the global insurer AXA uses the library to predict which of its customers are likely to be involved in major car accidents.

2.Theano: Mature library with strong computational performance

Theano is a low-level, Python-based library for scientific computing, typically used to define, optimize, and evaluate the mathematical expressions behind deep learning. Although its computational performance is very good, many users find it too complex to work with directly. For this reason, Theano is mostly used through higher-level wrappers such as Keras, Lasagne, and Blocks, three high-level frameworks designed for rapid prototyping and model testing.
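
A minimal sketch of Theano's define-then-compile workflow might look like the following; the variable names are illustrative, and the key point is that the expression graph is declared symbolically and only turned into runnable code by theano.function:

    import numpy as np
    import theano
    import theano.tensor as T

    # Declare symbolic variables: nothing is computed yet.
    x = T.dmatrix("x")
    y = T.dmatrix("y")
    z = T.dot(x, y) + T.sqr(x).sum()  # a symbolic expression graph

    # Compile the expression into a callable function (Theano optimizes it here).
    f = theano.function(inputs=[x, y], outputs=z)

    a = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.eye(2)
    print(f(a, b))  # the compiled graph evaluated on concrete arrays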

2.1 Datasets and models

Theano has some public models, but the heavily used frameworks built on top of it offer a large number of tutorials and training datasets to choose from. For example, Keras keeps its available models and detailed usage tutorials in its documentation.

2.2 Audience and learning curve

If you use Lasagne or Keras as a high-level wrapper on top of Theano, you again have access to a large number of tutorials and pre-trained datasets. Moreover, Keras is considered the easiest library to start with in the early stages of exploring deep learning.
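
As a hedged sketch of how little code a Keras model needs, the example below uses standalone Keras with random stand-in data; it could be run on the Theano backend by setting the KERAS_BACKEND=theano environment variable:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Toy data: 100 samples with 20 features and binary labels (stand-ins for real data).
    X = np.random.rand(100, 20)
    y = np.random.randint(2, size=(100, 1))

    # Keras hides the backend's symbolic graph behind a few high-level calls.
    model = Sequential([
        Dense(32, activation="relu", input_shape=(20,)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=16, verbose=0)
    print(model.evaluate(X, y, verbose=0))  # [loss, accuracy] on the toy data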

Because TensorFlow was designed to replace Theano, Theano has lost a significant part of its following. However, many data scientists still find enough advantages in the older library to keep using it.

2.3 Use case

Having helped set the industry standard for deep learning research and development, Theano was originally used to power the most advanced deep learning algorithms. However, since you are unlikely to use Theano directly, you can draw on many of its capabilities through the libraries built on top of it, for tasks such as digit and image recognition, object localization, and even chatbots.

3.Torch: Facebook-supported framework driven by the Lua scripting language

Torch is often described as the simplest deep learning tool for beginners, because it was developed in Lua, a simple scripting language. Although far fewer people use Lua than Python, it is still widely used at companies such as Facebook, Google, and Twitter.

3.1 Datasets and models

A list of popular datasets ready for loading can be found on the project's GitHub cheatsheet page. In addition, Facebook has released official code for its implementation of deep residual networks (ResNets), along with pre-trained models that you can fine-tune on your own datasets.

3.2 Audience and learning curve

The number of engineers on the market who work with Lua is far smaller than the number who work with Python. However, Torch's syntax shows how readable Lua can be. Active Torch contributors love Lua, which makes it a great choice for beginners and for those who want to expand their toolset.

3.3 Use case

Facebook built DeepText with Torch; it analyzes the text that users share on the site minute by minute and enables more personalized content targeting. Twitter, also supported by Torch, has been able to recommend tweets through an algorithmic timeline rather than displaying them in reverse chronological order.

4.scikit-learn

Scikit-learn is a high-level framework for supervised and unsupervised machine learning algorithms. As part of the Python ecosystem, it is built on the NumPy and SciPy libraries, each of which handles lower-level data science tasks. While NumPy handles numerical computation, SciPy provides more specialized numerical routines, such as optimization and interpolation. Scikit-learn, in turn, is used for machine learning, and the relationship between these three tools and the rest of the Python ecosystem reflects the layered nature of the data science field: the higher the layer, the more specific the problems it solves.

4.1 Datasets and models

The library already includes several classification and regression datasets, although they cannot fully represent real-world situations. Still, the diabetes dataset for measuring disease progression and the iris dataset for pattern recognition illustrate how machine learning algorithms work in scikit-learn. Furthermore, the library provides guidance on loading datasets from external sources, sample generators for tasks such as multiclass classification and decomposition, and recommendations on using popular datasets.
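
As an illustration, a minimal sketch using the bundled iris dataset might look like this; the choice of RandomForestClassifier is arbitrary, since every scikit-learn estimator follows the same fit/predict pattern:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # The iris dataset ships with scikit-learn; no download is required.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)

    print(accuracy_score(y_test, clf.predict(X_test)))  # held-out accuracy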

4.2 Audience and learning curve

Although it is a powerful library, scikit-learn focuses on ease of use and documentation. It is a tool that non-expert academics and novice engineers can handle, because it is simple to use, comes with a large number of well-described examples, and lets you apply machine learning algorithms to data quickly. Based on reviews from companies such as AWeber and Yhat, scikit-learn is ideal for projects with tight constraints on time and staff.

5.Caffe/Caffe2: Simple to use, with a large number of pre-trained models

Unlike Theano and Torch, which were born for research, Caffe is not well suited to text, sound, or time-series data; it is a machine learning library dedicated to image classification. Support from Facebook and the recently open-sourced Caffe2 have made the library a popular tool, with 248 GitHub contributors.

Although it has been criticized for slow development, Caffe's successor, Caffe2, addresses the problems of the legacy technology with greater flexibility, a lighter footprint, and support for mobile deployment.

5.1 Datasets and models

Caffe encourages dataset contributions from industry and other users. The team fosters collaboration and links to a number of popular datasets already trained with Caffe. The framework's biggest advantage is the Model Zoo, a large collection of pre-trained models created by developers and researchers, which you can use directly, combine with other models, or simply learn from while training your own.
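
A rough sketch of using a Model Zoo network through Caffe's Python interface is shown below; the file names, the input image, and the "prob" output blob are assumptions that depend on the particular model you download:

    import caffe

    # Hypothetical paths: in practice these come with a Model Zoo download.
    net = caffe.Net("deploy.prototxt",        # network architecture
                    "pretrained.caffemodel",  # pre-trained weights
                    caffe.TEST)

    # Preprocess an image into the layout the network expects.
    transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
    transformer.set_transpose("data", (2, 0, 1))   # HWC -> CHW
    image = caffe.io.load_image("cat.jpg")         # hypothetical input image
    net.blobs["data"].data[...] = transformer.preprocess("data", image)

    out = net.forward()
    print(out["prob"].argmax())  # index of the most likely class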

5.2 Audience and learning curve

The Caffe team claims that you can skip the learning phase altogether and start exploring deep learning straight away with the existing models. The library's target audience is developers who want hands-on experience with deep learning and are committed to helping the community grow.

5.3 Use case

Caffe's strength lies in state-of-the-art convolutional neural networks (CNNs), the deep neural networks that have been applied so successfully to visual image analysis and even to the vision systems of autonomous driving. Caffe helped Facebook develop its real-time video filtering tool, which applies famous artistic styles to video. Pinterest also uses Caffe to extend its visual search functionality and to let users discover specific objects within an image.
