Getting started with Python & machine learning
(Reader note: This is an introductory guide to machine learning. The author outlines the pros and cons of starting machine learning with Python and the Python packages you can use to get started.)
Machine learning is eating the world. Everyone and their mother is learning about machine learning models, classification, neural networks, and Andrew Ng. You've decided you want to be a part of it, but where to start?
In this article we'll cover some important characteristics of Python and why it's great for machine learning. We'll also cover some of the most important libraries it has for ML and, if it piques your interest, some places where you can learn more.
Why is Python used for machine learning?
Python is a great choice for machine learning for several reasons. First and foremost, it's a simple language on the surface; even if you aren't familiar with Python, getting up to speed is very quick if you've ever used any other language with C-like syntax (i.e. every language out there). Second, Python has a great community, which results in good documentation and friendly, comprehensive answers on Stack Overflow (fundamental!). Third, also stemming from the great community, there are plenty of useful libraries for Python (both "batteries included" and third party), which solve basically any problem that you can have (including machine learning).
But I heard Python is slow!
Yeah, and it's true. Python isn't the fastest language out there: all those handy abstractions come at a cost.
But here's the trick: libraries can and do offload the expensive calculations to the much more performant (but harder to use) C and C++. For instance, there's NumPy, which is a library for numerical computation. It's written in C, and it's fast. Practically every library out there that involves intensive calculations uses it; almost all the libraries listed next use it in some form. So if you read NumPy, think fast.
Therefore, you can make the heavy parts of your scripts run nearly as fast as if you had written them by hand in a lower-level language. So there's really nothing to worry about when it comes to speed.
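To make the speed argument concrete, here is a minimal sketch that computes the same dot product twice: once with a loop running in the Python interpreter, and once handed off to NumPy. The function names and the array size are illustrative choices, not taken from any benchmark, and the timings will vary from machine to machine.

```python
import math
import timeit

import numpy as np


def dot_pure_python(xs, ys):
    """Dot product computed element by element in the Python interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_numpy(xs, ys):
    """Same dot product, delegated to NumPy's compiled implementation."""
    return float(np.dot(xs, ys))


n = 200_000  # illustrative size, chosen only so the difference is visible
a = np.arange(n, dtype=np.float64)
b = np.ones(n, dtype=np.float64)
a_list, b_list = a.tolist(), b.tolist()

# Both approaches compute the same value.
assert math.isclose(dot_pure_python(a_list, b_list), dot_numpy(a, b))

# Time five runs of each version.
slow = timeit.timeit(lambda: dot_pure_python(a_list, b_list), number=5)
fast = timeit.timeit(lambda: dot_numpy(a, b), number=5)
print(f"pure Python: {slow:.4f}s, NumPy: {fast:.4f}s")
```

On typical hardware the NumPy version should come out well ahead, because the loop over the elements runs in compiled code rather than in the interpreter.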
Python libraries to check out
scikit-learn
Are you starting out in machine learning? Want something that covers everything from feature engineering to training and testing a model? Look no further than scikit-learn! This fantastic piece of free software provides every tool necessary for machine learning and data mining. It's the de facto standard library for machine learning in Python, recommended for most of the 'old' ML algorithms.
This library does both classification and regression, supporting basically every algorithm out there (support vector machines, random forest, naive Bayes, and so on). It's built in a way that allows easy switching of algorithms, so experimentation is easy. These 'older' algorithms are surprisingly resilient and work very well in a lot of cases.
But that's not all! Scikit-learn also does dimensionality reduction, clustering, you name it. It's also blazingly fast since it runs on NumPy and SciPy (meaning that all the heavy number crunching runs in C instead of Python).
Check out some examples to see everything this library is capable of, and the tutorials if you want to learn how it works.
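As a rough illustration of how easy it is to switch algorithms, the sketch below evaluates three classifiers on the small iris dataset that ships with scikit-learn. The choice of dataset, the three models, and their settings are assumptions made for the example; the point is only that each estimator is used in exactly the same way.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Small built-in dataset bundled with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Every estimator exposes the same interface, so trying a different
# algorithm only means changing which object is constructed.
candidates = {
    "support vector machine": SVC(),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive Bayes": GaussianNB(),
}

scores = {}
for name, model in candidates.items():
    # 5-fold cross-validation; the default score for classifiers is accuracy.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {scores[name]:.3f}")
```

Adding a fourth algorithm is a one-line change to the `candidates` dictionary, which is what makes this kind of experimentation cheap.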
NLTK
While it isn't a machine learning library per se, NLTK is a must when working with natural language processing (NLP). It comes with a bundle of datasets and other lexical resources (useful for training models), in addition to libraries for working with text, with functions for classification, tokenization, stemming, tagging, parsing and more.
The usefulness of having all of this stuff neatly packaged can't be overstated. So if you're interested in NLP, check out some tutorials!
Theano
Widely used in academia, Theano is the grandfather of all deep learning frameworks. Written in Python, it's tightly integrated with NumPy. Theano allows you to create neural networks, which are represented as mathematical expressions with multi-dimensional arrays. Theano handles this for you, so you don't have to worry about the actual implementation of the math involved.
It supports offloading calculations to the much faster GPU, which is a feature that everyone supports today, but back when they introduced it this wasn't the case. The library is very mature at this point and supports a very wide range of operations, which is a great plus when it comes to comparing it with other similar libraries.
The biggest complaint out there is that the API may be unwieldy for some, making the library hard to use for beginners. However, there are wrappers that ease the pain and make working with Theano simple, such as Keras, Blocks and Lasagne.
Interested in learning about Theano? Check out this Jupyter Notebook tutorial.
TensorFlow
The Google Brain team created TensorFlow for internal use on machine learning applications, and open sourced it in late 2015. They wanted something that could replace their older, closed source machine learning framework, DistBelief, which they said wasn't flexible enough and was too tightly coupled to their infrastructure to be shared with other researchers around the world.
And so TensorFlow was created. It learned from the mistakes of the past, and many consider it an improvement over Theano, claiming more flexibility and a more intuitive API. Not only can it be used for research but also for production environments, supporting huge clusters of GPUs for training. While it doesn't support as wide a range of operations as Theano, it has better computational graph visualizations.
TensorFlow is very popular nowadays. In fact, if you've heard about a single library on this list, it's probably this one: there isn't a day that goes by without a new blog post or paper mentioning TensorFlow getting published. This popularity translates into a lot of new users and a lot of tutorials, making it very welcoming to beginners.
Keras
Keras is a fantastic library that provides a high-level API for neural networks and is capable of running on top of either Theano or TensorFlow. It makes harnessing the full power of these complex pieces of software much easier than using them directly. It's very user-friendly, putting user experience as a top priority. It manages this by using simple APIs and excellent feedback on errors.
It's also modular, meaning that different modules (neural layers, cost functions, and so on) can be plugged together with few restrictions. This also makes it very easy to extend, since it's simple to add new modules and connect them with the existing ones.
Some people have called Keras so good that it's effectively cheating in machine learning. So if you're starting off with deep learning, go through the examples and documentation to get a feel for what you can do with it. And if you want to learn, start out with this tutorial and see where you can go from there.
Two similar alternatives are Lasagne and Blocks, but they only run on Theano. So if you tried Keras and are unhappy with it, maybe try out one of these alternatives to see if they work out for you.
PyTorch
Another popular deep learning framework is Torch, which is written in Lua. Facebook open-sourced a Python implementation of Torch called PyTorch, which allows you to conveniently use the same low-level libraries that Torch uses, but from Python instead of Lua.
PyTorch is much better for debugging, since one of the biggest differences between Theano/TensorFlow and PyTorch is that the former use symbolic computation while the latter doesn't. Symbolic computation means that when you code an operation (say, 'x + y'), it's not computed when that line is interpreted. Before getting executed it has to be compiled (translated to CUDA or C). This makes debugging harder in Theano/TensorFlow, since an error is much harder to associate with the line of code that caused it. Of course, doing things this way has its advantages, but debugging isn't one of them.
If you want to start out with PyTorch, the official tutorials are very friendly to beginners but get to advanced topics as well.
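The difference between the two styles can be sketched in plain Python, without installing either framework. The `Node`, `Symbol`, `Add` and `compile_graph` names below are hypothetical and exist only for this illustration; real frameworks build and compile far richer graphs.

```python
class Node:
    """Base class for nodes in a tiny, hypothetical expression graph."""

    def __add__(self, other):
        # Building the graph: nothing is computed here.
        return Add(self, other)


class Symbol(Node):
    """A named placeholder whose value is supplied only at run time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]


class Add(Node):
    """Deferred addition of two sub-expressions."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


def compile_graph(expr):
    """Stand-in for the compile step: returns a callable that runs the graph."""
    return lambda **env: expr.evaluate(env)


# Symbolic style: this part only describes the computation.
x, y = Symbol("x"), Symbol("y")
z = x + y                        # z is an Add node, not a number
run = compile_graph(z)
symbolic_result = run(x=2, y=3)  # the addition happens here, far from `x + y`

# Eager style: the operation runs on the line where it is written,
# so an error would point straight at that line.
eager_result = 2 + 3

print(type(z).__name__, symbolic_result, eager_result)  # prints: Add 5 5
```

In the symbolic half, a bad input such as `run(x=2, y="3")` fails inside `evaluate`, not on the line where `x + y` was written, which is the debugging difficulty described above.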
First steps in machine learning?
Alright, you've presented me with a lot of alternatives for machine learning libraries in Python. What should I choose? How do I compare these things? Where do I start?
Our Ape Advice™ for beginners is to try things out and not get bogged down by details. If you've never done anything machine learning related, try out scikit-learn. You'll get an idea of how the cycle of tagging, training and testing works and how a model is developed.
Now, if you want to try out deep learning, start out with Keras, which is widely agreed to be the easiest framework, and see where this takes you. After you have more experience, you'll start to see what it is that you actually want from the framework: greater speed, a different API, or maybe something else, and you'll be able to make a more informed decision.
And even then, there is an endless supply of articles out there comparing Theano, Torch, and TensorFlow. There's no real way to tell which one is the good one. It's important to take into account that all of them have wide support and are improving constantly, making comparisons harder to make. A six-month-old benchmark may be outdated, and year-old claims that framework X doesn't support operation Y may no longer be valid.
Finally, if you're interested in doing machine learning specifically applied to NLP, why not check out MonkeyLearn! Our platform provides a unique UX that makes it super easy to build, train and improve NLP models. You can either use pre-trained models for common use cases (like sentiment analysis, topic detection or keyword extraction) or train custom algorithms using your particular data. Also, you don't have to worry about the underlying infrastructure or deploying your models; our scalable cloud does this for you. You can start for free and integrate right away with our beautiful API.
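For a feel of what that cycle looks like in code, here is a minimal sketch using scikit-learn. The dataset (handwritten digits bundled with the library), the model, and the split size are assumptions picked for the example, not a recommendation.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Tagged data: images of handwritten digits (X) with their labels (y).
X, y = load_digits(return_X_y=True)

# 2. Hold out a quarter of the tagged examples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 3. Train a model on the training portion only.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# 4. Test: compare predictions on unseen examples against their true labels.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy on held-out data: {accuracy:.3f}")
```

Keeping the test examples out of training is what makes the final number meaningful: it estimates how the model behaves on data it has never seen.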
Want to learn more?
There are plenty of online resources out there to learn about machine learning! Here are a few:
- A comprehensive guide for a machine learning project on a Jupyter Notebook, if you want to see what some code looks like.
- Our Gentle Guide to Machine Learning, if you want to read more about the concepts of machine learning.
- Andrew Ng's Stanford CS229 on Coursera, if you're ready to get serious about this machine learning thing. If you are looking for a course in practical deep learning, check out the one at fast.ai.
Final words
So that was a brief intro to machine learning in Python and some of its libraries. The important part is not getting bogged down by details and just trying stuff out. Follow your curiosity, and don't be afraid to experiment.
Know about a Python library that we left out? Share it in the comments below!
By Bruno Stecanella | August 3rd