Getting started. These may be the two most demoralizing words there are. The first step is usually the hardest, and having too many directions to choose from can be paralyzing.
Where to start?
This article aims to take a newcomer from minimal knowledge of machine learning in Python to knowledgeable practitioner in seven steps, using only freely available online materials. The main purpose of this overview is to guide readers through the wide range of free learning resources. There are many of these resources, but which are the best? Which complement each other? What is the best order in which to study them?
I assume that the reader of this article is not an expert in any of the following areas:
• Machine learning
• Python
• Any of Python's machine learning, scientific computing, or data analysis libraries
Some basic familiarity with one or both of the first two areas would help, but it is not necessary; spending some extra time on the earlier steps should compensate.
Step 1: Basic Python skills
If you want to use Python for machine learning, a basic understanding of Python is essential. Fortunately, Python is widely used both as a general-purpose language and in scientific computing and machine learning, so introductory tutorials are not hard to find. The right starting point depends largely on your prior experience with Python and with programming in general.
The first thing to install is Python itself. Since we will be using scientific computing and machine learning packages, installing Anaconda is recommended. Anaconda is a Python distribution for Linux, OS X, and Windows that bundles the packages needed for machine learning, including NumPy, scikit-learn, and matplotlib. It also includes IPython Notebook, an interactive environment used by many of the tutorials. Python 2.7 is recommended here, for no special reason other than that it is currently the most widely installed version.
If you have no prior programming knowledge, I suggest reading this free ebook before moving on to the other learning materials:
• Learn Python the Hard Way, by Zed A. Shaw
If you have programming experience but not in Python, or your Python is very basic, I recommend one or more of the following tutorials:
• Google Developers Python Course (highly recommended for visual learners)
• An Introduction to Python for Scientific Computing (from UCSB Engineering), by M. Scott Shell (a good introduction to scientific computing with Python, about 60 pages)
For those who want a crash course, there is:
• Learn X in Y Minutes (X = Python)
Of course, if you're a seasoned Python programmer, you can skip this step. Nonetheless, it is recommended that you keep the easy-to-understand Python documentation handy.
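Once Python and the packages are installed, it can be reassuring to confirm that everything is importable before starting the tutorials. The following is a minimal sketch using only the standard library; it assumes Python 3 (the f-string syntax does not run on 2.7), and the package list and the helper name `check_packages` are illustrative choices rather than anything prescribed by Anaconda.

```python
import importlib.util
import sys

# Import names of the packages mentioned in this step.
# Note that scikit-learn is imported under the name "sklearn".
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(PACKAGES)

print("Python version:", sys.version.split()[0])
for name, found in status.items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Any package reported as missing can usually be added with the distribution's package manager before moving on.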
Step 2: Basic machine learning skills
KDnuggets' Zachary Lipton has pointed out that people's perceptions of what a "data scientist" is vary widely. This reflects the field of machine learning itself, since data scientists use machine learning algorithms to varying degrees. Is it necessary to understand kernel methods intimately in order to build and use a support vector machine model? Certainly not. As with many things in life, the required depth of theoretical understanding depends on the practical application. Gaining an in-depth understanding of machine learning algorithms is beyond the scope of this article, and generally requires a substantial investment of time in an academic setting, or at the very least intensive self-study.
The good news is that you don't need a PhD-level grasp of machine learning theory in order to practice, just as not every programmer needs a theoretical computer science education to write good code.
Andrew Ng's Coursera course is highly praised. My suggestion, however, is to browse the course notes compiled by a former student, skipping the Octave-specific material (Octave is a Matlab-like language unrelated to our Python pursuits). Be warned that these are not "official" notes, although they do seem to capture the relevant content of Ng's course material. If you have the time, you can of course take the course itself on Coursera: Andrew Ng's Machine Learning course.
• Unofficial Andrew Ng course notes
Beyond Andrew Ng's course, there are many other video lectures. I'm a fan of Tom Mitchell, and here is a link to his most recent lecture videos (taught jointly with Maria-Florina Balcan), which I find very approachable:
• Tom Mitchell Machine Learning Lectures
You don't need to go through all the notes and videos now. A better strategy is to move on to the exercises below and refer back to the notes and videos as needed. For example, when you reach an exercise on regression, you can consult the regression notes from Ng's course and/or the relevant Mitchell videos.
Step 3: Scientific Python packages at a glance
All right. We now have some Python programming experience and an understanding of machine learning. Beyond Python itself, there are a number of open source libraries commonly used to facilitate practical machine learning. The main so-called scientific Python libraries used for basic machine learning tasks are (this list is clearly subjective):
• NumPy: mainly useful for its N-dimensional array objects
• pandas: Python data analysis library, including structures such as the DataFrame
• matplotlib: 2D plotting library that produces publication-quality figures
• scikit-learn: the machine learning algorithms used for data analysis and data mining tasks
A good resource for learning these is:
• SciPy Lecture Notes, by Gaël Varoquaux, Emmanuelle Gouillart, and Olav Vahtras
The following pandas tutorial is also good, and to the point:
• 10 Minutes to pandas
In the tutorials below you will also come across other packages, such as Seaborn, a visualization library built on matplotlib. The packages listed above (again, a subjective selection) form the core toolkit for a wide range of machine learning tasks in Python; understanding them will also help you pick up the related packages that appear in later tutorials.
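To give a feel for how NumPy and pandas fit together, here is a small sketch. The height and weight figures are made-up illustrative values, not data from any of the tutorials, and the column names are my own choice.

```python
import numpy as np
import pandas as pd

# NumPy: n-dimensional arrays with vectorised (element-wise) arithmetic.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
weights_kg = np.array([50.0, 60.0, 70.0, 80.0])
bmi = weights_kg / (heights_cm / 100.0) ** 2  # no explicit loop needed

# pandas: a DataFrame is a table of labelled columns built on top of arrays.
df = pd.DataFrame(
    {"height_cm": heights_cm, "weight_kg": weights_kg, "bmi": bmi}
)

print(df.round(2))
print("Mean BMI:", round(df["bmi"].mean(), 2))

# Boolean filtering: keep only the rows that satisfy a condition.
tall = df[df["height_cm"] > 155]
print(tall)
```

With matplotlib installed, a call such as `df.plot.scatter(x="height_cm", y="weight_kg")` would draw the same table as a chart.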
Okay, here's the fun part ...
Step 4: Getting started with machine learning in Python
Python. Check.
Machine learning basics. Check.
NumPy. Check.
pandas. Check.
matplotlib. Check.
It's time to implement machine learning algorithms with scikit-learn, Python's de facto standard machine learning library.
Figure: scikit-learn algorithm selection diagram
Many of the following tutorials and exercises are based on the IPython (Jupyter) Notebook, an interactive environment for executing Python. Some of these notebooks can be viewed online, and others can be downloaded and run on your own computer.
• IPython Notebook Overview, from Stanford University
Note also that the resources below come from around the web, and all of them belong to their respective authors. If for some reason you find that an author has not been properly credited, please let me know and I will correct it as soon as possible. In particular, I would like to thank Jake VanderPlas, Randal Olson, Donne Martin, Kevin Markham, and Colin Raffel for their excellent free resources.
The following are introductory tutorials for scikit-learn. I recommend completing all of them before moving on to the next step.
A general introduction to scikit-learn, Python's most commonly used machine learning library, covering the k-nearest neighbors (KNN) algorithm:
• An Introduction to scikit-learn, by Jake VanderPlas
A more in-depth and broader introduction, including a starter project that works through a well-known dataset from beginning to end:
• Example Machine Learning Notebook, by Randal Olson
A focus on strategies for evaluating different models in scikit-learn, covering train/test splits:
• Model Evaluation, by Kevin Markham
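The workflow these tutorials share (load data, hold out a test set, fit a model, score it on unseen data) can be sketched in a few lines. This is not taken from any of the notebooks above; it assumes a recent scikit-learn on Python 3, and the choices of the iris dataset, a 25% test split, and five neighbors are mine.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small, well-known dataset: 150 flowers, 4 measurements each.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data so the model is scored on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# k-nearest neighbors: classify a point by a vote among its 5 closest
# training examples.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Scoring on the held-out portion rather than on the training data is what makes the accuracy figure a fair estimate of how the model handles new examples.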
Step 5: Machine learning topics with Python
With the scikit-learn groundwork laid, we can explore more of the common and useful algorithms. Let's start with k-means clustering, one of the best-known machine learning algorithms. It is a simple and often effective method for unsupervised learning problems:
• K-means Clustering, by Jake VanderPlas
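As a rough illustration of what k-means does, the sketch below clusters synthetic two-dimensional points. The data comes from scikit-learn's `make_blobs` helper and the parameter values are arbitrary choices of mine; it is not the example used in the tutorial.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 300 points scattered around 3 centres.
X, true_labels = make_blobs(
    n_samples=300, centers=3, cluster_std=0.6, random_state=0
)

# k-means is unsupervised: it only sees X, never true_labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", np.bincount(labels))
print("Cluster centres:")
print(kmeans.cluster_centers_.round(2))
```

Note that the number of clusters has to be supplied up front; the algorithm does not discover it.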
Next, we move on to classification and look at one of the most historically popular classification methods, the decision tree:
• Decision Trees, via The Grimm Scientist
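One appeal of decision trees is that the fitted model can be read as a set of if/else rules. The following sketch, which assumes scikit-learn 0.21 or later (for `export_text`) and uses the iris dataset with a depth limit I picked for readability, prints those rules.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# Limit the depth so the learned rules stay short enough to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Print the tree as nested threshold tests on the input features.
print(export_text(tree, feature_names=list(iris.feature_names)))

tree_accuracy = tree.score(X_test, y_test)
print(f"Held-out accuracy: {tree_accuracy:.2f}")
```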
From classification, we move on to predicting continuous numeric values:
• Linear Regression, by Jake VanderPlas
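A minimal sketch of linear regression follows. The data is synthetic: I generate points around the line y = 3x + 2 with added noise (an assumption made purely for illustration) and check that the fitted slope and intercept land near those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data around the line y = 3x + 2, with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=100)

# scikit-learn expects a 2D feature matrix, hence the reshape to (100, 1).
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

slope = model.coef_[0]
intercept = model.intercept_
print(f"Fitted line: y = {slope:.2f} * x + {intercept:.2f}")

# Predict the target for a new input value.
prediction = model.predict(np.array([[4.0]]))[0]
print(f"Prediction at x = 4: {prediction:.2f}")
```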
We can then apply regression to classification problems, via logistic regression:
• Logistic Regression, by Kevin Markham
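Logistic regression fits a linear model and passes its output through a sigmoid, so each prediction comes with a class probability. A sketch under my own assumptions (scikit-learn's bundled breast cancer dataset, standardized features, and a raised iteration limit) might look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification: malignant vs. benign tumours, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features first; this helps the solver converge.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Unlike a bare class label, predict_proba returns one probability per class.
probabilities = clf.predict_proba(X_test[:3])
print(probabilities.round(3))

logreg_accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {logreg_accuracy:.2f}")
```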
Step 6: Advanced machine learning topics with Python
Having gotten our feet wet with scikit-learn, we now turn to more advanced topics. First up is the support vector machine, a classifier that is not necessarily linear and that relies on complex transformations of the data into a higher-dimensional space:
• Support Vector Machines, by Jake VanderPlas
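To see why the transformation into a higher-dimensional space matters, the sketch below compares a linear kernel with an RBF kernel on data that no straight line can separate. The two-moons dataset and the `gamma` and `C` values are illustrative assumptions of mine, not settings from the tutorial.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaved half-circles: a classic non-linearly-separable dataset.
X, y = make_moons(n_samples=400, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for kernel in ("linear", "rbf"):
    # gamma only affects the RBF kernel; the linear kernel ignores it.
    svm = SVC(kernel=kernel, gamma=2.0, C=1.0)
    svm.fit(X_train, y_train)
    scores[kernel] = svm.score(X_test, y_test)

for kernel, score in scores.items():
    print(f"{kernel:6s} kernel accuracy: {score:.2f}")
```

On data like this one would expect the RBF kernel to follow the curved boundary more closely than the linear kernel can.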
Next comes the random forest, an ensemble classifier. The following tutorial walks through it using the Kaggle Titanic competition:
• Kaggle Titanic Competition (with Random Forests), by Donne Martin
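A random forest averages the votes of many decision trees, each trained on a random resample of the data. Since the Titanic data has to be downloaded from Kaggle, the sketch below substitutes scikit-learn's bundled wine dataset; that substitution, the number of trees, and the five-fold cross-validation are all my own choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

wine = load_wine()

# An ensemble of 100 decision trees, each fitted on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: train on 4/5 of the data, test on the rest, 5 times.
cv_scores = cross_val_score(forest, wine.data, wine.target, cv=5)
print(f"Mean cross-validated accuracy: {cv_scores.mean():.2f}")

# Fit on all the data to inspect which features the trees relied on most.
forest.fit(wine.data, wine.target)
ranked = sorted(
    zip(forest.feature_importances_, wine.feature_names), reverse=True
)
for importance, name in ranked[:3]:
    print(f"{name:30s} {importance:.3f}")
```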
Dimensionality reduction is a method of reducing the number of variables under consideration in a problem. Principal component analysis (PCA) is a particular form of unsupervised dimensionality reduction:
• Dimensionality Reduction, by Jake VanderPlas
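As a small illustration of PCA, the sketch below projects the four iris measurements down to two components and reports how much of the variance survives. Standardizing the features first is an assumption on my part, not a requirement of PCA.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PCA is unsupervised, so the class labels are discarded.
X, _ = load_iris(return_X_y=True)

# Put all features on the same scale so none dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 original features onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Shape before:", X.shape, "after:", X_2d.shape)
print("Variance explained per component:",
      pca.explained_variance_ratio_.round(3))
```

The two-column result can be plotted directly, which is a common way to get a first look at higher-dimensional data.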
Before we start the next step, we can pause and recall how far we have come in just a short time.
Using Python and its machine learning libraries, we have covered some of the most common and well-known machine learning algorithms (k-nearest neighbors, k-means clustering, support vector machines), looked at a powerful ensemble technique (random forests), and examined some supporting machine learning tasks (dimensionality reduction, model validation techniques). Along with some foundational machine learning skills, we have started to build an increasingly rich toolkit.
Before we finish, let's add a much needed tool to the toolkit:
Step 7: Deep learning in Python
To learn, deeply.
Deep learning is everywhere! It builds on neural network research going back several decades, but advances in recent years have dramatically increased both the capability of deep neural networks and the interest in them. If you are unfamiliar with deep learning, KDnuggets has many articles detailing the recent innovations, accomplishments, and accolades of the technology.
This final step is not intended as a full deep learning tutorial. We will look at a simple application in each of two Python deep learning libraries. For readers who want to dig deeper, I recommend this free online book:
• Neural Networks and Deep Learning, by Michael Nielsen
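To make the idea of a neural network concrete before turning to the libraries, here is a minimal sketch in plain NumPy (deliberately neither Theano nor Caffe). It trains a network with one hidden layer on the XOR problem using hand-written gradient descent. The layer size, learning rate, and step count are arbitrary choices of mine, and real deep learning libraries automate the gradient computation shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Weights for one hidden layer of 4 units and a single output unit.
W1 = rng.normal(0, 1, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0, 1, size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
losses = []

for step in range(2000):
    # Forward pass: input -> tanh hidden layer -> sigmoid output.
    hidden = np.tanh(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Binary cross-entropy loss (clipped to avoid log(0)).
    clipped = np.clip(output, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped))
    losses.append(loss)

    # Backward pass: gradients of the loss with respect to each parameter.
    grad_out = (output - y) / len(X)
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0, keepdims=True)
    grad_hidden = (grad_out @ W2.T) * (1 - hidden ** 2)
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(f"Loss at start: {losses[0]:.3f}, at end: {losses[-1]:.3f}")
print("Network outputs:", output.ravel().round(2))
```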
Theano
Theano is the first Python deep learning library we will look at. According to its authors:
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
The following introductory Theano deep learning tutorial is long, but very good, very descriptive, and heavily commented:
• Theano Deep Learning Tutorial, by Colin Raffel
Caffe
The other library we will try out is Caffe. According to its authors:
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
This tutorial is the icing on the cake for this article. While we have covered a few interesting examples above, none of them compares to the following: implementing Google's #DeepDream using Caffe. Enjoy! Once you have worked through the tutorial, play around with it and get your processor dreaming on its own.
• Dreaming Deep with Caffe, via Google's GitHub
I can't promise that mastering machine learning in Python will be quick or easy. But if you put in the time and follow these seven steps, you should end up with reasonable proficiency in and understanding of the field, and be able to implement a range of machine learning algorithms, including some at the current forefront of deep learning, using popular Python libraries.
About the author: Matthew Mayo is a graduate student in computer science, currently working on parallel machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.
7 Steps to Mastering Machine Learning with Python (repost)