http://blog.jobbole.com/67621/
This article was translated by xiaoxiaoli for Bole Online. Do not reprint without permission.
Original English article by Jason Brownlee.
There are many ways to learn machine learning, and most people choose to start with the theory.
If you're a programmer, you already know how to break a problem into components and prototype small projects to learn new technologies, libraries, and methods. These are important skills for any professional programmer, and you can now apply them to learning machine learning.
To learn machine learning effectively you do have to learn the theory eventually, but you can use your interests and your appetite for knowledge as motivation: start from practical examples, then work your way into the mathematics of the algorithms.
In this article you will learn four ways programmers can get started in machine learning. The approach is practical and experimental, designed for technical people: you do the research and run the experiments to build your own intuition.
These four methods are:
- Learn a machine learning tool
- Learn a machine learning data set
- Learn a machine learning algorithm
- Implement a machine learning algorithm
Read through the strategies for each method, choose the one that suits you best, and apply it selectively.
1. Learn a machine learning tool
Choose a tool or library that you like, and learn to use it well.
I recommend starting with a workbench: a platform that provides data preprocessing tools, machine learning algorithms, and a way to present results. Learning such a platform makes you familiar with the machine learning process from end to end, which is more valuable than learning one particular data processing technique or algorithm.
Alternatively, you may be interested in one specific technique or class of techniques. You can use this opportunity to study in depth a library or tool that provides those methods; mastering the library helps you master the techniques it implements.
Some of the strategies you can take are:
- Compare a few candidate tools.
- Summarize the capabilities of the tool you have selected.
- Read and summarize the documentation for this tool.
- Complete the text or video tutorials for learning this tool, and summarize what you have learned in each tutorial.
- Create a tutorial on a feature or function of the tool. Choose a feature you don't know well, then write up your results or record a five-minute screencast showing how to use it.
Some platforms worth considering are R, Weka, scikit-learn, Waffles, and Orange.
2. Learn a machine learning data set
Select a data set and then dig deep into it to discover exactly which type of algorithm is best for dealing with it.
I recommend choosing a medium-sized data set that fits in memory and that many people have already studied. There are many good repositories of data sets that you can browse and choose from. Your goal is to understand the problem behind the data set, its structure, and which kinds of solutions suit the problem best.
Use a machine learning or statistics workbench to study the data set. That way you can focus on the questions you want to answer about the data, rather than being distracted by learning a particular technique or writing code to implement it.
Some strategies that can help you study a machine learning data set experimentally are:
- Clearly describe the problem that this data set presents.
- Use descriptive statistics to summarize the data.
- Describe the structure you observe in the data, and propose hypotheses about the relationships in it.
- Spot-check a handful of commonly used machine learning algorithms on the data set, and find out which classes of algorithms perform better than others.
- Tune the parameters of the algorithms that perform well, and find out which algorithm and which parameter settings do best on the problem.
High-quality data sets are available from the UCI Machine Learning Repository, Kaggle, and data.gov.
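As a rough illustration of the descriptive-statistics step, here is a minimal sketch that uses only the Python standard library. The rows in `ROWS` are made-up values for illustration, not taken from any of the repositories above, and the helper names `summarize_column` and `summarize_dataset` are hypothetical. In practice a workbench would give you these summaries directly.

```python
import statistics
from collections import Counter

# Hypothetical toy data set: (feature_0, feature_1, class_label) per row.
ROWS = [
    (5.1, 3.5, "a"),
    (4.9, 3.0, "a"),
    (4.7, 3.2, "a"),
    (6.4, 3.2, "b"),
    (6.9, 3.1, "b"),
    (5.5, 2.3, "b"),
]


def summarize_column(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def summarize_dataset(rows):
    """Summarize every numeric column and count the class labels."""
    n_features = len(rows[0]) - 1  # the last field is the label
    columns = [[row[i] for row in rows] for i in range(n_features)]
    feature_stats = [summarize_column(col) for col in columns]
    class_counts = Counter(row[-1] for row in rows)
    return feature_stats, class_counts


if __name__ == "__main__":
    stats, counts = summarize_dataset(ROWS)
    for i, s in enumerate(stats):
        print(f"feature {i}: {s}")
    print("class counts:", dict(counts))
```

The class counts show whether the data set is balanced, and the per-feature ranges hint at whether scaling might matter for the algorithms you go on to test.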
3. Learn a machine learning algorithm
Choose an algorithm to understand it deeply, and discover what parameter settings are stable on different datasets.
I recommend starting with an algorithm of medium complexity. Choose one that is well understood, has many open source implementations, and has only a small number of parameters for you to explore. Your goal is to build intuition about how the algorithm behaves on different problems and with different settings.
Use a machine learning platform or library. This lets you treat the algorithm as a "system" and concentrate on its behavior, rather than on its mathematical description or the related papers.
Some of the strategies you can take to learn your chosen machine learning algorithms are:
- Summarize the parameters of the system and how they might affect the algorithm
- Select a range of data sets that suit the algorithm and are likely to produce different behavior
- Select parameter settings that you expect to lead to different results, and list the behavior you think the system might show
- Consider how the algorithm's behavior can be monitored over iterations or over time
- Design small experiments that answer specific questions, each with one or more data sets, algorithm settings, and result measures, and report the results
You can keep this simple or make it more involved. To go a step further, explore the heuristics or rules of thumb for using the algorithm, and design experiments that show whether they work and, if so, under what conditions they are associated with good results.
Some algorithms you could study are least squares linear regression, logistic regression, k-nearest neighbors classification, and the perceptron.
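To make the idea of a small parameter experiment concrete, here is a minimal sketch that varies `k` for k-nearest neighbors on a tiny made-up data set and reports accuracy for each setting. Everything here is an assumption for illustration: the data, the deliberately mislabeled training point, and the helper names `knn_predict`, `accuracy`, and `run_experiment`. A real study would use a library implementation and proper train/test splits.

```python
import math
from collections import Counter

# Hypothetical toy data: ((x, y), label). The last training point is a
# deliberately mislabeled "b" that sits inside the "a" cluster.
TRAIN = [
    ((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.5), "a"), ((3.0, 3.0), "a"),
    ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.5), "b"), ((4.5, 4.0), "b"),
    ((2.4, 2.4), "b"),
]
TEST = [
    ((1.2, 1.4), "a"), ((2.5, 2.5), "a"),
    ((6.2, 6.4), "b"), ((5.0, 5.0), "b"),
]


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(1 for point, label in test
                  if knn_predict(train, point, k) == label)
    return correct / len(test)


def run_experiment(train, test, ks):
    """Map each candidate k to its accuracy on the test points."""
    return {k: accuracy(train, test, k) for k in ks}


if __name__ == "__main__":
    for k, acc in run_experiment(TRAIN, TEST, [1, 3, 5]).items():
        print(f"k={k}: accuracy={acc:.2f}")
```

On this toy data, `k=1` is misled by the mislabeled point while `k=3` and `k=5` vote it down. That is the kind of observation you would write up, together with the conditions under which it holds.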
4. Implement a machine learning algorithm
Choose an algorithm, then choose a programming language to implement it, or port an existing implementation to the programming language you selected.
You should choose a medium-complexity algorithm to implement. I recommend that you study the algorithm you want to implement carefully, or choose an existing implementation that you like and then port it to the programming language you selected.
Implementing an algorithm from scratch is a good way to learn about the myriad small decisions that must be made to turn an algorithm description into a working system. By repeating the process with different algorithms, you will soon develop a feel for the mathematical descriptions of algorithms in papers and books.
Some strategies to help you implement a machine learning algorithm from scratch are:
- Start by porting code. Porting an open source implementation from one language to another teaches you how the algorithm is implemented and leaves you with code that you own and understand. This is the quickest way to get started and is highly recommended.
- Start from one algorithm description, then collect other descriptions to help you resolve ambiguities in the primary reference and understand it.
- Read many different implementations of the algorithm. Learn how different programmers interpret the description of an algorithm and turn it into code.
- Don't go too far down the rabbit hole. The core of many machine learning algorithms relies on advanced optimization algorithms. Do not try to re-implement these methods unless that is the purpose of your project. Use a library that provides the optimization algorithm, or substitute a simpler optimization algorithm that is easy to implement or readily available in a library (such as gradient descent).
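As one possible example of such a project, here is a minimal sketch of least squares linear regression with a single input, fitted by plain batch gradient descent instead of a more advanced optimizer. The function name `fit_linear_regression`, the default learning rate and epoch count, and the toy data are all assumptions chosen for illustration; they are not tuned for real data.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training example.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_linear_regression(xs, ys)
    print(f"w={w:.4f}, b={b:.4f}")
```

Even this small version forces decisions that a textbook formula leaves open: how to initialize the parameters, which learning rate to use, and when to stop. With a learning rate that is too large for the data, the updates diverge, which is itself a useful thing to discover by experiment.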
Small Project methodology
The four strategies above belong to what I call the "small project" methodology. You can use it to build practical skills quickly in a technical field such as machine learning. The idea is that you design and personally complete small projects that each answer a specific question.
Small projects should be small in several respects, so that you can finish them, learn from them, and move on to the next one. Here are some constraints to consider imposing on your projects:
- Short time: A project should take no more than 5-15 hours from start to a presentable result. This lets you complete a small project within a week, using evenings and the weekend outside of work.
- Small scope: A project should be meaningful, but it should be the smallest version of the problem you are interested in. For example, rather than tackling the general problem "write a program that tells me whether a microblog post will be reposted", study how the posts of one particular account performed over a specific period.
- Few resources: A project should be achievable on your internet-connected desktop or laptop. You should not need exotic software, network infrastructure, or third-party data or services. Collect the data you need, read it into memory, and use open source tools to solve your small problem.
Additional Tips for the project
The principle behind these strategies is to let you use your programming skills to get started. Here are three tips to help you adjust your mindset and take action:
- Write down what you learn. I recommend that every step produce a tangible work product. It can be notes in a notebook, tweets, blog posts, or an open source project. Each work product can serve as a milestone or anchor.
- Don't write code unless writing code is the purpose of the project. This is not obvious, but it is the best advice for speeding up your understanding of machine learning.
- The goal is to learn something, not to create a unique resource. Don't worry about whether anyone reads your research, tutorials, or notes about an algorithm. These are your opinions and the fruits of your labor, and they show that you now have the knowledge.
Summary
Here's a clear summary of these strategies to help you choose the one that's right for you.
- Learn a machine learning tool: Choose a tool or library that you like and learn to use it well.
- Learn a machine learning data set: Choose a data set, dig deep into it, and discover which kinds of algorithms handle it best.
- Learn a machine learning algorithm: Choose an algorithm, understand it in depth, and discover which parameter settings are stable across different data sets.
- Implement a machine learning algorithm: Choose an algorithm and implement it in a language of your choice, or port an existing implementation to that language.
Choose one!
PDF guide
If you like this self-study approach, the author has written a 32-page PDF guide to learning and practicing applied machine learning. See:
Small Projects Methodology: Learn and Practice Applied Machine Learning
The author has also compiled a list of 90 project ideas, included with the guide as a bonus.
Four ways programmers learn about machine learning