Transferred from: http://mp.weixin.qq.com/s?__biz=MzI3MTA0MTk1MA==&mid=2651987052&idx=3&sn=b6e756afd2186700d01e2dc705d37294&chksm=F121689dc656e18bef9dbd549830d5f652568f00248d9fad6628039e9d7a6030de4f2284373c&scene=25#wechat_redirect
1. Yann LeCun, Director of Facebook AI Research, professor at New York University
Backprop
2. Carlos Guestrin, Amazon Professor of Machine Learning, CEO of Dato
Most concise: the perceptron algorithm. It was invented by Rosenblatt and others in the 1950s. This extremely simple algorithm can be seen as the foundation of some of today's most successful classifiers, including support vector machines and logistic regression, which are solved with stochastic gradient descent. The convergence proof for the perceptron is one of the simplest pieces of mathematics I have seen in machine learning.
Most useful: boosting, especially boosted decision trees. This intuitive approach lets you combine many simple models into a highly accurate machine learning model. Boosting is one of the most practical methods in machine learning; it is widely used in industry, handles a wide range of data types, and can be implemented at scale. For a scalable implementation of boosted trees, I recommend getting to know the XGBoost tool. Boosting also comes with a very concise proof.
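As a rough illustration of the kind of boosted-tree model mentioned above, here is a minimal sketch using the XGBoost Python package; the synthetic dataset, train/test split, and hyperparameters are illustrative assumptions, not part of the original answer.

```python
# Minimal sketch: a boosted decision tree classifier with XGBoost.
# The synthetic data and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import xgboost as xgb

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Each boosting round adds a shallow tree that corrects the current ensemble's errors.
model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```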
Most significant revival: deep learning with convolutional neural networks. This type of neural network emerged in the early 1980s. Although interest in it declined from the late 1990s through the late 2000s, it has seen an astonishing revival over the past five years. In particular, convolutional neural networks form the core of the deep learning models that have had such impact on computer vision and speech recognition.
Most elegant: dynamic programming (for example, the Viterbi, forward-backward, variable elimination, and belief propagation algorithms). Dynamic programming is among the most elegant and concise ideas in computer science, because it lets you find optimal solutions over an exponentially large space of alternatives without searching it exhaustively. This idea has been applied in machine learning in many ways, especially in graphical models such as hidden Markov models, Bayesian networks, and Markov networks.
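As a concrete instance of the dynamic programming idea above, here is a minimal Viterbi sketch for a small hidden Markov model; the toy transition and emission tables are invented purely for illustration.

```python
# Minimal Viterbi sketch: most likely hidden state path in a toy HMM.
# The transition/emission probabilities below are invented for illustration.
import numpy as np

start = np.array([0.6, 0.4])                 # P(initial state)
trans = np.array([[0.7, 0.3], [0.4, 0.6]])   # P(next state | current state)
emit  = np.array([[0.5, 0.5], [0.1, 0.9]])   # P(observation | state)
obs   = [0, 1, 1, 0]                          # observed symbol indices

n_states, T = trans.shape[0], len(obs)
delta = np.zeros((T, n_states))              # best path probability so far (not in log space, for simplicity)
psi   = np.zeros((T, n_states), dtype=int)   # backpointers to the best previous state

delta[0] = start * emit[:, obs[0]]
for t in range(1, T):
    for j in range(n_states):
        scores = delta[t - 1] * trans[:, j]
        psi[t, j] = np.argmax(scores)
        delta[t, j] = scores[psi[t, j]] * emit[j, obs[t]]

# Backtrack to recover the most likely state sequence.
path = [int(np.argmax(delta[-1]))]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t, path[-1]]))
path.reverse()
print("most likely states:", path)
```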
Most powerful baseline: the nearest-neighbor algorithm. Usually, when writing a paper, you want to show that your curve is better than someone else's curve. One way to do this is to introduce a baseline method and demonstrate that your method is more accurate than it. The nearest-neighbor algorithm is the easiest baseline to implement, and people often try it first, assuming they can easily beat it and thereby prove how great their own method is. Surprisingly, however, the nearest-neighbor algorithm is extremely hard to beat! In fact, given enough data it is very powerful, and it works remarkably well in practice.
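A minimal sketch of using nearest neighbors as a baseline, assuming scikit-learn and a toy dataset; these specifics are not from the original answer.

```python
# Minimal sketch: 1-nearest-neighbor as a strong baseline classifier.
# Dataset choice and train/test split are illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

baseline = KNeighborsClassifier(n_neighbors=1)  # classify by the single closest training point
baseline.fit(X_train, y_train)
print("1-NN baseline accuracy:", baseline.score(X_test, y_test))
```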
3. Alex Smola, Amazon researcher, former CMU professor
Perhaps everyone's favorite is the perceptron algorithm. It is the starting point of many important methods. To name a few:
Kernel methods (just change the preprocessing step of the perceptron)
Deep networks (just add more layers)
Stochastic gradient descent (just change the objective function of the perceptron)
Learning theory (just give the algorithm's updates an adversarial guarantee)
What is the perceptron algorithm? Suppose we have a linear function of the form:
f(x) = ⟨w, x⟩ + b
We want to estimate the vector w and the constant b so that f is positive whenever the input belongs to class +1, and negative whenever it belongs to class -1. We can do this with the following steps:
Initialize w and b to 0 (or any other value that might work better)
Iterate over the data pairs (x, y) until no more mistakes are made
If y·f(x) < 0 (a mistake), update: w += y·x, b += y
The algorithm is guaranteed to converge, and how long it takes depends on how hard the problem is (more specifically, on how difficult it is to separate the positive and negative sets). More importantly, you want the algorithm to run through all of its mistakes as quickly as possible.
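A minimal sketch of the update rule just described, run on a small synthetic linearly separable dataset; the data and the pass limit are illustrative assumptions.

```python
# Minimal perceptron sketch implementing the steps described above.
# The synthetic data and the pass limit are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 2 * X[:, 1] > 0, 1, -1)   # linearly separable labels in {-1, +1}

w = np.zeros(2)
b = 0.0
for _ in range(100):                              # repeat passes until no mistakes remain
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:         # misclassified (or exactly on the boundary)
            w += yi * xi                          # w += y·x
            b += yi                               # b += y
            mistakes += 1
    if mistakes == 0:
        break

print("learned w:", w, "b:", b)
```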
4. François Chollet, deep learning researcher at Google, author of Keras
Matrix factorization: it is a simple, elegant approach to dimensionality reduction, and dimensionality reduction is the essence of cognition. Recommender systems are one of the major application areas of matrix factorization. Another application I have worked on for many years (starting around 2010 with video data) is factorizing the pairwise mutual information between features, or more commonly the pointwise mutual information (PMI), which can be used for feature extraction, computing word embeddings, computing tag embeddings (this is also the topic of one of my recent papers, see note [1]), and so on.
In a convolutional setting, this approach can serve as an unsupervised feature extractor for images and video. It has one major drawback, however: it is fundamentally a shallow algorithm. A deep neural network will easily outperform it if supervised labels are available.
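As a rough illustration of the factorization idea (not Chollet's own implementation), here is a minimal sketch that builds a pointwise mutual information matrix from word co-occurrence counts and factorizes it with a truncated SVD to obtain small word vectors; the toy corpus, co-occurrence window, and embedding dimension are invented for illustration.

```python
# Minimal sketch: word embeddings from a PMI matrix factorized with SVD.
# The toy corpus, window choice, and embedding dimension are illustrative assumptions.
import numpy as np
from itertools import combinations

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within each sentence (a crude "window").
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for a, b in combinations(sent, 2):
        counts[idx[a], idx[b]] += 1
        counts[idx[b], idx[a]] += 1

total = counts.sum()
p_ij = counts / total
p_i = counts.sum(axis=1) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_ij / np.outer(p_i, p_i))
pmi[~np.isfinite(pmi)] = 0.0          # zero out log(0) entries
pmi = np.maximum(pmi, 0.0)            # keep only positive PMI (a common choice)

# Factorize: truncated SVD yields low-dimensional word vectors.
U, S, _ = np.linalg.svd(pmi)
dim = 2
embeddings = U[:, :dim] * np.sqrt(S[:dim])
for w in vocab:
    print(w, embeddings[idx[w]])
```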
5. Xavier Amatriain, Machine Learning Director at Quora
I love simple and flexible algorithms. If I had to choose one, I would say my favorite is ensembling, which I think of as my "master algorithm". No matter which algorithm you start with, you can always improve it with an ensemble. Ensembles won the Netflix Prize and routinely deliver outstanding performance, and they are relatively easy to understand, optimize, and monitor.
If you object that ensembling is a "meta-algorithm" and therefore a bit of a cheat, then I would choose logistic regression. It is simple, efficient, and flexible, and it suits many applications, especially classification and ranking.
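A minimal sketch of the ensembling idea, assuming scikit-learn; the dataset and the choice of base models are illustrative assumptions, not from Amatriain's answer. Several simple models are combined and allowed to vote.

```python
# Minimal ensembling sketch: soft-voting over a few simple base models.
# Dataset and model choices are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",   # average predicted probabilities rather than hard votes
)
print("ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```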
6. Ian Goodfellow, researcher at OpenAI
I like dropout, because the idea of building an exponentially large ensemble out of a single model is simple and elegant. I am also amazed that simply dividing the weights by 2 gives such a good approximation to the ensemble's prediction. I don't know of a theoretical explanation for why it works so well in deep, nonlinear models, but it does.
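A minimal numpy sketch of the weight-scaling idea mentioned above, for a single layer with a 0.5 keep probability; the layer sizes and data are invented for illustration. Units are dropped at random during training, and at test time the weights are halved instead of averaging over all sub-networks.

```python
# Minimal sketch of dropout with the "divide the weights by 2" test-time rule.
# Layer sizes, data, and the 0.5 keep probability are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))          # a small batch of inputs
W = rng.normal(size=(10, 5))          # weights of one layer
p_keep = 0.5

def train_forward(x, W):
    # Training: randomly drop units feeding this layer via a binary mask.
    mask = rng.random(x.shape) < p_keep
    return (x * mask) @ W

def test_forward(x, W):
    # Test: no dropout; scale the weights by the keep probability (here, divide by 2).
    return x @ (W * p_keep)

print("stochastic training output:", train_forward(x, W)[0, :3])
print("deterministic test output: ", test_forward(x, W)[0, :3])
```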
7. Claudia Perlich, Chief Scientist at Dstillery, adjunct professor at New York University
Logistic regression (together with appealing tricks such as stochastic gradient descent, feature hashing, and so on).
I know this may sound odd in the age of deep learning, so let me first give some background:
From 1995 to 1998 I used neural networks, from 1998 to 2002 I used tree-based methods, and after 2002, logistic regression (and linear models more generally, including Poisson regression) became my favorite. In 2003 I published a machine learning paper comparing tree methods with logistic regression on 35 large datasets of the time.
To spare you the 30-page paper, the short conclusion is: if the signal-to-noise ratio is high, trees win. But if the problem is so noisy that even the best model has an AUC below 0.8, logistic regression almost always beats trees. In the end, if the signal is weak, high-variance models get lost in the noise.
What does this mean in practice? The problems I deal with tend to be very noisy, with low predictability. Imagine a spectrum of problems with deterministic ones (such as chess) at one end and random ones (such as the stock market) at the other. Given the data, some problems are simply more predictable than others. Predictability is not a property of the algorithm; it is a statement about the world.
Most of the problems I am interested in sit near the stock-market end of that spectrum. Deep learning excels at near-deterministic problems such as "this picture contains a cat". But in a world of uncertainty, the bias-variance tradeoff often favors the bias side: what you want is a "simple", highly constrained model. This is where logistic regression shines. I find it much easier to improve a simple linear model by adding complex features than to constrain a powerful (high-variance) model class. In fact, every time I won a data mining competition (KDD Cup 07-09), I used a linear model.
Performance aside, linear models are robust and need little manual tuning (all right, I admit that stochastic gradient descent and regularization penalties do take some of the shine off that claim). This matters when you do predictive modeling in industry, because there you don't have three months to build the perfect model.
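A minimal sketch of the combination Perlich mentions (logistic regression trained by stochastic gradient descent on hashed features), assuming a recent scikit-learn; the tiny corpus, labels, and hyperparameters are invented for illustration.

```python
# Minimal sketch: logistic regression via SGD on hashed text features.
# The tiny corpus, labels, and hyperparameters are illustrative assumptions.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

docs = ["cheap watches buy now", "meeting agenda for monday",
        "win money fast", "quarterly report attached"]
labels = [1, 0, 1, 0]   # 1 = spam-like, 0 = normal (toy labels)

# Feature hashing: map raw tokens into a fixed-size sparse vector, keeping no vocabulary.
hasher = HashingVectorizer(n_features=2**12, alternate_sign=False)
X = hasher.transform(docs)

# loss="log_loss" makes SGDClassifier a logistic regression trained by SGD.
clf = SGDClassifier(loss="log_loss", penalty="l2", max_iter=1000, tol=1e-3)
clf.fit(X, labels)

print(clf.predict(hasher.transform(["buy cheap money now"])))
```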
8. Shehroz Khan, machine learning researcher, postdoctoral fellow at the University of Toronto
I don't have a favorite machine learning algorithm, because no such algorithm exists [1]. However, the first algorithm I ever implemented was the naive Bayes classifier, so in a sense it is important to me. It did not actually help me finish the work I had at the time, though, because I was doing one-class classification [2]. Naive Bayes computes the prior probability of each class from its frequency count in the training data. If one of the classes never appears during training, its prior is 0, and you cannot estimate its likelihood either, because there is no training data for it; as a result, nothing is ever classified into that class. Implementing it made me appreciate the limitations of this simple algorithm on that particular problem. (A short sketch of this zero-prior issue follows the footnotes below.)
Footnotes:
[1] The "no free lunch" theorem.
[2] https://cs.uwaterloo.ca/~s255kha ...
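To make the zero-prior issue above concrete, here is a tiny sketch; the class names and counts are invented for illustration.

```python
# Tiny sketch of the zero-prior problem described above.
# The class names and counts are invented for illustration.
from collections import Counter

train_labels = ["normal"] * 50          # only the "normal" class is ever observed
classes = ["normal", "anomalous"]       # but at test time we care about both

counts = Counter(train_labels)
total = sum(counts.values())
priors = {c: counts[c] / total for c in classes}
print(priors)   # {'normal': 1.0, 'anomalous': 0.0}

# With P(anomalous) = 0 and no examples to estimate P(x | anomalous),
# the posterior for "anomalous" is always 0: nothing is ever assigned to it.
```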
9. Ricardo Vladimiro, game analyst and chief data scientist at Miniclip
Random forests. The process of learning about random forests was wonderful for me. The ensemble as a whole finally made sense, and those beautiful but (on their own) fairly useless decision trees turned out to have a reason to exist. Bootstrapping the features surprised me the most; it feels almost magical.
I admit my view of random forests is an emotional one, because I learned so much from them in such a short time.
P.S. I know my view of decision trees is a bit extreme.
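For readers unfamiliar with the bootstrapping and feature subsampling mentioned above, here is a minimal scikit-learn sketch; the dataset and hyperparameters are illustrative assumptions.

```python
# Minimal random forest sketch: bagged trees with per-split feature subsampling.
# Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,       # many trees, each trained on a bootstrap sample of the rows
    max_features="sqrt",    # each split considers only a random subset of the features
    bootstrap=True,
    random_state=0,
)
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```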
10. Luca Parlamento, PhD candidate in quantitative trading / machine learning
My favorite algorithm is whichever one is right for the specific problem at hand!
In principle, I think a big pitfall for practitioners is clinging to a single favorite algorithm. In practice, it is also dangerous to use an algorithm without really understanding how it works...
So you need to strike a balance: don't fall in love with one particular algorithm, but don't become obsessed with knowing 17 different classification algorithms just to cut an onion either. This is an optimization problem, and there is no "free lunch", one-size-fits-all solution.
11. Let me first answer an easier question: there are some algorithms I dislike because they have unnecessary flaws. When it comes to favorites, though, many machine learning algorithms matter and sit on the Pareto frontier of the space of machine learning methods. In fact, the basic theory of machine learning tells us that no single algorithm is optimal for every problem. For example, if I have high-dimensional sparse data (say, text categorized by topic) and few training examples, I use a regularized linear model such as an SVM or logistic regression. But if I have low-dimensional dense data and many training examples (as in speech recognition or vision), I use a deep network.
"Turn" 11-bit machine learning Daniel's favorite algorithm full solution