Bean Leaf: Machine Learning and My Academic Life

Source: Internet
Author: User
Tags: svm

Preface

Tonight I attended Bean Leaf's Live session on Zhihu: "Machine Learning and My Academic Life."

I joined because I wanted to know how peers who do machine learning think: how they approach academics, and how they learn the subject.

Having taken part in the Live and come back to sum it up, I really gained quite a lot.

Background

Bean Leaf graduated from the mathematics department of Zhong Ke. He originally studied computer science there and later transferred to mathematics, because he felt that computer knowledge can be self-taught by reading and watching videos, whereas mathematics requires following a teacher, doing problem sets, and progressing step by step.

He went to Hong Kong for two years, then quit and applied for a PhD at an American university; he is now in his fourth year of the PhD.

Mathematics and programming are two important foundations

Mathematics and programming are two very important foundations; whatever industry you later enter, they will serve you well. Bean Leaf was glad he chose mathematics as his major (he transferred from computer science to mathematics).

Machine learning has many directions.

Within machine learning, especially in industry, the field is split into many directions: some people do data processing, some specialize in modeling, some implement algorithms, and these carry different titles in recruiting. If you want to do machine learning, you should understand which direction you want your career to develop in. Engineering? Modeling? Or data analysis facing customers?

Zhihu is not the place to learn machine learning.

If you want to learn machine learning, you should not do it on Zhihu. In his opinion, machine learning questions on Zhihu that are actually worth reading are particularly rare; it is more valuable to read books and textbooks.

You only know how a dish on the menu tastes after you cook it.

When he first started learning machine learning, he read machine learning articles and textbooks, deriving the formulas step by step. After finishing a whole book he was clear about every formula in it, having spent a great deal of energy on each one.

But when he picked up a new paper, he was confused: he did not know how the paper related to what he had learned before, or how a new algorithm connected to the algorithms and models he already knew. A junior labmate of his had the same experience, which he understood well.

He made an interesting analogy here:

It is like cooking: reading books and papers is like reading recipes. However many recipes you read, you still do not know which dishes are delicious; only by cooking each recipe and tasting the dish yourself do you find out.

Likewise, if you never cook these recipes, you cannot tell which dish is Hunan cuisine and which is Cantonese, and you may miss many details. You will not know what flavors the different cooking styles produce.

Just starting to learn machine learning is like just reading recipes: you do not understand the flavor behind them. Once you have read more and actually carried out these recipes, you gain a certain experience. When you meet a new recipe, you no longer follow it mechanically; you understand that how much sugar goes here does not matter much, and how much salt goes there is not too important either.

At this level, what you can do is relate different models to each other and see their similarities.

For example, deep learning may be the hot topic now, but a few years ago everyone was doing probabilistic graphical models, so you may know something like Conditional Random Fields. At this point I would ask: what is the simplest instance of a Conditional Random Field? Or, for Markov Random Fields, can you give me the simplest model?

You might answer: the simplest Conditional Random Field is the Bayesian linear model. Then I would ask: what is the relationship between this Bayesian linear model and Ridge Regression?
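A sketch of the standard answer to that last question, under the usual assumptions of Gaussian noise $y = Xw + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, and a Gaussian prior $w \sim \mathcal{N}(0, \tau^2 I)$: the MAP estimate of the Bayesian linear model is exactly ridge regression.

```latex
\hat{w}_{\text{MAP}}
 = \arg\max_{w}\;\bigl[\log p(y \mid X, w) + \log p(w)\bigr]
 = \arg\min_{w}\;\frac{1}{2\sigma^{2}}\lVert y - Xw \rVert^{2}
   + \frac{1}{2\tau^{2}}\lVert w \rVert^{2}
 = \arg\min_{w}\;\lVert y - Xw \rVert^{2} + \lambda \lVert w \rVert^{2},
 \qquad \lambda = \sigma^{2}/\tau^{2}.
```

So the ridge penalty weight is just the ratio of noise variance to prior variance.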

These are the kinds of questions Bean Leaf poses off the top of his head. If you have a big picture of the field, they are actually easy to answer. But if you have only read recipe after recipe, you may not be able to answer them and will struggle. So Bean Leaf emphasizes the importance of a solid foundation: once you have mastered the underlying mathematics, your understanding of these models easily transcends the formulas themselves.

The difference between deep knowledge and shallow knowledge

Bean Leaf thinks that when we learn, we should learn to distinguish what is deep knowledge and what is shallow knowledge.

Some knowledge is shallow: you only need to remember it. Deep knowledge, by contrast, must be mastered through practice.

Take the machine learning course Andrew Ng opened on Coursera. It covers many of the commonly used models in machine learning, which Bean Leaf considers shallow knowledge, because for someone with a mathematical foundation, getting through the course takes barely an hour (not counting the exercises): most of the content involved is basic mathematical knowledge.

For example, in neural networks, back propagation is just a process of taking derivatives.
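A minimal sketch of that point (all names here are illustrative): backpropagation through a one-hidden-unit network is nothing more than the chain rule applied to each intermediate quantity.

```python
import math

def forward_backward(x, w1, w2, y):
    """One forward + backward step for y_hat = w2 * sigmoid(w1 * x),
    with loss L = 0.5 * (y_hat - y)^2. Backprop is just the chain rule:
    dL/dw2 = dL/dy_hat * dy_hat/dw2, and so on down the graph."""
    # Forward pass.
    z = w1 * x
    h = 1.0 / (1.0 + math.exp(-z))   # sigmoid activation
    y_hat = w2 * h
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass: one chain-rule factor per arrow in the graph.
    dL_dyhat = y_hat - y
    dL_dw2 = dL_dyhat * h            # dy_hat/dw2 = h
    dL_dh = dL_dyhat * w2            # dy_hat/dh = w2
    dL_dz = dL_dh * h * (1.0 - h)    # sigmoid'(z) = h * (1 - h)
    dL_dw1 = dL_dz * x               # dz/dw1 = x
    return loss, dL_dw1, dL_dw2
```

A finite-difference check confirms these really are just derivatives, which is the whole point of the remark above.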

Similarly, logistic regression involves some statistical knowledge; with that background, looking back at logistic regression, it is not very complicated.

Choose your own direction

Besides the foundations mentioned above, there is the question of how to plan your own path. Will you do machine learning as research, or do machine learning work in industry? These are very different.

Because machine learning has many branches. Following a different advisor means doing completely different things; the gap between branches can be very large.

If you are heading to industry, the set of techniques industry commonly uses is actually not large; by entering competitions and doing projects, you can pick up that knowledge.

How to learn: take part in competitions like Kaggle

As for what knowledge to learn and how to learn it, Bean Leaf has a highly upvoted answer on Zhihu: "What should one learn about machine learning and data mining at the postgraduate stage?"

Basically, start with classification models and regression models. At first you do not really need to understand what these models are doing; even after reading the mathematical formulas you may still not know. The more you work through competitions and use these models, the more you learn what they output and what kind of performance they have.

Bean Leaf suggests using these models in every fancy way you can: try all kinds of combinations, see what result each combination gives, and compare the results. This process is very interesting.

Bean Leaf recommends competitions like Kaggle. The competition is fierce; reaching the top 10 or top 20 in such a contest takes a lot of energy.

At the same time, treat such a competition as a game, not as casual exercise. If you approach it like the Olympic Games, fighting to win a medal, and compete in that frame of mind, you will do very well. If you treat it like going to the gym to casually use the equipment, at that level of effort you will not grow at all: whatever you knew before the competition is still all you know afterwards.

Reflecting on models after practice

Once you have some practical experience, for example you have applied classification models to some data, you should think in reverse: what are the properties of the models you used? How do those properties affect the models' performance?

This requires some statistical knowledge, and also some background in computational mathematics. How much computation do these models need? Where is the core computational bottleneck: in a matrix decomposition, or in the optimization? Answering these questions requires a mathematical background.

Take the simplest example: you want to fit a Gaussian model to a set of data, and then sample data from the fitted Gaussian. Here the computational cost lies in a matrix decomposition. If you fit a logistic regression instead, the cost lies in the optimization.
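A sketch of where the matrix decomposition comes in, restricted to the 2x2 case so the Cholesky factor can be written out by hand (function names are illustrative): to sample x ~ N(mu, Sigma), factor Sigma = L Lᵀ and set x = mu + L z with z ~ N(0, I).

```python
import math
import random

def cholesky_2x2(a, b, c):
    """Cholesky factor of the 2x2 covariance [[a, b], [b, c]]:
    returns the lower-triangular entries (l11, l21, l22) with L @ L.T = Sigma."""
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return l11, l21, l22

def sample_gaussian(mu, sigma, rng=random):
    """Draw one sample from a 2-d N(mu, sigma). The decomposition is
    the expensive step, as the text says; the rest is a matrix-vector product."""
    (a, b), (_, c) = sigma
    l11, l21, l22 = cholesky_2x2(a, b, c)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1,
            mu[1] + l21 * z1 + l22 * z2)
```

For general dimensions the same idea applies with a full Cholesky decomposition, whose O(n^3) cost dominates the sampling.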

When you can clearly place different models into different toolboxes, such as optimization, statistics, and numerical mathematics, and have a clear picture of what mathematical knowledge each model requires, you have reached a new level.

For example, Bean Leaf once answered the question of the difference between logistic regression and the linear SVM. Many people may know the mathematical formulas. Many may find the derivation of the linear SVM somewhat complicated; but if you have studied convex optimization before and know some of its basic theory, then looking back at the linear SVM, it is fairly straightforward.

There are two levels of understanding of the above models:

    • The first level: you can understand the mathematical formulas of these models.

    • The second level: thinking further, what are the properties of these two mathematical models? Both are linear models; what are the differences? Here you need to think carefully about each model's input and output: under which transformations of the input does the model's output stay unchanged, and to which inputs is the model sensitive? These answers can be obtained through practice, and also through mathematical analysis.
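One concrete handle on the second level, sketched in plain Python: both models score a point with w·x + b, but logistic regression penalizes the margin m = y(w·x + b) with the logistic loss, while the linear SVM uses the hinge loss. The hinge loss is exactly zero beyond margin 1, so well-classified points stop influencing the fit; the logistic loss is never zero, so every point keeps a small pull.

```python
import math

def hinge_loss(m):
    """Linear SVM loss as a function of the margin m = y * (w . x + b)."""
    return max(0.0, 1.0 - m)

def logistic_loss(m):
    """Logistic regression loss on the same margin."""
    return math.log(1.0 + math.exp(-m))

# Points far on the correct side: hinge is exactly 0, logistic only approaches 0.
for m in (2.0, 5.0, 10.0):
    assert hinge_loss(m) == 0.0 and logistic_loss(m) > 0.0
```

This is one property you can check both empirically (remove a well-classified point and refit) and analytically (the hinge loss has zero gradient past margin 1).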


Reflection on the model: a case study of the SVM and decision trees

For example, few people in industry actually use SVM models. Why is that?

Because the SVM is fundamentally a geometric model. It relies on an assumption: you must define a kernel between instances, that is, a similarity, and this similarity is not known a priori. In a simple case such as the linear SVM, you can take the similarity to be the inner product. But in a real, complex problem, your features come from various channels: age, gender, all kinds of attributes. How do you define a similarity over such heterogeneous features? That is why few people use SVMs to classify such features.

In fact, in computer vision, SVMs are commonly used with HoG features and LBP features. Why are these features suitable for SVMs?

Because these features are themselves histograms: every dimension is of the same kind, the same nature. So you do not need to define a particularly complex kernel to apply the SVM. In real industrial problems, however, you rarely encounter features of uniform nature; instead you get a mix of, say, speech features, image features, and text features. Using an SVM there is awkward, because you cannot find a reasonable kernel.

What kinds of models do we use then? Decision trees, random forests, gradient boosted trees. The benefit of these models is that, within each dimension, they do not depend on the distribution of the data, only on its ordering: apply any monotone transformation F to the data, so that a > b implies F(a) > F(b), and the outputs of the decision tree, the random forest, and the gradient boosted tree are unchanged.

This is why everyone in industry uses decision trees, or tree-based models: they are far less sensitive to the distribution of the data, whereas the SVM is very sensitive. This follows from how a decision tree works: its decision condition is very simple, just a threshold; values below the threshold go to one side and values above it to the other, which has no direct relationship to the data's distribution.
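A small sketch of that invariance (all function names illustrative): a one-split decision stump trained on x and one trained on a strictly monotone transform of x, here exp(x), make identical predictions, because the best threshold depends only on the ordering of the points.

```python
import math

def fit_stump(xs, ys):
    """Fit a 1-d decision stump: pick the threshold (a midpoint between
    consecutive sorted xs) that best separates the binary labels ys."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(len(order) - 1):
        t = 0.5 * (xs[order[k]] + xs[order[k + 1]])
        acc = sum((1 if x > t else 0) == y for x, y in zip(xs, ys))
        if best is None or acc > best[0]:
            best = (acc, t)
    return best[1]

def predict(t, xs):
    return [1 if x > t else 0 for x in xs]

xs = [0.1, 0.4, 0.35, 0.8, 1.2, 2.0]
ys = [0, 0, 0, 1, 1, 1]

t_raw = fit_stump(xs, ys)
t_exp = fit_stump([math.exp(x) for x in xs], ys)

# Identical predictions before and after the monotone transform.
assert predict(t_raw, xs) == predict(t_exp, [math.exp(x) for x in xs])
```

An SVM with an inner-product kernel has no such guarantee: rescaling or warping a feature changes the geometry and hence the decision boundary.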

Reflection on the model: a case study of the non-linear SVM and logistic regression

The examples just given were the linear SVM and decision trees. Now let us talk about the relationship between the non-linear SVM and logistic regression.

In practical applications, logistic regression is generally applied to high-dimensional problems.

This is because, on the one hand, compared with the non-linear SVM, logistic regression is faster to compute and easier to scale to larger data. The non-linear SVM's computation grows quadratically with the amount of data; with that complexity it is hard to use on large-scale data. If the data has a few thousand dimensions, that is OK; but with millions of dimensions (as in advertising and recommendation), you can only use logistic regression, plus some penalty term.

Reflection on the model: taking Lasso and Ridge regression as an example

Another example: when should you use the Lasso (the L1 norm as the regularizer), and when Ridge regression (the L2 norm as the regularizer)?

Most models use Ridge (L2) as the regularizer. It is very common, and most of the time it works better than the Lasso.

But then why, in academia, do you see so many people using the L1 norm as the regularizer? Does that mean the L1 method is necessarily better than L2?
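One concrete difference shows up already in the one-dimensional case, where both penalized problems have standard closed-form solutions (sketched below): the L1 penalty soft-thresholds a coefficient, setting small ones exactly to zero, while the L2 penalty only shrinks it. This is why the Lasso produces sparse models and Ridge does not.

```python
def lasso_1d(w, lam):
    """argmin_v 0.5*(v - w)^2 + lam*|v|  ->  soft-thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def ridge_1d(w, lam):
    """argmin_v 0.5*(v - w)^2 + 0.5*lam*v^2  ->  pure shrinkage."""
    return w / (1.0 + lam)

coeffs = [3.0, 0.4, -0.2, -5.0]
print([lasso_1d(w, 1.0) for w in coeffs])  # small coefficients become exactly 0
print([ridge_1d(w, 1.0) for w in coeffs])  # everything shrunk, nothing zeroed
```

Sparsity is one reason L1 is popular in academic work (interpretable, few nonzero weights), even when L2 predicts better.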

Think about how the models differ outside the textbook, in practice, and which factors cause those differences. If you learn with this mindset, you can acquire knowledge beyond the books, and that knowledge is more valuable than what is in them. The more of this reflective knowledge beyond the models you accumulate, the deeper your understanding, and the more at ease you will be when using the models and discussing them.


Choose the right model according to actual needs

In real practice, you must understand the real-world need; you cannot fixate on a single model, because every model has its advantages and disadvantages. On a real problem, different models will certainly give different results. Which model should you choose? Bean Leaf thinks this is no longer purely a mathematics question. Even if you understand the models, which one is right for the real world is a question mathematics itself cannot answer; it gives no verdict.

In other words, you need to make a judgment about the real need in advance. If your model contains assumptions, how well do those assumptions hold in the real world? Bean Leaf thinks this question is an art.

Papers often present very fancy models (it used to be graphical models; now it is neural networks). There will be some very fancy ideas, and each idea carries assumptions; the more complex the model, the more assumptions. But in real-world situations these assumptions can be harsh.


Constrained optimization problems

Common unconstrained optimizers, such as gradient descent, Newton's method, and quasi-Newton methods, are available at second order and first order. But how do you solve problems with constraints?

If you read the book "Numerical Optimization", it offers some schemes, but not for everything. This problem has only drawn real attention in the last 10 to 20 years. Earlier, some constrained optimization methods did exist, but in practice their computational cost was enormous.

"Numerical Optimization" also explains how to handle constraints when the problem is convex: turn the hard constraint into a soft one, write down a Lagrangian, and from the Lagrangian derive a dual form; often this dual form is easier to solve than the original problem, so you just solve the dual. But this is still very limited: the method is beautiful within convex optimization, yet not easy to compute. The problems it can actually solve are few, such as linear programming and quadratic programming; for more complex models the method becomes a stretch.
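The pipeline described above, sketched for a generic convex problem:

```latex
\begin{aligned}
&\text{primal:} && \min_{x}\ f(x) \quad \text{s.t. } g_i(x) \le 0,\ i = 1, \dots, m \\
&\text{Lagrangian:} && L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x), \qquad \lambda_i \ge 0 \\
&\text{dual:} && \max_{\lambda \ge 0}\ d(\lambda), \qquad d(\lambda) = \inf_{x}\, L(x, \lambda)
\end{aligned}
```

The dual is always concave in λ, which is what makes it tractable when the primal is not; under strong duality (e.g. Slater's condition) solving it recovers the primal optimum.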

Even so, many problems remain constrained.

Take face tracking, which is popular now. It fits your face to a template. That is really an optimization problem: you need to optimize the orientation of the face, which is a rotation, so you are optimizing over rotations. A rotation has several representations. Expressed as a matrix, it must satisfy an orthogonality constraint. Expressed as Euler angles (α, β, γ), it is also constrained: the angles have bounded ranges.

So how do you optimize such problems? A good approach is reparameterization: if your problem can be re-parameterized, then after you reparameterize the model, the constraint disappears. The influence of this idea is far-reaching: many standard constrained problems become unconstrained after reparameterization.

For example, suppose you want to optimize over a probability distribution: the parameters sum to 1 and every component is greater than 0; those are your constraints. Apply a log transform to it and the problem becomes unconstrained, so you can solve it with gradient descent or a quasi-Newton method.


Question and Answer

This part contains Bean Leaf's answers to some audience questions. I will pick a few and paste them here.

What is the difference between a statistics major and a machine learning major?

Many statistics departments now offer machine learning or statistical learning courses. So what is the difference between a machine learning major and a statistics major?

Statistics majors do not care much about computation, although computational statistics does exist, and some subfields of statistics do not care about applications either. Machine learning is more interdisciplinary: it contains statistics, but it also takes the practical demands of computation into account.


For industry, a master's degree can be enough.

If you want to work on machine learning in the future, is a master's degree enough?

It depends on your own positioning. Many roles really only need a master's, for example data engineering (as opposed to data science), which does not need a PhD. If your output is not models, you do not need a PhD either.


Are there many girls learning machine learning?

In this general direction there are indeed relatively few girls. If you are a girl, there can be some advantage: when looking for a job, girls do have certain advantages.


What knowledge and techniques need to be mastered

In the machine learning direction, what knowledge and techniques need to be mastered at each stage?

Machine learning is a big direction. If your goal is industry, you only need to master the small amount of necessary knowledge. If I interview someone and the candidate does not know what a Gaussian model is, that is OK; not knowing random forests is also OK. What is needed is the most basic mathematics: how to compute eigenvalues and eigenvectors, or, a bit harder, how to do hypothesis testing. This basic undergraduate knowledge must be grasped solidly. The models themselves are mostly shallow knowledge.
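As a sketch of the "how to compute an eigenvalue" kind of basic (plain Python, illustrative names): power iteration finds the dominant eigenvalue and eigenvector of a matrix using nothing but matrix-vector products and normalization.

```python
import math

def power_iteration(A, iters=100):
    """Dominant eigenvalue/eigenvector of a small square matrix A
    (given as a list of rows) by repeated multiplication and normalization."""
    n = len(A)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        # Matrix-vector product, then normalize.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient estimate of the eigenvalue: v^T (A v).
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v

lam, v = power_iteration([[2.0, 0.0], [0.0, 1.0]])
# Dominant eigenvalue of diag(2, 1) is 2, eigenvector along (1, 0).
```

The convergence rate depends on the ratio of the two largest eigenvalues, which is exactly the kind of property the interview question is probing.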


