Reposted from: 1190000005356857
1. Preface
Originally I planned to title this "Skills of an algorithm engineer", but I figured that with "machine learning" in the title more people would click, so the title became this one, hehe. It is also a more popular term for search engines to index, so the article should get more exposure. But rest assured, this is not clickbait; we are serious.
Machine learning has been the hottest topic in computing for the last two years. This is not a technical article about machine learning. It is just here to tell everyone that machine learning is full of pits, and for the many friends who have not started yet or are just getting started, there is a big pit right in front of you. If you want to keep going down this road, please be mentally prepared.
2. Why we learn machine learning
To be honest, most people who take classes in machine learning or big data are, in the final analysis, hoping to find a good job and earn a higher salary. Of course, some are also genuinely interested and want to understand the field more deeply.
Personally, I think the first reason matters more.
3. What are we talking about when we talk about machine learning?
First, let's see what a machine learning system looks like.
[Figure: diagram of a machine learning system]
Almost every machine learning system is made up of the parts in the diagram above. The only difference is that the training data of a supervised system may need manual intervention, while an unsupervised system needs none. Put simply, you give a batch of training data to a machine learning model to learn from, get a predictive model, and then use that predictive model to make predictions for new, unknown data.
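To make the "training data in, predictive model out, predictions on new data" flow concrete, here is a minimal sketch in plain Python. It assumes a tiny logistic regression trained by gradient descent; the function names `train` and `predict` and the toy data are made up for illustration and do not come from any real system.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(examples, labels, lr=0.1, epochs=2000):
    """Learn weights and a bias from labeled training data (supervised)."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias  # this pair is the "predictive model"

def predict(model, x):
    """Return the predicted probability of class 1 for a new example."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up training data: one feature, label 1 for the larger values.
examples = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]

model = train(examples, labels)
low = predict(model, [0.0])   # new, unseen data
high = predict(model, [5.0])
print(low < 0.5, high > 0.5)
```

Everything the rest of this article describes happens around these two functions, not inside them.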
Nowadays machine learning articles and blogs are everywhere online, books of every kind fill the market, and it is the hottest field in education right now: there are all sorts of online machine learning classes, and the tuition is very expensive.
But have you noticed that all this talk about machine learning is really talk about models? Articles and books titled "A deep understanding of the XXX model", "Possibly the most understandable article on XXX", or "Machine learning is not hard: the XXXX model in detail" are everywhere, introducing logistic regression, deep learning, neural networks, SVMs (support vector machines), BP neural networks, convolutional neural networks, and so on and so on.
So when we talk about machine learning, we are really talking about machine learning models, that is, the various machine learning algorithms. And everyone assumes that once you have learned the theory behind the models and algorithms, you are a machine learning expert. I believe most people think so.
4. Xiao Ming becomes a machine learning "expert"
There is a guy named Xiao Ming who works with computers. He watched the videos of AlphaGo crushing Lee Sedol, and although he does not understand Go, he was shocked and resolved to study this legendary machine learning. So he searched everywhere online for tutorials, blog posts, and books, studied hard for half a year, and finally felt he had gotten started. He could explain the reasoning behind every machine learning model and algorithm.
I wonder how many people are at this stage?
But Xiao Ming wanted to go further, so he began studying the code and tools for the various models. Hadoop and Spark are standard, of course, so again he searched for articles, books, and online classes of every kind. Fortunately there is plenty of this material, especially online classes these days: a course provider without a big data processing class or a Hadoop class might as well not open.
More than half a year later, Xiao Ming finally felt he had made it: he had the theory and he knew the big data tools. Simply invincible!
And how many people are at this stage, thinking they have already mastered machine learning? If you have studied well up to this point, you could open a class and teach others machine learning. But if you think this is enough to get a job as an algorithm engineer at a company, then let me tell you: too young, too simple!
Because Xiao Ming had strong theoretical knowledge, could derive every formula, knew Hadoop and Spark, and expressed himself well, he easily dispatched several interviewers and joined a large company as a search algorithm engineer for an e-commerce site, with a high monthly salary. At last he could show what he was made of. The boss gave him a task: use your awesome knowledge to raise the search click-through rate by 1%.
If you were Xiao Ming, fresh from studying machine learning, what would you do? Would you be dumbfounded?
5. Machine learning is more than just a model
This problem arises because everyone thinks the machine learning models are machine learning itself, and that whoever understands those algorithms is a machine learning guru. In fact that is not the case at all.
Who gets to play with models? Models are invented by scientists and researchers at big companies, who publish papers once a model is invented; those papers exist to torment our IQ. Under normal circumstances you cannot invent a model, right? (If you can, stop reading; you can take the academic road.) And you cannot modify a model either, can you?
So having learned the models, you have only just, just barely gotten started; it may not even count as getting started.
So what are all those algorithm engineers in every company doing? Take a search ranking algorithm engineer as an example. What do they do? They go around this loop:
Observe data ---> Find features ---> Design the algorithm ---> Validate the algorithm ---> Clean the data ---> Engineering ---> Check results online ---> go back to observing data
And in a mature system the model has usually already been settled, and unless the results are especially bad it will not be changed. If a company's search ranking system uses a logistic regression model, for example, switching to another model is generally not possible, so all you can do is add features.
Well, let's look at what a machine learning algorithm engineer does at each step of this process.
5.1 Observing data
Xiao Ming spent every day on the site looking at data, checking data, reading tables, and plotting curves. He found that every feature he could think of, such as sales, favorites, and clicks, was already in use. Three months went by with no progress and he was close to collapse: he had been there that long and had not seen so much as a hair of machine learning code.
In the fourth month he noticed a small problem. Some products had very good reviews and seemed to be of good quality, yet their sales never took off, so they were always ranked near the bottom. He picked out the products with all five-star reviews but poor sales to see what they had in common.
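The kind of throwaway script this calls for might look like the sketch below. The field names (`avg_rating`, `monthly_sales`), the thresholds, and the sample rows are all hypothetical; a real site would have its own tables and its own definition of "poor sales".

```python
# Hypothetical product records; in practice these would come from a database
# export or a log file rather than being typed in by hand.
products = [
    {"id": "A", "avg_rating": 5.0, "monthly_sales": 12},
    {"id": "B", "avg_rating": 5.0, "monthly_sales": 950},
    {"id": "C", "avg_rating": 3.1, "monthly_sales": 8},
    {"id": "D", "avg_rating": 5.0, "monthly_sales": 3},
]

def well_reviewed_but_poor_selling(rows, min_rating=5.0, max_sales=20):
    """Keep products whose reviews are excellent but whose sales are low."""
    return [
        row for row in rows
        if row["avg_rating"] >= min_rating and row["monthly_sales"] <= max_sales
    ]

suspects = well_reviewed_but_poor_selling(products)
print([row["id"] for row in suspects])
```

The resulting list is what you would then inspect by hand, looking for whatever these products have in common.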
What ability does the data observation stage call for? Well, I can only tell you that you need sensitivity to data, which is really another way of saying you need all-round ability, experience, and a product manager's skills.
Beyond these, you also need to be able to write script code on the fly. When some data needs preliminary processing, you have to write code for it on the spot, and quickly, because the code may only be used once or twice. So you need to be proficient in a scripting language: at least be familiar with Python, and know the shell.
5.2 Finding features
Observing the data revealed a problem; now you have to find features, that is, find which factors keep sales from taking off. First you need imagination, and then you need to verify what you imagined.
Even with Xiao Ming's imagination, it took him a month to discover what these products had in common: their pictures were terrible, so nobody wanted to click on them. If image quality could be added as a ranking factor, wouldn't that be wonderful? Nobody had used image quality as a feature before, so he had finally found a feature.
At this stage, everyone's imagination is limited after all, so more often you rely on experience to find features that fit the current scenario.
5.3 Designing the algorithm
The feature has been found, but how do you add it to the ranking model? A good picture: how good is it, and how can a machine understand that? If you cannot turn image quality into a mathematical vector, you will never be able to add it to the ranking model.
This stage is the real test of an algorithm engineer: quantifying the feature. Xiao Ming observed that attractive pictures tend to have a lot of color variation, while poor pictures often have almost none. So he came up with a method. First apply a Fourier transform to the image data to move it into the frequency domain. By the nature of the Fourier transform, a large amplitude in the high-frequency part means the colors in the image change markedly, while a spectrum dominated by the low-frequency part means the color changes are not obvious. This matches what he observed, so the quality of an image can be expressed by the amplitude of the high-frequency part after the Fourier transform. After some normalization and vectorization, it can then be added to the ranking model.
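A rough sketch of this idea in plain Python follows. It is only an illustration under several assumptions: the image is a small grayscale grid (a list of rows of numbers) rather than a color photo, the transform is a naive discrete Fourier transform that is far too slow for real images, and the normalization (share of non-DC amplitude at or above a cutoff frequency) is one possible choice, not necessarily what Xiao Ming used. The name `high_frequency_ratio` and the `cutoff` parameter are made up here.

```python
import cmath
import math

def dft_1d(values):
    """Naive 1-D discrete Fourier transform, O(n^2)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dft_2d(image):
    """2-D DFT: transform every row, then every column."""
    rows = [dft_1d(row) for row in image]
    cols = [dft_1d(col) for col in zip(*rows)]  # indexed [v][u]
    return [list(r) for r in zip(*cols)]        # back to [u][v]

def high_frequency_ratio(image, cutoff=0.25):
    """Share of spectral amplitude (excluding the mean) at or above `cutoff`.

    Frequencies are in cycles per pixel, so they range from 0 to 0.5.
    """
    h, w = len(image), len(image[0])
    spectrum = dft_2d(image)
    high = total = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term (average brightness)
            freq = max(min(u, h - u) / h, min(v, w - v) / w)
            amplitude = abs(spectrum[u][v])
            total += amplitude
            if freq >= cutoff:
                high += amplitude
    return high / total if total > 1e-6 else 0.0

# Two synthetic 8x8 test images: a checkerboard (rapid change) and a single
# slow horizontal wave (gentle change).
checkerboard = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
slow_wave = [[128 + 100 * math.cos(2 * math.pi * x / 8) for x in range(8)] for y in range(8)]

busy_score = high_frequency_ratio(checkerboard)
smooth_score = high_frequency_ratio(slow_wave)
print(busy_score > smooth_score)
```

The output is a single number between 0 and 1 per image, which is the kind of value that can sit next to sales or clicks in a feature vector. In practice one would use an FFT library instead of the naive transform above.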
In this step you may use the machine learning models you studied, but only for a small part. Most of the time you need to build a mathematical model for the current scenario rather than a machine learning model. What skills does this stage require? The example I give here is an extreme one, but mathematical abstraction, mathematical modeling, and fluent use of mathematical tools are essential. Equally necessary is strong programming ability, and not the scripting ability of the earlier step: this is real computer algorithm programming ability.
5.4 Validating the algorithm
Once the algorithm is designed, you also have to design an offline validation method to prove to your boss that it works; otherwise, how many chances would you get to try things out in production? This step draws on all sorts of abilities combined. The key is that you have to persuade your boss of the theory in plain language. What ability is that? Strong communication skills.
Beyond that, you need to design an online A/B test plan that can properly test whether your algorithm is really effective.
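One common way to read the result of such an A/B test is a two-proportion z-test on the click-through rates of the two groups. The sketch below assumes that approach; the article does not say which test Xiao Ming's company used, and the click and view counts are invented for the example.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rate of group B (new) with group A (old).

    Returns (difference in CTR, z statistic, two-sided p-value).
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_b - ctr_a, z, p_value

# Invented numbers: the old ranking gets 5.0% CTR, the new one 5.2%.
lift, z, p_value = two_proportion_z_test(10_000, 200_000, 10_400, 200_000)
print(f"lift={lift:.4f} z={z:.2f} p={p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise. The plan itself still has to decide in advance how traffic is split and how long the test runs.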
5.5 Cleaning data
The feature is found and the algorithm is more or less designed, so now comes the manual labor: cleaning the data. This is a required course for algorithm engineers. The data does not look the way you want it to, so making it look the way you want and then removing the invalid data is hard physical work.
In the example above, the pictures may first of all come in different sizes and need to be brought to one size before the transform. Some products have multiple pictures, so you may need to pick the best-quality one before processing, and so on.
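As a small illustration of these cleaning steps, the sketch below drops unusable images, resizes the rest to one size with nearest-neighbor sampling, and keeps the best image per product. Images are again plain grids of numbers, and `clean_product_images`, `contrast`, and the sample data are hypothetical; the scoring function in particular is a stand-in for whatever quality measure is actually used.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a grid of pixels by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

def is_valid(image):
    """An image is usable if it is non-empty and all rows have equal length."""
    return bool(image) and bool(image[0]) and all(len(r) == len(image[0]) for r in image)

def clean_product_images(product_images, score, size=(4, 4)):
    """Map each product id to its best valid image, resized to `size`."""
    cleaned = {}
    for product_id, images in product_images.items():
        resized = [resize_nearest(img, *size) for img in images if is_valid(img)]
        if resized:  # products with no usable image are dropped
            cleaned[product_id] = max(resized, key=score)
    return cleaned

def contrast(image):
    """Stand-in quality score: spread between brightest and darkest pixel."""
    pixels = [p for row in image for p in row]
    return max(pixels) - min(pixels)

raw = {
    "p1": [[[10] * 4 for _ in range(4)], [[0, 255], [255, 0]]],
    "p2": [[], [[1, 2], [3]]],  # an empty image and a ragged one
}
cleaned = clean_product_images(raw, contrast)
print(sorted(cleaned))
```

Real cleaning code ends up with many more special cases than this, which is where the patience comes in.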
This stage first calls for the ability to process data with scripting languages, and you also need to master some data-processing tools. The key is to have enough patience and confidence. And of course, excellent programming ability is essential.
5.6 Engineering
All right, you have made it past all the pits so far. At this step, hehe, the algorithm is designed and the data is ready; about half a year has probably gone by, so hurry up and go live. You think you can go live with a pile of scripts? You have to think about engineering: how to embed your algorithm into the existing system and how to guarantee its efficiency (it must not take a whole day to run). You also have to consider the robustness of the code, and if it is an online algorithm you have to consider performance too: don't use up all the memory.
This is the step where you really use the Hadoop and Spark tools you learned. Having read the above, you know without my saying what it takes to complete this step: the essential skills of a standard software development engineer, or rather of a senior development engineer.
5.7 Checking results online
Everything is done. After ten months of back and forth you can finally go live, and now the real test arrives: checking the results online. The product manager says, let's run an A/B test. The result, hehe: the click-through rate went down. Oh, Xiao Ming! Ten months of hard work and the click-through rate dropped??? The boss is going to scold you to death, so you must have strong resilience to setbacks.
Oh well, take it offline quickly and go back to the start to find the problem. He spent another month revising the algorithm and went live again. This time it was better: the click-through rate rose by 0.2%. Keep working hard and see whether there is anything else to dig out, so you go back to the "observe data" step.
Don't look down on this 0.2%. With data at this scale, a 0.2% increase is already a very good improvement. That is why companies spend so much money on algorithm engineers: if one can deliver a few 0.2% gains a year, that is what really counts.
6. Let's summarize
It really is hard for one person to get through all of these steps alone, and I have exaggerated a bit. For some steps there are people to work with: a product manager when you observe data, data engineers when you clean data, system engineers when you do the engineering. But as a machine learning algorithm engineer you have to be able to hold the whole process together, so you should be able to complete it even on your own.
These are just the abilities a standard algorithm engineer should have. I used a search algorithm as the example, but other algorithm engineers are not much different; it always comes down to the steps above. Of course, if you are a real guru who can modify a machine learning model to fit the scenario, or even come up with a new model, that is more impressive still.
OK, let's collect the key points above and see what skills an algorithm engineer needs.
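Sensitivity to data and powers of observation
Mathematical abstraction, mathematical modeling, and fluent use of mathematical tools
The ability to write script code on the fly, strong algorithm programming ability, and the qualities of a senior development engineer
Imagination, patience and confidence, strong communication skills, and resilience to setbacks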
There is one more very important point: you need to be smart. Of course, if you can manage all of the points above, you are basically very smart already. And if you can, then those machine learning models, theories, and tools are actually not so important, because they are just knowledge and tools that can be learned at any time.
Tell me, can all this be had by reading a few blogs, reading a few books, and taking a few classes??
Of course, we are talking about the general case here. If you focus on research, you will need an even higher level of proficiency in the skills above.
Finally, to you who are learning machine learning and aspire to become an algorithm engineer: are you ready to step into these pits??
What are the skills for machine learning?