Algorithms, technology, and other

Source: Internet
Author: User

After a long talk with Liu, I went back over ideas I had been reflecting on for some time, combined them with new thoughts from our conversation, and wrote them up here.
(Note: the "algorithm" in the title refers to machine learning algorithms, the "algorithm" in the job title "algorithm engineer", not the "algorithm" in "algorithms and data structures". If anyone knows better names to tell the two apart, please let me know; perhaps "machine learning algorithms" and "traditional algorithms"?)

Algorithms and algorithm engineers

First, here is the answer I gave to the question "What is it like to be an algorithm engineer?" (the idea is not original; I borrowed it from the slides of a quantitative investment course at a Singapore university):

The ideal algorithm engineer: propose a hypothesis -> collect data -> train a model -> explain the results.
The algorithm engineer in practice: propose a hypothesis -> collect data -> preprocess -> preprocess -> debug -> debug -> debug -> ... -> give up.

This answer received a few dozen upvotes and ranked second among 24 answers, which suggests the experience is fairly common. The top answer, with 100+ upvotes, made this point: the most important thing is to run data every day!

This is not a joke but a fact. Why is the lofty-sounding algorithm engineer actually a data laborer? To find the reason for this gap between ideal and reality, we first have to accept one fact: only people can understand data; machines cannot.

No matter which machine learning algorithm we use, whether LR, SVM, k-means, or EM, the input is, as far as the algorithm is concerned, just a matrix of floating-point numbers (or, more fundamentally, a sequence of 0s and 1s). If there is an "hour" feature and the value 25 appears, any person of normal intelligence understands that this is an error and excludes the record during data cleaning. A machine cannot understand that. It would need the concept of an hour, an understanding of what time is, how many hours there are in a day, and so on. How could a machine automate this kind of data cleaning? Going further, if someone finds that most values of the "hour" feature lie between 0 and 12, with a small number of 13s (though not so few that they can be dismissed as outliers), that person will suspect that a 12-hour clock is in use and that the 13s are errors. A machine cannot make that inference.
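The checks described above can be written down once a person has supplied the domain knowledge. The following is a minimal Python sketch, not something from the original discussion: the function name audit_hours, the 0 to 23 valid range, and the 95% threshold are all assumptions chosen for illustration. The rules themselves (a day has 24 hours; a column that almost never exceeds 12 may be on a 12-hour clock) come from a human, and the code only applies them.

```python
def audit_hours(hours, max_valid=23, share_threshold=0.95):
    """Split hour values into valid and invalid, and flag a possible 12-hour clock.

    max_valid and share_threshold are illustrative assumptions, not fixed rules.
    """
    valid = [h for h in hours if 0 <= h <= max_valid]
    invalid = [h for h in hours if not (0 <= h <= max_valid)]

    # Heuristic supplied by a human: if nearly all valid values are <= 12 but a
    # few are larger, the source may use a 12-hour clock and needs a manual look.
    share_le_12 = sum(1 for h in valid if h <= 12) / len(valid) if valid else 0.0
    suspect_12h = share_le_12 >= share_threshold and any(h > 12 for h in valid)
    return valid, invalid, suspect_12h


hours = [0, 3, 9, 12, 25, 7, 11, 13, 5, 8] + [6] * 30
valid, invalid, suspect_12h = audit_hours(hours)
print(invalid)      # [25]
print(suspect_12h)  # True
```

Note that the code only raises a flag for the 13; deciding whether it really is an error still takes someone who knows where the data came from.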

Then there is manual feature engineering. One kind is feature transformation: a needed feature may be the ratio of two columns, and a linear model cannot express a division. Of course, you can enlarge the model's hypothesis space, but if it is too small it will not cover the required transformation, and if it is too large the model overfits easily. The other kind is adding features: suppose I think the click-through rate is related to screen resolution. I then go looking for screen resolution data to add as a feature, and if none exists I have to find a way to collect it. A machine can do none of this.
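As a small illustration of the feature transformation described above, here is a Python sketch that adds the ratio of two columns as a new feature. The column names clicks and impressions and the helper add_ratio_feature are hypothetical, and mapping a zero denominator to 0.0 is also an assumption that a real project would have to decide for itself.

```python
def add_ratio_feature(rows, numerator, denominator, name):
    """Return copies of the rows with an extra column: numerator / denominator.

    A linear model over the two raw columns cannot represent this division,
    so the ratio is computed by hand and supplied as its own feature.
    """
    enriched = []
    for row in rows:
        denom = row[denominator]
        # Assumption: treat a zero denominator as a ratio of 0.0.
        ratio = row[numerator] / denom if denom else 0.0
        enriched.append({**row, name: ratio})
    return enriched


rows = [
    {"clicks": 5, "impressions": 100},
    {"clicks": 0, "impressions": 0},
]
enriched = add_ratio_feature(rows, "clicks", "impressions", "ctr")
print(enriched[0]["ctr"])
print(enriched[1]["ctr"])
```

The code is trivial; the part a machine cannot supply is the decision that this particular ratio is worth having.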

Once people have the data ready, it is the machine learning algorithm's turn. But the algorithm engineer's main work is not here either, because software has a special property: it can be copied at almost no cost. Once a single person in the world has implemented LR (leaving intellectual property aside, not to mention the many open source packages), everyone else who needs LR can simply use that implementation. And that is exactly what algorithm engineers do.

After the algorithm produces its output, human work is needed again: interpreting the results in terms of the actual problem and applying them to the business. This step is similar in nature to data cleaning and manual feature engineering: only people can do it; machines cannot.

Anyone who has done mathematical modeling will find this process familiar: how to state a problem as a mathematical problem, and then how to apply the result back to the practical problem. It is a bit like the "last mile" problem in telecommunications: the fiber backbone is very powerful, but connecting the end user is the headache. In applied machine learning, the algorithms and their software packages are standardized and general-purpose, like the backbone network, while "connecting" the data can only be done by people. Because only people can understand data.

Technology and technical people

This observation generalizes to the whole field of computing. Replace "algorithm engineer" with "programmer" and "machine learning algorithm" with "software", and the idea becomes: most programmers are solving the "last mile" access problem between general-purpose computer tools and a specific business.

Why say this? Look at history first: over decades of computing, the entry barrier for programmers has steadily fallen. The earliest programs were written in assembly on bare metal. Later, with Unix and C, programmers at least no longer had to schedule processes themselves. After Java appeared, they no longer even had to manage memory. And after the advent of PHP (the "greatest language in the world"), the barrier to web programming fell further: anyone could build a website in a short time.

Where did the original problems go? They were solved by a small number of "wheel-building" programmers: the ones who write operating systems, compilers, virtual machines, runtime environments, frameworks, and so on. This trend is still under way: emerging languages such as Rust and Golang are trying to solve the concurrency problems of the multicore era, while Hadoop, Spark, and Mesos try to hide the details of the underlying distributed system. It is foreseeable that the barriers to parallel and distributed programming will fall sharply. This process is inevitable, because a technology is developed precisely so that more people can use it more easily.

These computer tools cannot be applied to a business directly, because computers do not understand human language. So a large number of programmers exist to "translate" human language into computer language. These are the programmers who "use wheels". Of course, this is not black and white: how far a piece of software counts as a wheel depends on its reusability. Code that can be used in only one place obviously cannot be called a wheel. And the fact is that most code written for specific business logic is poorly reusable.

How much of the work of applying a general-purpose computer tool to a specific business is actually technical? Most of the technical difficulties have already been solved by the operating system, the compiler, and the virtual machine. What remains is mainly the complexity of large software (if the software is large), and that is largely the responsibility of a small number of senior architects. For programmers writing specific code, few technical difficulties remain.

I once worked at an internet company where 99% of the site's code was PHP, with essentially no Java. There were no dedicated front-end engineers; PHP, HTML, and JavaScript were all mixed together. Testing was close to nonexistent; developers basically tested their own code. The release process was a formality: the quality control department had only two roles, syncing code to the servers and assigning blame to developers after an incident. I once worked with someone from another department who called an interface I provided; he released his code before my interface was live, which caused an incident. I was an algorithm engineer and wrote PHP only on the side, and throughout this process there was no release dependency control, not even a warning, and no one ever trained me on the release process. Yet this was a mid-sized internet company that had been operating for more than 10 years, held the top share in its niche, and was already publicly listed.

I am not citing this example to bash the company. I am trying to support the point that most programmers, and most so-called "tech" companies, face fewer technical problems than they think (which is probably why that company had no CTO).

This is not an isolated case; most companies have similar problems. From a technical point of view they are a mess, and you wonder why they have not died. But the truth is that they have not only survived, they are doing well and have even gone public. The owners have long since achieved their life goals, while the programmers who complain about the mess are still struggling to scrape together a down payment or a mortgage. Non-technical factors play the key role here, even though these companies all claim to be tech companies.

More and more people are aware of the limits of technology. At the beginning of the year a classmate of mine was looking for a job. He had always been a "pure technology" engineer, wrote a good technical blog, and even started an open source project. But this time he said he "did not want to be a bottom-level engineer" and hoped to do work that was "higher-level, with a view of the project as a whole", as well as work that involved "dealing with people, pushing my ideas outward, and generating value". So he joined a company where he leads a few junior engineers. When I relayed this to another classmate, his reaction was, "I have had the same idea lately." Yet another classmate said that after a few years of writing C++ he had not learned much technology but had picked up a good deal of business knowledge. My previous manager, a PhD in mathematics, once had an almost naive faith in algorithms; by the time I left, he had turned completely toward business and product. And a classmate who began de-emphasizing technology years ago, and who talked up "soft skills" at every gathering, is already a team leader at one of the BAT companies (Baidu, Alibaba, Tencent), living comfortably and cheerful all day. Why? The reason is simple: companies do not have that many technical problems to solve.

Code Complete offers a metaphor: if your problem is to build a kennel for your dog, just go ahead and build it; if something goes wrong, at worst you build another one and waste an afternoon. Building a skyscraper is different. So if you are writing a "kennel"-level program, how much do algorithms, data structures, and design patterns matter? Even violating the DRY principle does not matter: the code will not be copied many times anyway, a bug can be fixed when it shows up, and at worst you rewrite the whole thing and waste an afternoon. If you are doing "kennel" work, then even if you have the skills to build skyscrapers, how is the result any different from anyone else's? The only "benefit" is that you will ask your boss for bigger raises, which leaves the boss with a "fine impression" of you.

Programmers should drop their unrealistic illusions about technology. This is not to say that technology is unimportant, but to be realistic and distinguish which jobs are building a kennel, which are building an ordinary building, and which are building a skyscraper.

Algorithms revisited

In the same vein, algorithm engineers should drop their unrealistic illusions about algorithms and shift their focus to understanding data, cleaning, preprocessing, manual feature engineering, and business applications (work that usually attracts adjectives such as lowly and tedious).

In the future, machine learning tools will become more standardized, platform-based, and general-purpose, lowering the barrier to use even further. Engineering details unrelated to the essence of an algorithm, such as data storage, gradient descent, parallelization, and distributed computing, will be hidden by the "wheels" that programmers build. An algorithm engineer may simply write a few SQL-like statements, as with Hive, to train a model, cross-validate it, and tune its parameters.

The one thing a machine cannot replace is the understanding of data, and that is where the algorithm engineer's value lies. Because data is tightly bound to the business, the algorithm engineer's role will move closer to that of a product manager than that of a programmer. Deep understanding of data, business, and product, and finding where these connect with models, will become the algorithm engineer's core competency.

As an aside, deep learning is in some ways an exception to the argument of this article. It tries to solve the feature engineering problem, that is, to replace human feature extraction to some extent. It is still fairly rudimentary, though: it addresses only feature transformation and still cannot handle data cleaning and preprocessing that depend on domain knowledge.

Here Liu raised a question: how deeply does an algorithm engineer need to understand the algorithms? The fact is that even someone who only applies algorithms needs to know each model's strengths, weaknesses, and applicable scenarios, and needs to master model selection and parameter tuning. There is no doubt about this. In this respect algorithm engineers need a certain level of technical ability, and that sets them apart from product managers.

But there is another question: are model selection and parameter tuning techniques general, or are they tightly coupled to specific data? For example, could the same tuning technique work well on (say) e-commerce data and poorly on social data? I have no answer for the time being; if anyone knows, please tell me. One thing that can be observed, however, is that projects involving machine learning models are currently improved mostly by trial and error: make a change, put it online, and observe the effect. If the effect is bad, try something else; if it improves, often nobody knows why. If there were a general technique for judging whether a model is good or bad, why would we resort to this near-exhaustive approach?
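The near-exhaustive trial and error described above is, in its simplest offline form, a grid search: try every combination of settings and keep whichever scores best, without learning why it won. The sketch below is only an illustration under stated assumptions: the score function toy_score and the parameter names learning_rate and depth are made up, and in the online setting described above each call to score would correspond to deploying a change and observing its effect.

```python
from itertools import product


def grid_search(score, grid):
    """Try every combination in the grid and return (best_params, best_score, trials)."""
    names = list(grid)
    best_params, best_score, trials = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        trials += 1
        # Keep the best result seen so far; nothing here explains *why* it is best.
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score, trials


def toy_score(params):
    # Hypothetical stand-in for "deploy the change and measure the effect".
    return -(params["learning_rate"] - 0.1) ** 2 - (params["depth"] - 3) ** 2


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
best_params, best_score, trials = grid_search(toy_score, grid)
print(best_params, trials)
```

The number of trials is the product of the grid sizes, which is why this approach becomes expensive quickly and why a general way of judging models in advance would be so valuable.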

From "IT elite" to "IT worker" or "code farmer": this change of name is not a joke but a pragmatic response to the steadily falling barrier to programming. So we should give the lofty-sounding titles "algorithm engineer" and "data scientist" similar nicknames, such as "data worker" or "data farmer", lest children who do not know the truth be drawn in by the lofty title and led astray.

Other

As you can see, I am still a fairly pure technical person: I know too little about non-technical matters to say exactly what they are, so I can only sum them up with the word "other".

This "other" is basically about "people" problems, such as "how to push your own ideas" and "soft skills". It ranges from things as large as opportunity to things as small as "who should receive which email".

Of course, if you are interested in technology for its own sake, this discussion does not apply, because for such people technology is an end, not a means. The perspective here is career development in the ordinary social sense. Whether you want a promotion within your company, advancement through job hopping, or to start your own business, technology is just one of your skills. And if most of the work that companies offer is at the "kennel" level, how much does this skill count for?

However, it is rather absurd to require programmers to be "interested in technology", or even to "write code for fun in their spare time". When recruiting salespeople, no one requires candidates to be "interested in drinking"; when recruiting finance staff, no one requires them to be "interested in adding and subtracting numbers"; when recruiting surgeons, no one requires them to "dissect human bodies for pleasure". Why should programming be the special case?

The reason is that we are still immersed in an irrational cult of technology (worship and contempt often coexist, of course), and the phrase "technology changes the world" is constantly invoked. The phrase is true, but to be clear, "technology changes the world" does not mean "every technology changes the world", nor does it mean "every technical person can change the world". In fact, programming is no different from any other trade that requires professional skills: it is just a way of earning a living.

Most so-called "technology companies" are not really technology companies; at best they are "companies that use technology". In finance the demand for IT is much higher, and the major banks have their own software development departments, yet no one classifies them as part of the IT industry; they belong to the financial industry. But those who run shops, run hotels, sell houses, do matchmaking, or raise money seem to become "technology companies" just by building a website. Isn't that absurd? (Of course, a company like Amazon, which started out selling books but went on to real technological innovation in cloud computing, recommender systems, drones, and so on, is a different case.) How much does technology really matter in these companies?

Perhaps a considerable share of programmers believe that technology is very important. They are immersed in their vision of and faith in technology, firmly believing deep down that by improving their technical ability they can win higher positions and reach the pinnacle of life. Most of the time, though, this is a self-deceiving fantasy. Programmers in China hold a contradictory attitude: they call themselves "migrant workers" and think programming is only for people under 30, yet they attach great importance to technology, like to talk about it even in their spare time, or amuse themselves by attacking the technologies that other programmers use. This is not because they love technology so much, but because technology is all they know. No one wants to show weakness in front of others. The higher the status of technology and the more important it appears, the less their weakness shows when they need to communicate with people, push their ideas, or discuss requirements with product managers.

This is a weakness of human nature: blind overconfidence in an ability one has, to the point of treating it as one's spiritual pillar. Such a person may well be good at it, but the self-assessment is higher than the reality. Granted, self-confidence is necessary and is part of the psychological basis for getting by in the world. But it is a double-edged sword: unrealistic self-confidence (perhaps better called arrogance) blinds the eyes and distorts the truth.

So what should we do? First, the pessimistic point: if your work is technically at the "kennel" level, then unfortunately improving your technical skill may not help your career within the company.

If you are genuinely interested in technology, consider the "real programmers" described in Hackers and Painters: they take a "day job" that merely pays the bills, and write the code that is "really valuable" in their spare time.

But I suspect most programmers are not that interested in technology. If your goal is career development, there are two paths: working for someone else, or starting a business. Working for someone else means advancing either through internal promotion or through job hopping. The former requires your boss to think you are excellent; the latter requires another company's boss to think so. Note the key word "think": there is always a gap between subjective impression and objective fact, and the gap is often larger than people imagine. So the focus is on creating an "excellent" impression; whether you actually are excellent is not the most important thing. It helps if you are, but you can get by if you are not.

If you want to take a technical route, look for "building"-level work, where technology is a major factor in the success of the product. Projects at this level are generally undertaken only by large companies. A boss's assessment of an employee's technical ability is relatively easy to make objective, because the code is there: run it and you know whether it is good. "Soft skills", by contrast, are hard to judge, and the judgment is highly subjective.

Because of this, many people think they are very good while their boss does not (the boss is not necessarily the one who is wrong; the person may simply be arrogant), so in a fit of anger they set out to start a company. As your own boss, you no longer need to care about the gap between the boss's impression and the facts. But this road is often harder, because it demands much more all-round ability. If a programmer cannot even work well with colleagues, it is hard to imagine him having the range of qualities an entrepreneur needs. So if you go this way, be psychologically prepared.

Summary

Technology serves people. The history of the IT industry is the gradual lowering of the barrier to using computers, so that more and more people can use this tool. That is a good thing, but it also lowers the technical content of the programming profession. If you really want to do technology, then do real technology. Otherwise, pay more attention to things other than technology: pinning your hopes on technology alone can only comfort you, not bring real career development.

