Why do statisticians and machine learning experts solve the same problem differently?
Nir Kaldero
At first glance, machine learning and statistics look very similar, and the differences between the two disciplines receive little attention. Both have the same goal, which is to learn from data through modeling, but they pursue it differently because of their cultural differences. To collaborate better and create new knowledge, we need to understand the cultural profile of each discipline, and to understand those differences we need to look back at their historical roots.
This article was compiled by several leagues-Totoro. Reprints are welcome; please include the following information. Thank you very much.
Article source: The community of several leagues
Source: Why a Mathematician, Statistician, & Machine Learner Solve the Same Problem Differently
A brief history of machine learning and statistics
The successful development of ENIAC, the first general-purpose electronic computer, in 1946 brought about a great change: numerical calculations could now be done by machine rather than by hand with pencil and paper. The idea at the time was that human thinking (human capital input) and learning could be expressed in a logical format and run on a machine.
In the 1950s, Alan Turing, often called the father of artificial intelligence (AI), proposed a test to measure how well a machine could learn and behave like a human being. Later that decade, Frank Rosenblatt introduced the concept of the perceptron at the Cornell Aeronautical Laboratory. The central idea of this revolutionary concept is that a perceptron works like a linear classifier. He pointed out that by combining many perceptrons we can build a powerful network model, which is what we now know as a neural network.
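To make the "perceptron as linear classifier" idea concrete, here is a minimal sketch in Python. It is not Rosenblatt's original formulation, only the textbook update rule. The function names, the toy AND data, and the learning-rate and epoch settings are all assumptions chosen for illustration.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches labels (+1 or -1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or exactly on) the boundary
                # Nudge the separating line toward classifying this point correctly.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b


def predict(w, b, x):
    """Classify x by which side of the line w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical toy data: logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]

w, b = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

A single perceptron can only separate classes with a straight line (or a hyperplane in higher dimensions), which is why combining many of them into a network matters.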
The study of machine learning grew into a field in which skilled computer engineers explore whether machines can learn and imitate the human brain. Today machine learning is used to extract value from data in countless applications.
Statistics as a field began around the middle of the 17th century. Its development centered on measuring uncertainty in the experimental and observational sciences, which is the basis of probability theory. From the outset, statistics provided tools not only to "describe" phenomena but, more importantly, to "explain" them.
Interestingly, beer had a profound effect on the development of statistics. One of the field's basic concepts, the t-statistic, was introduced by a chemist to account for variation in the brewing of Guinness beer in Dublin, Ireland. The t-statistic and other concepts drove the development of a structured mathematical theory with precise definitions and principles. Statisticians have since developed many tools that improve people's ability to observe, organize, predict, and sample.
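As an illustration of what a t-statistic measures, the sketch below computes the one-sample version: the distance between a sample mean and a reference value, expressed in units of the estimated standard error. The measurements and the reference value `mu0` are made-up numbers, not brewery data.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return t = (mean - mu0) / (s / sqrt(n)) for a small sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (mean - mu0) / (s / math.sqrt(n))


# Hypothetical measurements compared against a hypothetical target of 5.0.
measurements = [5.1, 4.9, 5.3, 5.0, 5.2]
t = one_sample_t(measurements, mu0=5.0)
```

For these numbers t comes out to about 1.41. With only five observations that is a weak signal, which is the kind of small-sample question the statistic was designed to answer.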
The difference is cultural
Capturing real-world phenomena means dealing with uncertainty. To do this, statisticians must understand the underlying distribution of the data being studied and estimate the parameters that make prediction possible. Their goal is to describe, with a stated degree of certainty, how a set of variables interact (we can never be 100% sure of anything). Machine learning experts, by contrast, build algorithms that predict, classify, and cluster as accurately as possible. Instead of focusing on uncertainty and assumptions, they improve the accuracy of results through continuous learning.
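The contrast can be sketched on a single toy problem. The Python example below fits the same straight line twice: once the way a statistician might report it (a parameter estimate with its standard error) and once the way a machine learning practitioner might judge it (prediction error on held-out data). The data, the 7/3 split, and all names are assumptions made for illustration. Neither half is presented as the one correct workflow.

```python
import math

# Hypothetical data: y is roughly 2*x + 1 plus small fixed perturbations.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Statistical view: estimate the parameter and quantify its uncertainty.
slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
n = len(xs)
mean_x = sum(xs) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
residual_variance = sum(r * r for r in residuals) / (n - 2)
slope_se = math.sqrt(residual_variance / sxx)  # standard error of the slope

# Machine learning view: hold data out and judge the model by prediction error.
train_x, train_y = xs[:7], ys[:7]
test_x, test_y = xs[7:], ys[7:]
ml_slope, ml_intercept = fit_line(train_x, train_y)
test_mse = sum(
    (y - (ml_slope * x + ml_intercept)) ** 2 for x, y in zip(test_x, test_y)
) / len(test_x)
```

Here `slope` and `slope_se` answer "what is the relationship, and how sure are we?", while `test_mse` answers "how well does the model predict data it has not seen?".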
The following shows the cultural differences between machine learning experts and statisticians in how they approach data:
Why should we care about these differences?
For better, smarter decisions
A deep understanding of the differences in culture and professional terminology between the two disciplines leads to more efficient communication. Better communication leads to better collaboration, which in turn improves a team's decision-making.
Statisticians and machine learning professionals often assume that everyone else thinks about data the way they do. Peter Norvig, head of research at Google, once ran an experiment that backfired.
Norvig and a statistician from Stanford University wanted to prove that statisticians, data scientists, and mathematicians think the same way. They guessed that if these people each received the same data set, worked on it independently, and then returned their results, everyone would turn out to have used the same method. So they took a very large data set and shared it with the people they had chosen beforehand.
As it turned out, the data scientists used the entire data set and built a complex predictive model.
The statisticians sampled 1% of the data set, discarded the rest, and verified that the data met certain assumptions.
The mathematicians, believe it or not, did not even look at the data set. Instead, they proved properties of various formulas that could (in theory) be applied to the data.
The experiment failed to prove that people in the data world work the same way, but it did show how important communication is if people from these disciplines want to work together.
Narrowing the gap
Knowing the people you are talking to and understanding their cultural background can broaden our knowledge and even let us apply methods from outside our own field. This is the idea behind "data science" itself, which aims to bridge this gap. Machine learning and statistics are two fascinating data-driven disciplines, and good collaboration and communication between them can help us make better decisions and ultimately have a positive impact on the way we work.
About
Nir Kaldero is the Director of Data Science and the Head of Galvanize Experts at Galvanize, Inc. Nir also serves on the faculty of the Master of Science in Data Science, powered by the University of New Haven.
Dr. Donatella Taurasi is a lecturer and a scholar at the Haas School of Business and the Fung Institute for Engineering Leadership in Berkeley, and at Hult International Business School in San Francisco.