An inventory of the differences between machine learning and statistical models
Source: public account datartisan (Data Craftsman, Shujugongjiang)
One question comes up again and again in data science forums: what is the difference between machine learning and statistical models?
This is indeed a difficult question to answer. Because the two approaches solve similar problems, the only differences seem to be the volume of data involved and the role of the person who builds the model. Here is a data science Venn diagram covering machine learning and statistical models.
In this article, I will do my best to show the difference between machine learning and statistical models, and I welcome experienced practitioners to add to it.
Before I start, let's be clear about the goal behind these tools. Whichever tool is used to analyze a problem, the ultimate goal is to obtain knowledge from the data. Both approaches aim to uncover hidden information by analyzing the process that generated the data.
The two approaches share the same objective. Now let's explore their definitions and differences in detail.
Definitions
Machine learning: an algorithm that learns from data without relying on rules-based programming.
Statistical model: a formalized representation of the relationships between variables in the form of mathematical equations.
For those who prefer to understand concepts through practical applications, the definitions above may not be clear enough. Let's look at a business case.
Business case
Let's use an interesting case published by McKinsey to tell the two kinds of algorithm apart.
Case study: analyze and understand the level of customer churn risk at a telecom company over a period of time.
Available data: two drivers, A and B
What McKinsey shows next is striking. Compare the output of the statistical model with that of the machine learning algorithm.
What do you see? The statistical model is all about obtaining a simple formulation of the frontier in a classification problem. Here, a non-linear boundary separates the high-risk customers from the low-risk ones to some extent. But when we look at the contours produced by the machine learning algorithm, the statistical model does not seem able to compete on this problem. The machine learning approach captures patterns that no single boundary can describe, whether linear or even continuous. This is what machine learning can do for you.
Machine learning is also used in the recommendation engines of YouTube and Google, where it churns through a very large number of observations almost instantly and produces near-perfect recommendations. Even on a laptop with only 1 GB of memory, I work every day on datasets with hundreds of thousands of rows and thousands of parameters, and building a model takes no more than 30 minutes. A statistical model, by contrast, would need a supercomputer running for a million years to handle observations with thousands of parameters.
Differences between machine learning and statistical models:
Given the difference in output between the two models, let's take a closer look at how the two paradigms differ, even though they do similar work.
School of origin
Era of origin
Underlying assumptions
Types of data handled
Terminology for operations and objects
Techniques used
Predictive power and human input
Each of these aspects separates machine learning from statistical models to some degree, but none of them draws a clear line between the two.
They come from different schools
Machine learning: a branch of computer science and artificial intelligence that builds analytic systems by learning from data, without relying on explicitly programmed rules.
Statistical models: a branch of mathematics that identifies relationships between variables in order to predict an outcome.
They were born in different eras
Statistical models have a history that goes back several centuries. Machine learning, by contrast, developed only recently. In the 1990s, steady digitization and cheap computing power let data scientists stop building finished models by hand and instead train computers to build them. This spurred the development of machine learning. As data grows in scale and complexity, machine learning continues to show tremendous potential.
They differ in how much they assume
A statistical model rests on a series of assumptions. For example, linear regression assumes:
(1) The independent and dependent variables are linearly related
(2) The errors have constant variance (homoscedasticity)
(3) The error term has a mean of zero
(4) The observations are independent of one another
(5) The errors are normally distributed
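As a rough illustration of what checking these assumptions can look like, here is a minimal sketch in plain Python. The data is synthetic (a line plus Gaussian noise, chosen purely for illustration), and the helper names fit_line, residuals and variance are made up for this example. It fits a line by ordinary least squares, then looks at the residuals: their mean relates to assumption (3), and comparing their spread on the lower and upper halves of x is one crude way to eyeball assumption (2).

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(xs, ys, a, b):
    """Observed value minus fitted value for every observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)

mean_residual = sum(res) / len(res)   # relates to assumption (3)
var_low = variance(res[:100])         # residual spread for small x
var_high = variance(res[100:])        # residual spread for large x

print(f"intercept={a:.2f} slope={b:.2f}")
print(f"mean residual={mean_residual:.2e}")
print(f"residual variance: low half={var_low:.2f}, high half={var_high:.2f}")
```

With an intercept in the model, least squares forces the mean residual to zero by construction, so that check says little on its own. The comparison of spreads is the more informative one, and a real analysis would use proper diagnostic tests rather than this eyeball check.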
Logistic regression carries a similar set of assumptions. Even a non-linear model has to assume a continuous decision boundary. Machine learning, however, is largely free of these assumptions. Its greatest advantage is that the decision boundary does not have to be continuous. Nor do we need to assume a distribution for the independent or dependent variables.
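The point about boundaries can be made concrete with a toy sketch. Everything here is an assumption chosen for illustration: the data is synthetic, with one input x on [0, 1] and the positive class living only in the band 0.3 <= x <= 0.7, and the helper names are hypothetical. A rule with a single cut point stands in for a model restricted to one continuous boundary, and a 5-nearest-neighbour vote stands in for an assumption-light learner. The single cut point is chosen exhaustively, so it is the best such rule on the training data.

```python
import random

def make_data(n, rng):
    """x uniform on [0, 1]; label is 1 only inside the band 0.3 <= x <= 0.7."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if 0.3 <= x <= 0.7 else 0 for x in xs]
    return xs, ys

def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

def best_single_threshold(xs, ys):
    """Exhaustively pick the best rule of the form 'x > t' on the given data."""
    best = (0.0, None, None)  # (accuracy, threshold, label predicted above t)
    for t in [-1.0] + sorted(xs):
        for label_above in (0, 1):
            preds = [label_above if x > t else 1 - label_above for x in xs]
            acc = accuracy(preds, ys)
            if acc > best[0]:
                best = (acc, t, label_above)
    return best

def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

rng = random.Random(0)
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(1000, rng)

_, threshold, label_above = best_single_threshold(train_x, train_y)
threshold_preds = [label_above if x > threshold else 1 - label_above for x in test_x]
knn_preds = [knn_predict(train_x, train_y, x) for x in test_x]

threshold_acc = accuracy(threshold_preds, test_y)
knn_acc = accuracy(knn_preds, test_y)
print(f"single threshold: {threshold_acc:.2f}, 5-NN: {knn_acc:.2f}")
```

No single cut point can isolate a band that has the other class on both sides, so the threshold rule tops out at around 70% on this data, while the neighbour vote follows the band closely. This is a deliberately favourable setup for the flexible method, not a general benchmark.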
They handle different data
Machine learning is widely applicable. Online learning tools can process data on the fly. These tools can learn from hundreds of millions of observations, predicting and learning at the same time. Some algorithms, such as random forests and gradient boosting, are also fast on big data. Machine learning copes well with data that is both wide (many attributes) and deep (many observations). Statistical models, by contrast, are generally applied to smaller data volumes with fewer attributes.
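To show what learning online means in practice, here is a minimal sketch of a linear model updated by stochastic gradient descent, one observation at a time. The stream is synthetic (y = 1.5x - 0.5 plus noise, an assumption for illustration), and the class OnlineLinearModel with its learn_one method is a hypothetical name, not the API of any particular library. The model never holds the whole dataset in memory, which is why this style scales to very long streams.

```python
import random

def stream(n, rng):
    """Yield observations one at a time, as an online source would."""
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1.5 * x - 0.5 + rng.gauss(0.0, 0.1)
        yield x, y

class OnlineLinearModel:
    """Linear model y = w*x + b trained by one SGD step per observation."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate
        self.seen = 0

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # Gradient of the squared error 0.5 * (prediction - y)**2
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        self.seen += 1

model = OnlineLinearModel()
for x, y in stream(5000, random.Random(0)):
    model.learn_one(x, y)

print(f"after {model.seen} observations: w={model.w:.2f}, b={model.b:.2f}")
```

The learned weights settle close to the values used to generate the stream. Because each update touches only the current observation, the model can also be asked for a prediction at any point while the stream is still arriving.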
Naming conventions
Some of the following terms refer to almost the same thing:
Formulation:
Although the end goals of statistical models and machine learning are similar, the way they formulate the problem is very different.
In a statistical model, we try to estimate the function f in
Dependent variable (Y) = f(independent variables) + error term
Machine learning drops the explicit form of the function f and simplifies this to:
Input (X) -> Output (Y)
It tries to find pockets of X in n dimensions, where n is the number of attributes, in which the occurrence of Y is markedly different from pocket to pocket.
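The idea of pockets can be sketched with a very crude stand-in: cut the input space into a grid and compare the rate of Y in each cell. The data below is synthetic, with two inputs a and b and an outcome that is common in one corner only (all assumptions made up for illustration), and the helpers cell and pocket_rates are hypothetical names. Real learners such as decision trees choose where to cut from the data, whereas this sketch fixes the grid in advance.

```python
import random
from collections import defaultdict

def cell(a, b, bins):
    """Map a point in [0, 1) x [0, 1) to its grid cell."""
    return (min(int(a * bins), bins - 1), min(int(b * bins), bins - 1))

def pocket_rates(rows, bins=2):
    """Rate of y == 1 inside every grid cell that contains data."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for a, b, y in rows:
        key = cell(a, b, bins)
        counts[key] += 1
        positives[key] += y
    return {key: positives[key] / counts[key] for key in counts}

# Synthetic data (an assumption): the outcome is frequent only where
# a is high and b is low, and rare everywhere else.
rng = random.Random(0)
rows = []
for _ in range(2000):
    a, b = rng.random(), rng.random()
    p = 0.8 if (a >= 0.5 and b < 0.5) else 0.1
    rows.append((a, b, 1 if rng.random() < p else 0))

rates = pocket_rates(rows)
for key in sorted(rates):
    print(key, f"{rates[key]:.2f}")
```

No functional form for f is ever written down here. The output is simply a map from regions of X to how often Y occurs in them, and the cell (1, 0) stands out from the other three.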
Predictive power and human input
Nature makes no assumptions before something happens. The fewer assumptions a predictive model makes, the higher its predictive power. As its name suggests, machine learning is meant to need little human input. It discovers the patterns hidden in data through iterative learning. Because it imposes few assumptions on real data, its predictive power tends to be very good. A statistical model is mathematically intensive and relies on parameter estimation. It requires the modeler to know or understand the relationships between variables in advance.
Conclusion
Although machine learning and statistical models appear to be different branches of predictive modeling, they are almost identical. The differences between them have been shrinking over decades of development. Each has borrowed from the other, which blurs the boundary between the two even further.
Original link:
http://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/
Original author: Tavish Srivastava
Translation: F.xy