Recently, a meme about deep learning and artificial intelligence has been circulating widely on social media. It suggests that the two are nothing more than a crack in the wall with a shiny new frame, mocking machine learning as repackaged statistics: essentially old wine in new bottles. But is that really the case? This article objects to that view. Machine learning ≠ statistics, deep learning has made a significant contribution to our ability to handle complex unstructured data, and artificial intelligence deserves the credit it gets.

As the hype around deep learning begins to fade, this meme has been spreading across social media and drawing laughs all over the internet. The view that machine learning is nothing to get excited about, that it is just a repackaging of established statistical techniques, is becoming more and more common. The trouble is that it is not correct.

I understand that it is not fashionable to be an overly enthusiastic deep learning evangelist. Even the machine learning experts who were eager to tell everyone about deep learning in 2013 now use the term with a hint of disappointment. They prefer to play down the power of modern neural networks, lest they be lumped in with the crowd who treat "import keras" as a cure-all and believe that knowing it gives them a great advantage over everyone else.

As Yann LeCun put it, deep learning has outlived its usefulness as a buzzword. Even so, this overcorrection has bred an unhealthy skepticism about the progress, future, and usefulness of artificial intelligence. This is most evident in the growing discussion of a coming AI winter, in which AI research is predicted to stall for many years, as it did decades ago.

This article does not argue against an AI winter. Nor does it argue about which academic group deserves credit for the progress of deep learning. Rather, it makes the case that artificial intelligence deserves credit, because its progress goes beyond bigger computers and better datasets. With the recent success of deep neural networks and related work, machine learning represents the world's frontier of technological progress.

Machine learning ≠ statistics

"When financing, we talk about artificial intelligence; when we are looking for a job, we say deep learning; but when we do the project, we talk about logistic regression."

- Everyone on Twitter says so

The main point of this article is that machine learning is not just repackaged statistics: the same old techniques with bigger computers and a fancier name. This notion stems from the statistical concepts and terms that are common in machine learning, such as regression, weights, biases, and models. In addition, many models approximate what can generally be considered statistical functions. The softmax output of a classification model is computed from logits, which makes training an image classifier a form of logistic regression.
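The logistic-regression connection can be checked in a few lines. The sketch below is plain Python, and the function names are our own, chosen for illustration. It shows that for two classes, a softmax over the logits (z, 0) produces exactly the logistic sigmoid of z, which is the link function used in logistic regression.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function, the link function of logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

# With two classes and the second logit fixed at 0, softmax reduces to the sigmoid.
for z in [-3.0, 0.0, 1.5, 4.0]:
    p_softmax = softmax([z, 0.0])[0]
    p_sigmoid = sigmoid(z)
    print(f"z={z:+.1f}  softmax={p_softmax:.6f}  sigmoid={p_sigmoid:.6f}")
```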

This line of thinking is technically correct, but reducing machine learning as a whole to a branch of statistics is a stretch. In fact, the comparison does not make much sense. Statistics is the field of mathematics concerned with understanding and interpreting data. Machine learning is nothing more than a class of computational algorithms, which is why it emerged from computer science. In many cases, these algorithms do nothing to help us understand the data. They serve only certain kinds of uninterpretable predictive modeling. In some cases, such as reinforcement learning, the algorithm may not use a pre-existing dataset at all. And in image processing, treating images as instances of a dataset with pixels as features was a bit of a stretch to begin with.
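To illustrate the reinforcement learning point, here is a minimal sketch that assumes a toy three-armed bandit with Gaussian rewards. The arm means, noise level, step count, and epsilon are arbitrary illustrative choices and do not come from the original. The epsilon-greedy agent starts with no dataset at all. Every reward it learns from is produced by its own actions.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    No dataset exists up front: every reward is generated by the agent's own
    choice of action, and the value estimates are updated incrementally.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # The environment produces a noisy reward only after the agent acts.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
print("estimated values:", [round(q, 2) for q in estimates])
print("pulls per arm:", counts)
```

The agent should end up pulling the best arm most often, even though it never saw a single labeled example.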

The point, of course, is not whether the credit belongs to computer scientists or to statisticians. As in any field, today's success is the result of contributions from many disciplines, with statistics and mathematics first among them. However, to correctly assess the great impact and potential of machine learning methods, we must first dismantle a misconception: that modern artificial intelligence is nothing more than age-old statistical techniques running on more powerful computers with better datasets.

Machine learning does not require advanced statistical knowledge

Hear me out. When I started learning machine learning, I was fortunate to take an excellent course devoted to deep learning, offered as part of my undergraduate computer science program. One of our assigned projects was to implement and train a Wasserstein GAN in TensorFlow.

At the time, I had taken only a required introductory statistics course, and I had quickly forgotten most of it. Needless to say, my statistical skills were not strong. Yet I was able to read a paper on a state-of-the-art generative model and implement it from scratch. By training it on the MS Celebs dataset, I generated convincing fake images of people who do not exist.

Throughout the course, my classmates and I successfully trained models for:

- cancerous tissue image segmentation
- neural machine translation
- character-based text generation
- image style transfer

All of them used cutting-edge machine learning techniques invented only in the past few years.

Yet if you had asked me or my classmates how to calculate the variance of a population, or to define marginal probability, we would probably have handed in a blank answer sheet.

This seems somewhat inconsistent with the claim that artificial intelligence is just a repackaging of age-old statistical techniques.

Admittedly, a machine learning expert probably has a stronger statistical foundation than a computer science undergraduate in a deep learning course. Information theory in general requires a solid understanding of data and probability. I would recommend that anyone who wants to become a data scientist or machine learning engineer develop an intuitive grasp of statistical concepts. But the question remains: if machine learning is just a branch of statistics, how could people with almost no statistical background develop a deep understanding of cutting-edge machine learning concepts?

It should also be acknowledged that many machine learning algorithms demand a stronger background in statistics and probability than most neural network techniques do. These methods, however, are often called statistical machine learning or statistical learning, as if to distinguish them from the ordinary, less statistical kind. Moreover, most of the hype-driving innovation in machine learning in recent years has come from neural networks, so this caveat does not change the argument.

Of course, machine learning does not exist in isolation. In the real world, anyone who wants to do machine learning is probably working on many types of data problems, and therefore also needs a solid understanding of statistics. Nor is this to say that machine learning never uses or builds on statistical concepts. It only means that the two are not the same thing.

Machine learning = representation + evaluation + optimization

To be fair, my classmates and I had a good foundation in algorithms, computational complexity, optimization methods, calculus, linear algebra, and even some probability theory. I would argue that these are more relevant to the problems we were solving than advanced statistical knowledge.

Machine learning is a class of computational algorithms that iteratively "learn" an approximation of some function. Pedro Domingos, a professor of computer science at the University of Washington, identified three components that make up a machine learning algorithm: representation, evaluation, and optimization.

Representation means transforming the input from one space into another, more useful space that is easier to interpret. Consider this in the context of a convolutional neural network. Raw pixels are not useful for distinguishing a cat from a dog, so we transform them into a more useful representation, such as the logits of a softmax output, that can be interpreted and evaluated.
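As a toy illustration of representation, the sketch below takes a flattened 2x2 "image" (four raw pixel values) and maps it to two logits with an affine layer, then converts the logits to class probabilities. The pixel values and weights are hypothetical and hand-picked. In a real CNN the weights would be learned, and many convolutional layers would come before this final step.

```python
import math

def linear(x, weights, biases):
    """Affine map: one output (logit) per row of `weights`."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A hypothetical 2x2 grayscale "image", flattened to 4 raw pixel values.
pixels = [0.9, 0.1, 0.8, 0.2]

# Hypothetical hand-picked weights for two classes ("cat", "dog").
W = [[ 1.0, -1.0,  1.0, -1.0],   # "cat" logit responds to a bright left column
     [-1.0,  1.0, -1.0,  1.0]]   # "dog" logit responds to a bright right column
b = [0.0, 0.0]

logits = linear(pixels, W, b)
probs = softmax(logits)
print("logits:", [round(z, 3) for z in logits])
print("class probabilities:", [round(p, 3) for p in probs])
```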

Evaluation is essentially the loss function. It answers questions like these:

- How effectively does your algorithm transform the data into a more useful space?
- How closely does the softmax output resemble the one-hot encoded labels (classification)?
- Did it correctly predict the next word in the unrolled text sequence (text RNN)?
- How far does the latent distribution diverge from a unit Gaussian (VAE)?

These questions tell you how well the representation function is working. More importantly, they define what it will learn to do.
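The evaluation criteria listed above can be written down directly. This sketch uses plain Python lists rather than any framework. It implements cross-entropy against a one-hot label, which is the same loss that scores next-word prediction in a text RNN. It also implements the closed-form KL divergence between a diagonal Gaussian and the unit Gaussian, which a VAE uses as a regularizer.

```python
import math

def cross_entropy(probs, one_hot):
    """Classification loss: -sum_k y_k * log p_k for a one-hot label y."""
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(p + eps) for p, y in zip(probs, one_hot))

def kl_to_unit_gaussian(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the VAE regularizer.

    Closed form per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

# A confident, correct prediction scores better (lower) than a hesitant one.
print(cross_entropy([0.9, 0.1], [1, 0]))   # about 0.105
print(cross_entropy([0.6, 0.4], [1, 0]))   # about 0.511

# A latent distribution equal to the unit Gaussian has zero divergence.
print(kl_to_unit_gaussian([0.0, 0.0], [0.0, 0.0]))  # 0.0
print(kl_to_unit_gaussian([1.0, -0.5], [0.2, -0.3]))
```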

Optimization is the last piece of the puzzle. Once you have the evaluation component, you can optimize the representation function to improve the evaluation metric. In neural networks, this usually means using some variant of stochastic gradient descent to update the network's weights and biases according to a given loss function. And there you have it: the world's best image classifier, at least if you are Geoffrey Hinton in 2012.
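To put the three components together, here is a minimal sketch of stochastic gradient descent. It fits a single logistic unit (one weight vector plus one bias) to a hypothetical, linearly separable toy dataset. The learning rate, epoch count, and data are illustrative assumptions. Real networks apply the same kind of update to millions of weights, with the gradients computed by backpropagation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(data, lr=0.1, epochs=200, seed=0):
    """Fit weights w and bias b by stochastic gradient descent on log loss.

    For one example (x, y), the gradient of the binary cross-entropy with
    respect to the logit is (p - y), so dL/dw_j = (p - y) * x_j and dL/db = (p - y).
    """
    rng = random.Random(seed)
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)  # "stochastic": visit examples in random order
        for x, y in examples:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)  # representation
            err = p - y                                             # evaluation gradient
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]        # optimization step
            b -= lr * err
    return w, b

# Hypothetical, linearly separable toy data: label is 1 when x0 + x1 > 1.
data = [([0.0, 0.0], 0), ([0.2, 0.3], 0), ([0.4, 0.1], 0),
        ([1.0, 1.0], 1), ([0.9, 0.6], 1), ([0.7, 0.8], 1)]
w, b = sgd_logistic(data)
for x, y in data:
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    print(x, y, round(p, 3))
```

The representation (a weighted sum passed through a sigmoid), the evaluation (log loss), and the optimization (the update loop) are separate pieces. Each one could be swapped out independently.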

When training an image classifier, it is largely irrelevant that the learned representation function has logistic outputs, except when defining an appropriate loss function. Borrowed statistical terms such as logistic regression do give us useful vocabulary for discussing the model space. But they do not turn problems of optimization into problems of data understanding.

PS: The term artificial intelligence is rather silly. An AI problem is just a problem that computers are not yet good at solving. In the nineteenth century, a mechanical calculator was considered intelligent. Now that the term has become so closely tied to deep learning, we have started using artificial general intelligence (AGI) to mean anything more intelligent than an advanced pattern-matching mechanism. Yet we still have no consistent definition or understanding of general intelligence. The only thing the term AI does is inspire fear of a so-called "singularity" or of Terminator-style killer robots. I hope we can stop using such an empty, sensational term for real technology.

Deep learning techniques

Almost all of the internal workings of deep neural networks further undermine the supposedly statistical nature of deep learning. Fully connected nodes consist of weights and biases, sure, but what about convolutional layers? Rectifier activations? Batch normalization? Residual layers? Dropout? Memory and attention mechanisms?
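For readers who have not met these layers, the sketch below gives deliberately simplified plain-Python versions of three of them, plus a ReLU. The simplifications are:

- The convolution is 1-D and "valid"; real CNNs use 2-D kernels over many channels.
- Batch normalization works over a batch of scalars at training time; real implementations normalize per feature and keep running statistics for inference.
- The residual connection wraps a single function.

The signal and kernel values are hypothetical.

```python
import math

def relu(xs):
    """Rectified linear unit: max(0, x) elementwise."""
    return [max(0.0, x) for x in xs]

def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation, as deep learning libraries compute it).

    The same small kernel slides across the whole input, so its weights are
    shared across positions.
    """
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

def residual_block(xs, layer):
    """Residual connection: output = x + F(x), so the block only learns a correction."""
    return [x + fx for x, fx in zip(xs, layer(xs))]

signal = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0]
edges = conv1d(signal, [-1.0, 1.0])  # hypothetical kernel: a simple difference detector
print("conv:", edges)
print("relu:", relu(edges))
print("batch norm:", [round(v, 3) for v in batch_norm(edges)])
print("residual:", residual_block([1.0, -1.0, 2.0], relu))
```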

These innovations are central to the development of high-performing deep networks. Yet they do not line up with traditional statistical techniques, perhaps because they are not statistical techniques at all. If you don't believe me, try telling a statistician that your model is overfitting. Then ask whether they think it is a good idea to randomly drop half of the model's 100 million parameters.
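Dropout, the technique behind that thought experiment, is simple to state. The sketch below implements the common "inverted" variant in plain Python. It is an illustration, not any particular library's implementation. Strictly speaking, dropout zeros unit activations rather than deleting weights, but this temporarily disables the weights attached to the dropped units for that training step.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged.
    At inference time the input passes through untouched.
    """
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
acts = [1.0] * 10
print("train:", dropout(acts, p=0.5, rng=rng))
print("eval: ", dropout(acts, p=0.5, training=False))

# Averaged over many passes, the scaled output stays close to the input.
mean_out = sum(sum(dropout(acts, rng=rng)) for _ in range(2000)) / (2000 * len(acts))
print("mean activation over many passes:", round(mean_out, 3))
```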

And let's not even get into model interpretability.

Regression over more than 100 million variables: no problem?

Deep networks and traditional statistical models also differ in scale. Deep neural networks are huge. The VGG-16 convolutional network, for example, has approximately 138 million parameters. How do you think your average academic advisor would respond to a student who wanted to run a multiple regression over more than 100 million variables? The idea is ridiculous, because training VGG-16 is not multiple regression. It is machine learning.
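The 138 million figure can be reproduced by hand. This sketch assumes the standard VGG-16 configuration: thirteen 3x3 convolutional layers, three fully connected layers, 224x224 RGB input, and 1,000 ImageNet classes. It counts weights plus biases layer by layer.

```python
def conv_params(c_in, c_out, k=3):
    """Weights (k*k*c_in per filter) plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

# VGG-16 convolutional configuration ("M" = 2x2 max pooling, which has no parameters).
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

total_conv, c_in = 0, 3  # RGB input has 3 channels
for v in cfg:
    if v == "M":
        continue
    total_conv += conv_params(c_in, v)
    c_in = v

# After five poolings a 224x224 input becomes 7x7x512 = 25088 features.
fc_sizes = [7 * 7 * 512, 4096, 4096, 1000]
total_fc = sum(dense_params(a, b) for a, b in zip(fc_sizes, fc_sizes[1:]))

print(f"conv layers: {total_conv:,}")             # conv layers: 14,714,688
print(f"fully connected layers: {total_fc:,}")    # fully connected layers: 123,642,856
print(f"total: {total_conv + total_fc:,}")        # total: 138,357,544
```

The first fully connected layer alone (25,088 inputs to 4,096 outputs) accounts for about 102.8 million of those parameters.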

New frontier

Over the past few years you have probably seen countless papers, posts, and articles touting the cool things machine learning can now do, so I won't go into detail. I do want to remind you, however, that deep learning is not just better than previous techniques. It has also enabled us to tackle an entirely new class of problems.

Before 2012, problems involving unstructured and semi-structured data were challenging at best. Trainable CNNs and LSTMs alone were a huge leap forward on that front. They have driven considerable progress in computer vision, natural language processing, and speech transcription. They have also enabled major improvements in technologies such as face recognition, autonomous driving, and conversational AI.

It is true that most machine learning algorithms ultimately involve fitting a model to data. From that vantage point, machine learning is a statistical procedure. But the space shuttle was ultimately just a flying machine with wings, and nobody mocks the excitement over NASA's twentieth-century space exploration as an overhyped repackaging of the airplane.

As with space exploration, the advent of deep learning has not solved all of the world's problems. Many fields, especially "artificial intelligence," still have significant gaps to close. That said, deep learning has made a significant contribution to our ability to handle complex unstructured data. Machine learning will continue to lead the world in technological progress and innovation. It is much more than a crack in the wall with a shiny new frame.