Deep Learning vs. Probabilistic Graphical Models vs. Logic
Abstract: This article reviews three paradigms of artificial intelligence (AI) from the past 50 years: logic, probabilistic methods, and deep learning. It looks back at logic and probabilistic graphical models in chronological order, then makes some predictions about where AI and machine learning are heading.
[Editor's note] Following last month's blog post "Deep Learning vs. Machine Learning vs. Pattern Recognition", Tomasz Malisiewicz (CMU PhD, MIT postdoc, and Vision.ai co-founder) now walks us through the evolution of three paradigms in artificial intelligence over the past 50 years: logic, probabilistic methods, and deep learning. The article offers a deeper view of the current state and the future of AI and deep learning.
The following is the text:
Today, let's review three paradigms that have shaped artificial intelligence (AI) over the past 50 years: logic, probabilistic methods, and deep learning. The empirical, "data-driven" approach, along with big data and deep learning, is firmly established today, but that was not always the case. Many of the earliest AI methods were based on logic, and the transition from logic-based to data-driven approaches was heavily influenced by probabilistic thinking, which is what we will look at here.
In chronological order, this article looks back at logic and probabilistic graphical models, and then makes some predictions about where AI and machine learning are heading.
Image source: Coursera's Probabilistic Graphical Models course
1. Logic and Algorithms (Common-Sense "Thinking" Machines)
Much early AI work was concerned with logic, automated theorem proving, and symbol manipulation. It is no surprise that John McCarthy's groundbreaking 1959 paper was titled "Programs with Common Sense".
If you open one of today's most popular AI textbooks, Artificial Intelligence: A Modern Approach (AIMA), you will notice right away that the opening chapters cover search, constraint satisfaction, first-order logic, and planning. The cover of the third edition (pictured below) looks like a large chessboard (chess being a hallmark of human intelligence), and it also features pictures of Alan Turing (the father of computing theory) and Aristotle (one of the greatest classical philosophers, a symbol of wisdom).
The cover of AIMA, the standard textbook for undergraduate CS courses on AI
However, logic-based AI sidesteps the problem of perception, and I have long argued that understanding how perception works is the key to unlocking the mysteries of intelligence. Perception is one of those things that is easy for people and extremely hard for machines. (Further reading: "Computer Vision is Artificial Intelligence", the author's 2011 blog post.) Logic is pure, and a traditional chess-playing program is purely algorithmic, but the real world is ugly, dirty, and full of uncertainty.
I think most contemporary AI researchers believe that logic-based AI is dead. A world where everything can be observed perfectly, with no measurement error, is not the real world of robots and big data. We live in the age of machine learning, where numerical techniques beat first-order logic. Standing in 2015, I really feel sorry for anyone who still prefers modus ponens to gradient descent.
Logic is great for the classroom, and I suspect that once enough perception problems become "essentially solved", we will see a revival of logic. Plenty of perception problems will remain open, but there will also be many scenarios where the community can stop worrying about perception and start revisiting these classical ideas. Perhaps in 2020.
Further reading: "Logic and Artificial Intelligence" in the Stanford Encyclopedia of Philosophy
2. Probability, Statistics, and Graphical Models ("Measuring" Machines)
Probabilistic methods in artificial intelligence are used to handle uncertainty. The middle chapters of "Artificial Intelligence: A Modern Approach", under the heading "Uncertain Knowledge and Reasoning", present these methods vividly. If you are picking up AIMA for the first time, I suggest you start reading from that part. And if you are a student who is new to AI, do not skimp on the math.
PDFs, from Penn State University's course on probability theory and mathematical statistics
When most people think of probabilistic methods, they think of counting. For the layman, it is fair to think of probabilistic methods as fancy counting methods. Let's briefly look at what used to be the two competing schools of statistical thinking.
Frequentist methods are highly empirical: they are data-driven and make inferences purely from data. Bayesian methods are more sophisticated, combining data-driven likelihoods with priors. These priors often come from first principles or "intuition", and Bayesian methods are good at combining data with heuristics to build smarter algorithms, a nice blend of the rationalist and empiricist world views.
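To make the contrast concrete, here is a minimal sketch for estimating the bias of a coin. It is only an illustration, not something from the original post: the function names, the Beta prior, and its pseudo-counts of 2 and 2 are all assumptions chosen for the example.

```python
def frequentist_estimate(heads, flips):
    """Maximum-likelihood estimate: simply the observed frequency of heads."""
    return heads / flips


def bayesian_estimate(heads, flips, prior_heads=2.0, prior_tails=2.0):
    """Posterior mean under a Beta(prior_heads, prior_tails) prior.

    The prior behaves like pseudo-counts added to the observed data.
    The default values are an assumption made for this illustration.
    """
    return (heads + prior_heads) / (flips + prior_heads + prior_tails)


if __name__ == "__main__":
    # Hypothetical data: 3 heads out of 3 flips.
    print(frequentist_estimate(3, 3))  # the data alone says "always heads"
    print(bayesian_estimate(3, 3))     # the prior pulls the estimate toward 0.5
```

With only three flips, the purely data-driven estimate jumps to an extreme, while the prior tempers it. As the number of flips grows, the two estimates converge, because the pseudo-counts become negligible next to the data.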
Perhaps more exciting than the frequentist-versus-Bayesian debate is something called probabilistic graphical models. This class of techniques comes from computer science; although machine learning is now an important part of both CS and statistics, the real power of statistics is released only when it is combined with computation.
Probabilistic graphical models are a combination of graph theory and probabilistic methods, and they were all the rage among machine learning researchers in the mid-2000s. When I was in graduate school (2005-2011), variational methods, Gibbs sampling, and belief propagation were drilled into the brain of every CMU graduate student, and they gave us an excellent mental framework for thinking about machine learning problems. Most of what I know about graphical models I learned from Carlos Guestrin and Jonathan Huang. Carlos Guestrin is now the CEO of GraphLab (since renamed Dato), a company that builds large-scale products for machine learning on graphs. Jonathan Huang is now a senior research scientist at Google.
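As a small illustration of belief propagation, one of the techniques mentioned above, the following sketch runs the sum-product algorithm on a chain-shaped model and checks it against brute-force enumeration. The function names, the data layout, and the example potentials are assumptions made for this sketch; real graphical-model libraries handle general graphs and use more careful numerics.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product (belief propagation) on a chain-structured model.

    unary[i][s]       : potential of variable i taking state s
    pairwise[i][a][b] : potential of (variable i = a, variable i+1 = b)
    Returns one normalized marginal distribution per variable.
    """
    n, k = len(unary), len(unary[0])
    # Forward messages: fwd[i][s] summarizes everything to the left of i.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(
                fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b]
                for a in range(k)
            )
    # Backward messages: bwd[i][s] summarizes everything to the right of i.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(
                pairwise[i][a][b] * unary[i + 1][b] * bwd[i + 1][b]
                for b in range(k)
            )
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        total = sum(belief)
        marginals.append([v / total for v in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Reference answer: enumerate every joint assignment (exponential cost)."""
    n, k = len(unary), len(unary[0])
    sums = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            sums[i][s] += weight
    return [[v / sum(row) for v in row] for row in sums]


# Hypothetical 3-variable binary chain: neighbors prefer to agree,
# and only the first variable has direct evidence.
unary = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
agree = [[0.8, 0.2], [0.2, 0.8]]
pairwise = [agree, agree]

bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

Message passing reaches the same marginals as enumeration, but its cost grows linearly with the length of the chain rather than exponentially, which is the kind of saving that makes "graphical thinking" practical.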
The video below is an overview of GraphLab, but it also illustrates "graphical thinking" very well, along with how modern data scientists can put it to use. Carlos is an excellent lecturer, and his talk is not limited to the company's products; it is more about ideas for the next generation of machine learning systems.
Introduction to computational methods for probabilistic graphical models (video and slides download)
Dato CEO, Prof. Carlos Guestrin
If you think deep learning can solve every machine learning problem, you really should watch the video above. If you are building a recommendation system, a health data analytics platform, a new trading algorithm, or a next-generation search engine, graphical models are a perfect starting point.
Further reading:
Belief propagation (Wikipedia)
An Introduction to Variational Methods for Graphical Models
Michael Jordan's academic homepage (Michael Jordan is one of the giants of inference and graphical models)
3. Deep Learning and Machine Learning (Data-Driven Machines)
Machine learning is the process of learning from examples, so today's most advanced recognition techniques require a lot of training data, a deep neural network, and patience. Deep learning emphasizes the network architecture of today's most successful machine learning algorithms. These methods are based on "deep" multilayer neural networks with many hidden layers. Note: I want to emphasize that deep architectures are no longer a novelty today (2015). Just look at the following "deep" architecture from 1998.
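The idea of a "deep" multilayer network can be sketched in a few lines: the output of each layer becomes the input of the next. The example below is only an untrained forward pass with fully connected layers; the layer sizes, the tanh nonlinearity, and the random weights are assumptions for illustration, and it is not the convolutional LeNet-5 architecture.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One fully connected layer followed by a tanh nonlinearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def make_layer(n_in, n_out, rng):
    """Random, untrained weights; a real network would learn these from data."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    """Pass the input through every layer in order; depth is len(layers)."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


rng = random.Random(0)
# Hypothetical sizes: 4 inputs, three hidden layers of 8 units, 3 outputs.
sizes = [4, 8, 8, 8, 3]
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward([0.1, -0.2, 0.3, 0.4], layers)
```

Training would adjust the weights by gradient descent on a loss computed from labeled examples, which is where the large amounts of data and patience come in.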
LeNet-5, from Yann LeCun's groundbreaking paper "Gradient-Based Learning Applied to Document Recognition"
When you read a modern guide to LeNet, you come across the following disclaimer:
To run this example on a GPU, you need a good GPU. It needs at least 1GB of GPU memory. More may be required if your monitor is connected to the GPU.
When the GPU is connected to the monitor, each GPU function call has a time limit of a few seconds. This is necessary because current GPUs cannot serve the display while a computation is running. Without this limit, the screen would freeze for too long and the computer would appear to have frozen. If you run this example on a medium-quality GPU, you will hit this time limit. The limit does not exist when the GPU is not connected to a monitor. You can reduce the batch size to fix the timeout problem.
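The batch-size advice in the disclaimer can be illustrated with a generic sketch: splitting the same data into smaller batches means each call processes fewer examples and should finish sooner. This is plain Python with a hypothetical helper name, not the tutorial's actual Theano code.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples` with at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # stand-in for 10 training examples

# Fewer, larger batches: each call has more work to do.
large = list(iterate_minibatches(samples, 5))
# More, smaller batches: each call has less work to do.
small = list(iterate_minibatches(samples, 2))
```

The trade-off is that smaller batches need more calls to cover the same data.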
It really makes me wonder how Yann managed to get anything out of his deep model back in 1998. It is no surprise that the rest of us needed another 10 years to digest this material.
Update: Yann says (via a Facebook comment) that the ConvNet work dates back to 1989. "It had approximately 400K connections and took about 3 weeks to train on the USPS dataset (8,000 training samples) on a SUN4 machine." -- LeCun
Deep network, Yann's 1989 work at Bell Labs
Note: At around the same time (circa 1998), two crazy guys in California were in a garage trying to cache the entire Internet on their computers (they started a company whose name begins with G). I do not know how they did it, but I think that sometimes, to get ahead, you have to do things that do not scale. The world will eventually catch up.
Further reading:
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, November 1998.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1(4):541-551, Winter 1989.
Deep learning code: modern LeNet implementation in Theano, with docs.
Conclusion
I do not see traditional first-order logic making a comeback anytime soon. Although there is a lot of hype behind deep learning, distributed systems and "graphical thinking" are likely to have a more profound impact on data science than heavily optimized CNNs. There is no reason deep learning cannot be combined with a GraphLab-style architecture, and major breakthroughs in machine learning over the next few decades are likely to come from combining the two.
Original link: Deep Learning vs Probabilistic Graphical Models vs Logic (translator: zhyhooo; reviewer: Wang Wei; editor in charge: Zhou Jianding)
This article was compiled by CSDN and may not be reproduced without permission. For reprint requests, please contact market#csdn.net (replace # with @).