How much of what we call common sense is truly universal, and how much is just an attachment to its era? Viewed against the long river of history, some of it may look inexplicable, or even cruel and bizarre.
———— Anonymous
In the past couple of years a few explosive AI applications have pulled the public's attention toward this field. Since Turing proposed the "Turing test", AI has boomed twice and, correspondingly, fallen into two troughs. What we see now looks like the third wave of enthusiasm; whether it ends in the "singularity" or in a third trough, nobody knows.
Strong AI (also called general AI), meaning machine self-awareness, is naturally the ultimate goal, but as we move toward it we keep collecting smaller rewards along the way. Those smaller rewards are today's weak AI: very practical algorithms and applications. Getting them deployed still takes some effort, but on the whole they are maturing, and commercial applications have started to bloom. Before long the whole industry will be swept up in "AI+", and every business will be refreshed by AI. Of course, some people will say that a pile of professions will die out and many workers will be laid off, but it will also create more jobs and careers. Humanity has been through this several times already (the previous industrial revolutions), so there is no need to panic. The wheel of history keeps rolling: it always leaves some people behind while carrying others along with the times. Stay close behind it, keep learning, keep up, and stay hungry. If the times abandon you without even saying goodbye, then you had better hurry up, catch up, and give them a good beating.
--------------(this looks like a divider line)--------------------
Enough chit-chat. Here is a plain-language introduction to the various algorithms (easy-to-understand descriptions only; the specific details are left out). There may be places where I get things wrong, and I would appreciate a friendly reminder so I can correct them right away.
CNN: Convolutional Neural Network
I already said a little about the fundamentals of convolutional neural networks in my pit-filling log on TensorFlow (Google's open-source AI framework).
For the underlying principles, I still recommend this article: http://www.36dsj.com/archives/24006
In short, a CNN is a convolutional layer plus an n-layer neural network trained by backpropagation (the BP layers, also called fully connected layers). I wrote a post on the principle of BP before; go back and have a look if you need it.
So what exactly is a convolutional layer? It is a lot like a filter. An actual image is a matrix made up of pixels, and each pixel can be represented by its RGB values, each in the range 0-255. If you apply grayscale processing, every pixel becomes a single grayscale value between 0 and 255, and the image is equivalent to a 2-dimensional numeric matrix. If you want to keep the color and skip the grayscale processing, then R, G and B are equivalent to 3 separate matrices with the same height and width but different values.
Back to the convolutional layer: take a convolution kernel and roll it over the matrix (multiplying and summing element by element at each position) to derive a new matrix. The convolution kernel is itself a small 2-dimensional matrix, and kernels with different values extract different information, that is, different features of the image. For example, a kernel specialized for vertical lines, rolled over the original picture, gives you a feature map full of vertical lines. If we were building a bamboo recognizer, we would definitely want that feature; for a basketball recognizer maybe not, and whether it gets used is decided by the BP layers. But the extraction work still has to be done, so how do we decide what numbers a kernel should contain? Randomly! Because the algorithm is meant to be general enough to recognize all kinds of things, the kernels should be able to extract arbitrary features: generate 1,000, 10,000, even 100 million kernels, roll each one over the image, and you can extract 100 million features. If the BP layer ends up using only one of those features to recognize bamboo or basketballs, that is very wasteful, so the number of kernels depends on how complex the recognition task is; otherwise the amount of computation is terrifying.
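To make the "rolling" concrete, here is a minimal NumPy sketch of a single kernel sliding over a grayscale image. The kernel values and the random 8x8 image are made-up examples for illustration, not taken from any real network:

```python
import numpy as np

# Toy "roll the kernel over the image" routine for a grayscale image stored
# as a 2-D array of 0-255 values. Just a sketch of the idea, not a framework API.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise multiply the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-made kernel that responds strongly to vertical edges.
vertical_kernel = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

image = np.random.randint(0, 256, size=(8, 8)).astype(float)  # toy 8x8 grayscale image
feature_map = convolve2d(image, vertical_kernel)
print(feature_map.shape)  # (6, 6): a feature map that highlights vertical lines
```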
Of course, CNNs also have plenty of other details, such as pooling layers, normalization, and dropout.
Pooling also comes in several flavors: taking the mean gives mean pooling, taking the maximum gives max pooling. Pooling is easy to understand; it exists to reduce the amount of computation.
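As a quick illustration, here is a minimal sketch of 2x2 max and mean pooling on a small feature map. The values are toy numbers, and it assumes the height and width divide evenly by 2:

```python
import numpy as np

# Split the feature map into 2x2 blocks and keep one number per block.
def pool2x2(feature_map, mode="max"):
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # max pooling: strongest response per block
    return blocks.mean(axis=(1, 3))      # mean pooling: average response per block

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(fm, "max"))   # 2x2 result: only a quarter of the values remain
print(pool2x2(fm, "mean"))
```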
Normalization (also called standardization) keeps the data flowing through the network from being too large, too small, or too sparse.
For some of the earlier normalization methods, see this "Summary of normalization methods": http://blog.csdn.net/junmuzi/article/details/48917361
Later, Google published a paper proposing another approach, batch normalization: http://blog.csdn.net/zhikangfu/article/details/53391840
I hear it works quite well.
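To make the idea concrete, here is a minimal sketch of two common normalization schemes, min-max scaling and z-score standardization, on toy data (batch normalization can be thought of, roughly, as a learned, per-mini-batch variant of the second idea):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])             # toy feature values

min_max = (x - x.min()) / (x.max() - x.min())  # squashed into [0, 1]
z_score = (x - x.mean()) / x.std()             # zero mean, unit variance

print(min_max)   # [0.    0.333 0.667 1.   ] approximately
print(z_score)
```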
Dropout trains the network while randomly disabling some of the features, which makes its generalization ability stronger. I tried it once myself, but the training process became longer and more volatile, so use it with caution.
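For reference, a minimal sketch of where dropout typically sits, assuming the tf.keras API; the layer sizes and the 0.5 rate are arbitrary examples, not a recommendation:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly zeroes 50% of activations, only during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
```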
For the fully connected layers there are many kinds of gradient descent methods, for example: http://blog.csdn.net/xierhacker/article/details/53174558
Here are a few of the gradient descent optimizers built into TensorFlow:
- GradientDescentOptimizer
- AdagradOptimizer
- MomentumOptimizer
- AdamOptimizer
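A minimal sketch of swapping between these optimizers, assuming the TensorFlow 1.x API that these class names come from; the toy loss and the learning rates are made up for illustration:

```python
import tensorflow as tf

# Toy problem: find w that minimizes (w - 3)^2.
w = tf.Variable(5.0)
loss = tf.square(w - 3.0)

# Pick one optimizer; the others are drop-in replacements.
train_op = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
# train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)
# train_op = tf.train.AdagradOptimizer(learning_rate=0.1).minimize(loss)
# train_op = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(w))  # close to 3.0
```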
A CNN has a lot of parameters that need tuning, such as the number of network layers, the initial learning rate, the dropout probability, and so on; these are collectively called hyperparameters.
Some articles say that a lot of today's CNN work is tedious parameter tuning, because training cycles are long and the right hyperparameters depend heavily on the specific recognition task. For example, a network for 10 categories and a network for 1,000 categories need different depths. I made exactly this mistake when writing my pit-filling log: I used a very simple network to train on many categories, and it never converged; the loss stayed huge.
Of course, people have shared plenty of practical experience on parameter tuning; you can look it up yourselves. I won't dwell on it here.
We have only talked about images so far, but text and audio can also be converted into this kind of input, something like a 1*n matrix.
----------------------(this divider looks no different from the first one)----------------------
RL: Reinforcement Learning
For the details of RL, see DeepMind's open course.
Here is a Bilibili version with Chinese subtitles: about 100 minutes per lesson, 10 lessons in total, so it can be finished in roughly 16 hours. The material is fairly dense, though, so I suggest not trying to watch it in one sitting, or you will walk in confident and come out bewildered. Whenever something is unclear, go ask, understand it, and then keep watching. https://www.bilibili.com/video/av9831889/
The core principle of reinforcement learning is:
An agent interacts with the environment (state), and after repeated training based on the feedback (reward), the agent learns to choose the optimal decision (action) whenever it encounters any state, where "optimal" means the decision that brings the greatest reward in the future.
The problem RL solves is sequential decision-making, where a whole series of decisions has to be made before any real reward arrives. For example, take a three-year-old toddler who does not yet know how to pick up a cup and drink from it. We know the optimal strategy is: get close to the cup, pick it up, and pour it into your mouth.
But at the beginning, when she sees a cup in the distance, she has no idea whether she should move toward it or away from it. And even if she repeats those 2 steps over and over, there is no immediate reward (the drink) for her. So the reward is delayed. What we would like is a way to score each action, so that the agent can decide what to do based on the scores: say +10 points for moving closer to the cup and -10 points for moving away. If every step has such a scalar as a yardstick, she can work out the best strategy for getting the reward. Of course, the same action can have different effects in different situations, so the score is really assigned to a state-action pair.
So the job of RL is, through repeated training, to produce a score for every state-action pair. This is the value-based approach (one family of RL algorithm implementations).
Suppose we decide up front that the final reward is worth 100 points. How many points should the step before it get? And the step before that? This is where the Bellman equation comes in.
The specific details can be found in this article: http://blog.csdn.net/VictoriaW/article/details/78839929
- S: state
- A: action
- Q: the score I mentioned above; Qπ is the score function under the optimal policy.
- P: the transition probability of reaching the next state s' after taking action A in state S. (For example, once the toddler has seen the cup and takes action A1 (pick it up), she will not necessarily succeed; there is a transition probability P.)
- R: the immediate reward.
- γ: the discount rate, a number between 0 and 1 that says how much future Q values count toward the current Q value; if it is 1, they count at 100%.
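The formula itself did not come through in this post, so here is the standard textbook form of the Bellman (optimality) equation written with the symbols above; this is my reconstruction, not copied from the linked article:

```latex
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\,\Bigl[\, R(s, a, s') + \gamma \max_{a'} Q^{\pi}(s', a') \,\Bigr]
```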
Here you can see that the current Q value is made of two parts: the current R value plus the discounted Q value of the next state.
Suppose the discount rate here is 0.5 and the final reward is 100 points.
Working backwards: at the moment the toddler drinks the water, the Q value equals the R value (100), because there is no next state. One step earlier (picking up the cup) there is only 1 action and the immediate reward is r = 0, so the Q value is 0 + 0.5*100 = 50.
One step before that, in state 1 (seeing the cup) and choosing (move closer), the Q value is 0 + 0.5*(0 + 0.5*100) = 25.
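The same backward calculation in a few lines of Python; the reward list simply encodes the toy example above (0, 0, then 100 at the final step, with a 0.5 discount):

```python
gamma = 0.5
rewards = [0, 0, 100]   # move closer, pick up the cup, drink

q = 0.0
q_values = []
for r in reversed(rewards):   # work backwards from the final reward
    q = r + gamma * q         # current Q = immediate R + discounted next Q
    q_values.append(q)
q_values.reverse()

print(q_values)  # [25.0, 50.0, 100.0]
```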
Here we used a known optimal strategy and worked out the Q values in reverse with the Bellman equation, which makes the meaning of the Q value easier to grasp.
In practice we do not know the optimal strategy, but if we keep a big table recording a Q value for every action in every state, and fill that table in through the kind of process deduced above, then in the end the optimal strategy can be read off the table.
That is the logic of the Q-learning algorithm.
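For reference, here is a minimal sketch of tabular Q-learning. The `env` object with `reset()` / `step()` and the small discrete action space are hypothetical assumptions for illustration, not a specific library's API:

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)               # the big table: (state, action) -> score
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: usually take the best-known action, sometimes explore
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Bellman-style update: move toward reward + discounted best next value
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q   # the optimal strategy is read off the table: argmax over actions per state
```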
Of course, plain Q-learning is not very practical: if there are very many states and actions, the amount of data in that table explodes.
So many later algorithms were developed on top of it; I recommend reading about DQN and its relatives.
RL itself actually goes back a long way.
The historical development of reinforcement learning:
- 1956: Bellman proposed dynamic programming.
- 1977: Werbos proposed adaptive dynamic programming.
- 1988: Sutton proposed the temporal-difference algorithm.
- 1992: Watkins proposed the Q-learning algorithm.
- 1994: Rummery proposed the Sarsa algorithm.
- 1996: Bertsekas proposed neuro-dynamic programming to solve optimal control in stochastic processes.
- 2006: Kocsis proposed the upper confidence bounds applied to trees (UCT) algorithm.
- 2009: Lewis proposed adaptive dynamic programming for feedback control.
- 2014: Silver presented the deterministic policy gradient algorithm.
- 2015: Google DeepMind proposed the Deep Q-Network (DQN) algorithm.
AlphaGo is based on RL, mainly using Monte Carlo Tree Search (MCTS) together with RL, and over the past two years the experts have piled a lot of further optimization onto these techniques.
Here is an article about it: http://geek.csdn.net/news/detail/201928
Borrowing from it, here are a few ways the RL algorithms can be categorized:
- Model-free: don't try to understand the environment; take whatever the environment gives, wait for real-world feedback step by step, and then decide the next move based on that feedback.
- Model-based: first understand what the real world is like and build a model to simulate its feedback, imagine everything that could happen next, then pick the best of those imagined scenarios and follow that strategy. Compared with model-free, it has an extra virtual environment, and imagination.
- Policy-based: analyze the perceived environment and directly output the probabilities of the next possible actions, then act according to those probabilities.
- Value-based: output the value of every action and choose the action with the highest value; methods of this kind cannot pick continuous actions.
- Monte Carlo update: after the game starts, wait until it ends, then go back over all the turning points of that round and update the behavior policy.
- Temporal-difference update: update at every step of the game without waiting for it to end, so you can learn while playing.
- On-policy: the agent has to be there in person, learning while it plays.
- Off-policy: the agent can play itself, or watch others play and learn their behavior from watching.
The practical RL algorithms at the moment include DQN, DDPG, A3C, DPPO, and so on.
Here is an article about DeepMind combining the various DQN improvements: http://tech.ifeng.com/a/20171010/44710270_0.shtml
Here is a chart from it for everyone to look at.
The horizontal axis is the amount of training; the vertical axis is performance as a percentage of the human level (100% equals the average human game-playing level), averaged over 57 Atari games.
That is all for today. I will keep adding to this summary of algorithms later on. Thanks for reading!
GANs: Generative Adversarial Networks (to be continued)
RNN: Recurrent Neural Networks (to be continued)
LSTM: Long Short-Term Memory networks (to be continued)
Transfer learning (to be continued)
Some interesting open-source application examples from the past two years, and the algorithms they use:
CNN: face recognition, style transfer, image recognition
RL: AlphaGo, game bots, robot control, Alibaba's product recommendation system
GANs: style transfer, generating photos from sketches, turning cat faces into dog faces, removing occlusions from images, age transformation, super-resolution
RNN / LSTM: translation models, generating classical poems, generating couplets, generating HTML code from PSD files