I had shelved learning DL for a long time, and finally decided to start ~ ~

Deep Learning (Ian Goodfellow, Yoshua Bengio, Aaron Courville). Book homepage: http://www.deeplearningbook.com/

Starting from artificial intelligence: in its early days, AI tended to tackle problems that are difficult for humans but relatively straightforward for machines, problems whose common trait is that they can easily be described by a list of formal mathematical rules. The real challenge for AI turned out to be tasks that are easy for people to perform but hard to describe formally. Solving the early problems required little knowledge about the world; the key challenge is how to get informal knowledge into a computer, because much of the knowledge people rely on in daily life is subjective and intuitive, and resists formalization.

So people tried to hard-code knowledge about the world in formal languages. The difficulties faced by these systems, which rely on hard-coded knowledge, suggest that AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. This ability is called machine learning. The performance of a simple machine learning algorithm depends heavily on the representation of the data it is given. This dependence on representation is a common phenomenon in computer science and even in everyday life: arithmetic on Arabic numerals is easy, while the same operations on Roman numerals are time-consuming. For many AI tasks, feeding a simple machine learning algorithm a well-designed set of features is enough for good performance. For many other tasks, however, knowing which features to extract is itself the hard part.

Representation learning is one solution. It uses machine learning not only to discover the mapping from a representation to an output, but also to discover the representation itself. Learned representations often perform better than hand-designed ones, and they allow a system to adapt to new tasks quickly with minimal human intervention. The autoencoder is an example: it combines an encoder function and a decoder function, and is trained so that the new representation produced by the input-encode-represent-decode pipeline retains as much information about the input as possible. When we design features, or design algorithms for learning features, the goal is to separate the factors of variation that explain the observed data. These factors are quantities arising from separate sources of influence that usually cannot be observed directly; they can be thought of as concepts or abstractions that help explain the rich diversity of the data. Many factors of variation influence every piece of data we observe, and some, such as a speaker's accent, can be identified only through sophisticated, nearly human-level understanding. Extracting such abstract features from raw data can be almost as difficult as solving the original problem, so at first glance representation learning alone does not seem to help.
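The encode-decode idea can be made concrete with a toy example (my own sketch, not code from the book): a linear autoencoder with a one-unit bottleneck, trained by gradient descent on the reconstruction error of correlated 2-D points, learns to keep the direction that preserves the most information, much like PCA.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: almost all variance lies along one direction.
x = rng.normal(size=(500, 1)) @ np.array([[2.0, 1.0]]) + 0.1 * rng.normal(size=(500, 2))

W = rng.normal(size=(2, 1)) * 0.1   # encoder: 2-D input -> 1-D code
V = rng.normal(size=(1, 2)) * 0.1   # decoder: 1-D code  -> 2-D reconstruction

lr = 0.01
for _ in range(2000):
    h = x @ W                        # the learned representation
    err = h @ V - x                  # reconstruction error
    # gradients of the mean squared reconstruction error
    gV = h.T @ err / len(x)
    gW = x.T @ (err @ V.T) / len(x)
    V -= lr * gV
    W -= lr * gW

mse = np.mean((x @ W @ V - x) ** 2)  # small: the 1-D code kept most information
```

The bottleneck forces the representation to be smaller than the input, so the network must keep whichever direction explains the data best.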

Deep learning attempts to solve this central problem of representation learning by building complicated representations out of simpler ones. The feedforward deep network, or multilayer perceptron (MLP), is the quintessential example. An MLP can be understood simply as a function mapping inputs to outputs, formed by composing several simple functions; each different function application produces a new representation. Depth offers two perspectives: on the one hand, deep learning provides a way to learn the right representation of the data; on the other, depth lets the computer learn a multi-step computer program. The representation at each layer can be viewed as the state of the computer's memory after executing another set of parallel instructions, and a deeper network executes more instructions in sequence. Sequential instructions are powerful because later instructions can refer to the results of earlier ones. From this perspective, not all the activations in a layer encode factors of variation that explain the input; some store state information used to execute the rest of the program.
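The "composition of simple functions" view is literal: an MLP computes f(x) = f3(f2(f1(x))). A minimal sketch (sizes and random weights are made up for illustration; a real network would learn them):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # One "simple function": an affine map followed by a ReLU
    # nonlinearity, producing a new representation of its input.
    return np.maximum(0.0, x @ W + b)

x = rng.normal(size=(4, 3))             # a batch of four 3-D inputs
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
W3, b3 = rng.normal(size=(5, 2)), np.zeros(2)

h1 = layer(x, W1, b1)                   # first new representation, f1(x)
h2 = layer(h1, W2, b2)                  # second, built on the first, f2(f1(x))
y = h2 @ W3 + b3                        # output layer, f3(f2(f1(x)))
```

Each intermediate array (`h1`, `h2`) is exactly the "new representation" the text describes; stacking more layers composes more functions.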

There are two ways to measure the depth of a learning model: (1) the depth of the computational graph that expresses the model, i.e., the number of sequential instructions that must be executed to evaluate its architecture; (2) for probabilistic models, the depth of the graph describing how concepts are related to one another.

The relationship among AI, ML, representation learning (RL), and DL: AI ⊃ ML ⊃ RL ⊃ DL. (1) Machine learning is the only viable approach to building AI systems that can operate in complicated, real-world environments. (2) Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones.

(i) Intended readers: university students and software engineers

(ii) Historical trends in deep learning

1. The many names of deep learning, and the changing fortunes of neural networks

This part introduces the entanglement between deep learning and neural networks. The history of deep learning can be traced back to the 1940s; more precisely, that is the history of neural networks. From the 1940s to the 1960s the field was known as cybernetics, from the 1980s to the 1990s as connectionism, and it was not called deep learning until 2006. Deep learning has also gone by the name artificial neural networks (ANNs). The neural perspective has two main motivations: (1) the brain provides a proof by example that intelligent behavior is possible, so a conceptually direct path is to reverse-engineer the computational principles behind the brain and reproduce its functionality; (2) understanding the brain and the mechanisms underlying human intelligence is interesting in its own right, and machine learning models that shed light on these basic questions are useful beyond engineering.

Today's deep learning goes far beyond the neuroscientific perspective; it appeals to a more general principle of learning multiple levels of composition.

The predecessors of today's deep learning were simple linear models. Typical examples, such as the McCulloch-Pitts neuron and the adaptive linear element (ADALINE), had a huge influence on the landscape of modern machine learning, as did the algorithms used to learn their weights. The training algorithm for ADALINE was a special case of stochastic gradient descent (SGD), and slightly modified versions of SGD remain the dominant training algorithms for deep learning models today. Linear models have many limitations; most famously, they cannot learn the XOR function.
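Both points, the ADALINE training rule (stochastic gradient descent on the squared error) and the XOR limitation, fit in a few lines of NumPy. This is my own sketch, not code from the book:

```python
import numpy as np

# XOR: the classic function no linear model can represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# ADALINE: one linear unit trained by stochastic gradient descent
# on the squared error of a single randomly chosen example per step.
rng = np.random.default_rng(0)
w, b = np.zeros(2), 0.0
for t in range(20000):
    i = rng.integers(len(X))
    err = X[i] @ w + b - y[i]       # prediction error on one example
    lr = 1.0 / (10 + 0.01 * t)      # decaying step size
    w -= lr * err * X[i]
    b -= lr * err

pred = X @ w + b
mse = np.mean((pred - y) ** 2)
# The best any linear unit can do on XOR is to predict 0.5 everywhere,
# leaving a mean squared error of 0.25: SGD converges, but the model
# class itself cannot fit the data.
```

A network with one hidden layer of nonlinear units removes this limitation, which is part of why depth matters.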

Neuroscience should be regarded as an important source of inspiration for deep learning researchers, not as a rigid guide. Its role in deep learning has diminished mainly because our knowledge of the brain is still too limited to serve as an effective guideline. Two inspirations remain: (1) neuroscience gives us reason to hope that a single deep learning system can solve many different tasks; (2) we can borrow rough principles from it, such as the idea that many simple computational units, connected together and interacting, can become intelligent. But even where neuroscience has successfully inspired the architectures of neural networks, our knowledge of how biological brains learn tells us little about how to train those architectures.

The media emphasize the similarity between deep learning and the brain, but deep learning should not be seen as an attempt to simulate the brain. Today's deep learning draws inspiration from many fields, especially its applied-mathematics foundations: linear algebra, probability theory, information theory, and numerical optimization. Some deep learning researchers cite neuroscience as a source of inspiration; others do not care about it at all. Note that there is a distinct field, computational neuroscience, that does focus on understanding the working mechanisms of the brain at the algorithmic level.

Some principles that emerged in the history of deep learning are still in use today. (1) A large number of simple computational units connected in a network can achieve intelligent behavior (the core idea of connectionism). (2) Distributed representations: each input to a system should be represented by many features, and each feature should be involved in representing many possible inputs. (3) The back-propagation (BP) algorithm for training neural networks. (4) Neural network models of sequences, such as the LSTM, proposed in the 1990s. (5) The deep belief network, proposed by Hinton in 2006 and trained efficiently with greedy layer-wise pretraining. At this stage, deep neural networks began to compete with other machine learning systems based on hand-designed features, and the term "deep learning" emphasized the theoretical importance of depth itself. This wave of deep learning initially focused on unsupervised learning and on the generalization of deep models from small datasets; the focus later shifted to supervised learning and to exploiting large labeled datasets.
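The distributed-representation idea in (2) can be illustrated with a toy count (my own example, not from the book): to represent every (color, shape) pair, a symbolic "one unit per concept" scheme needs n_colors × n_shapes units, while a distributed scheme needs only n_colors + n_shapes features, and an unseen combination still gets a meaningful code.

```python
import numpy as np

colors = ["red", "green", "blue"]
shapes = ["circle", "square", "triangle"]

# Symbolic (one-hot) scheme: one dedicated unit per (color, shape)
# pair, so nothing learned about one pair transfers to another.
n_symbolic = len(colors) * len(shapes)            # 9 units, none shared

# Distributed scheme: separate color features and shape features,
# each reused across many inputs.
def encode(color, shape):
    v = np.zeros(len(colors) + len(shapes))
    v[colors.index(color)] = 1.0                  # color features
    v[len(colors) + shapes.index(shape)] = 1.0    # shape features
    return v

n_distributed = len(colors) + len(shapes)         # 6 units, all shared
# "blue triangle" shares its color feature with every other blue object,
# so whatever the system learned about "blue" applies to it immediately.
code = encode("blue", "triangle")
```

The gap widens quickly: with 10 values per attribute and 3 attributes, the symbolic scheme needs 1000 units, the distributed one 30.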

2. Deep learning's advantages become more pronounced as the amount of available training data grows

3. Deep learning models keep growing in size as hardware and software infrastructure improve

A main reason for the success of today's neural networks is that there are enough computational resources to run large models. Faster computers, larger memory, and the availability of larger datasets together drove the growth in model size. Since the introduction of hidden units, the size of neural networks has doubled roughly every 2.4 years (see the figure in the original book). This trend is expected to continue for decades; at the current pace, artificial neural networks would reach a number of neurons comparable to the human brain around the 2050s.

4. Deep learning delivers increasing accuracy on increasingly complex applications

Deep Learning (1) - Introduction: reading summary