Neural Network Structure Summary


A perceptron is a function from the input space (feature space) to the output space: f(x) = sign(w · x + b), where w and b are the perceptron's weight and bias parameters. The linear equation w · x + b = 0 describes a hyperplane in the feature space, the separating hyperplane. The perceptron assumes the data set is linearly separable, meaning that such a hyperplane can place all points of one class on one side and all points of the other class on the other side. The goal of perceptron learning is to find w and b. To do that we need an (empirical) loss function to minimize. The most obvious candidate is the total number of misclassified points, but that count is not a continuous, differentiable function of w and b, so instead the loss is taken as the total distance of the misclassified points to the hyperplane: L = -Σ yi (w · xi + b), summed over the misclassified points. Training generally uses (stochastic) gradient descent.
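As a minimal illustration of that training rule, here is a NumPy sketch of the perceptron update on a toy, linearly separable data set; the data, learning rate and stopping rule are made up for the example.

```python
import numpy as np

# Toy linearly separable data: labels must be +1 or -1.
X = np.array([[2.0, 3.0], [3.0, 3.0], [1.0, 1.0], [0.5, 0.2]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
lr = 0.1  # learning rate

# Stochastic gradient descent on L = -sum_{misclassified} y_i (w.x_i + b):
# for a misclassified point, dL/dw = -y_i * x_i and dL/db = -y_i.
for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:   # point is misclassified
            w += lr * yi * xi               # gradient step
            b += lr * yi
            errors += 1
    if errors == 0:                         # converged: all points separated
        break

print("w =", w, "b =", b)
print("predictions:", np.sign(X @ w + b))
```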

A deconvolutional network (DN), also called an inverse graphics network (IGN), is a convolutional neural network run in reverse. For example, feed the network the word "cat" and train it by comparing the images it generates with real photos of cats, so that it produces images that look more and more like a cat. A DN can be combined with an FFNN just like an ordinary CNN, so strictly it would need yet another new abbreviation; "deep deconvolutional network" would probably do, although one could argue that attaching an FFNN to the front of a DN and attaching one to the back deserve two different names.

In most applications the input is not a text-style category label but a binary vector: for example, <0, 1> means cat, <1, 0> means dog and <1, 1> means cat and dog. Where a CNN has pooling (subsampling) layers, a DN uses the corresponding inverse operations, mainly interpolation or unpooling based on a biasing assumption (if the pooling layer used max pooling, the reverse operation may fill in new values that are smaller than that maximum).
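As a rough sketch of the generation side, the following PyTorch snippet (layer sizes and the 2-element label vector are arbitrary choices for illustration, not the paper's architecture) maps a binary class vector such as <0, 1> to an image-shaped tensor using transposed convolutions, which play the role of the "reverse" pooling/convolution described above.

```python
import torch
import torch.nn as nn

# Hypothetical decoder: a 2-element class vector ("<0,1> = cat") -> 1x32x32 image.
class TinyDeconvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 64 * 4 * 4)  # expand the label into a small feature map
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, label_vec):
        h = self.fc(label_vec).view(-1, 64, 4, 4)
        return self.deconv(h)

net = TinyDeconvNet()
cat = torch.tensor([[0.0, 1.0]])   # "<0, 1> means cat"
img = net(cat)
print(img.shape)                   # torch.Size([1, 1, 32, 32])
```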

Zeiler, Matthew D., et al. "Deconvolutional networks." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.

 

Deep convolutional inverse graphics networks (DCIGN) have a somewhat misleading name: they are really a class of variational autoencoders (VAE) that use a CNN as the encoder and a DN as the decoder. A DCIGN tries to model the "features" probabilistically during encoding, so that even if it has only ever seen images containing a cat or a dog but not both, it can learn to generate images in which the two coexist. Conversely, if a photo contains your cat along with the annoying dog from next door, you can feed the photo to the network and have it remove the dog, without any extra editing. Demos show that such networks can also learn to apply complex transformations to an image, such as changing the light source or rotating a 3D object. These networks are usually trained with backpropagation.
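Since a DCIGN is, at heart, a VAE with a convolutional encoder and a deconvolutional decoder, the key mechanism is the probabilistic encoding (the reparameterization step). Below is a compact sketch of that idea; all sizes and the latent dimension are assumptions for the example, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class TinyConvVAE(nn.Module):
    """Sketch of a VAE with a CNN encoder and a transposed-conv decoder."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(32 * 8 * 8, latent_dim)
        self.to_logvar = nn.Linear(32 * 8 * 8, latent_dim)
        self.from_z = nn.Linear(latent_dim, 32 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        out = self.dec(self.from_z(z).view(-1, 32, 8, 8))
        return out, mu, logvar

vae = TinyConvVAE()
recon, mu, logvar = vae(torch.rand(1, 1, 32, 32))
# Training would minimize reconstruction loss plus KL(mu, logvar || N(0, I)).
print(recon.shape)
```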

Kulkarni, Tejas D., et al. "Deep convolutional inverse graphics network." Advances in Neural Information Processing Systems. 2015.

 

Generative adversarial networks (GAN) come from a different family: they consist of two networks working together as a pair. A GAN can be built from any two networks (though usually an FFNN and a CNN), one for generation and one for discrimination. The discriminator's ability to correctly tell real data from generated data forms part of the generator's error. This creates a form of competition: the discriminator gets better and better at distinguishing real data from generated data, while the generator keeps learning to make that distinction harder. Sometimes this mechanism works very well, because even fairly complex noise-like patterns are eventually predictable, whereas generated data that shares the features of the input data is much harder to tell apart. GANs are hard to train: not only do you have to train two networks (each of which can have its own problems), you also have to balance their dynamics. If either the discriminator or the generator becomes much stronger than the other, the GAN will not converge but will diverge instead.
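The competitive training loop described above can be sketched as follows; the toy 2-D "real" distribution, layer sizes and learning rates are made up for the example. The discriminator is trained to separate real from generated samples, and the generator is trained to fool it.

```python
import torch
import torch.nn as nn

# Toy GAN: the generator maps noise -> 2-D points, the discriminator scores real vs. fake.
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_data = torch.randn(64, 2) * 0.5 + 2.0   # the "real" distribution for this demo

for step in range(200):
    # --- train the discriminator: real -> 1, generated -> 0 ---
    fake = G(torch.randn(64, 4)).detach()
    d_loss = bce(D(real_data), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- train the generator: try to make D output 1 on generated data ---
    fake = G(torch.randn(64, 4))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print("final D loss %.3f, G loss %.3f" % (d_loss.item(), g_loss.item()))
```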

Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.

 

A recurrent neural network (RNN) is an FFNN with a time twist: it is not stateless [1]; it has connections between layers as well as connections through time. The input a neuron receives comes not only from the previous layer but also from the neuron's own state at the previous time step. This means the order in which you present the inputs matters: feeding "milk" and then "cookies" gives a different result from feeding "cookies" and then "milk". A major problem with RNNs is that, depending on the activation function, the gradients vanish or explode, so information is lost rapidly over time, much as information is lost in a very deep FFNN as the depth increases. At first glance this does not look like a big problem, because the information is only the weights rather than the neuron states.

However, the weights at different time steps are precisely what store the information from the past; if the weights are driven to 0 or to very large values, the earlier states no longer matter. RNNs can in principle be used in many fields: although most data does not have an intrinsic time axis the way audio and video do, it can still be represented as a sequence. An image or a string of text can be fed in one pixel or one character at a time, so the time-dependent weights describe not what happened x seconds earlier but what came earlier in the sequence. In general, recurrent networks are good at predicting and completing information, and can be used, for example, for autocompletion.

[1] "Stateless" means that the output is determined only by the current input. An RNN is "stateful" because it partially "remembers" the inputs it has already seen. -- Translator's note.
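The "state" mentioned in the footnote is easiest to see in the update equation of a plain (Elman-style) recurrent cell. A minimal NumPy sketch, with made-up sizes and random weights just so it runs:

```python
import numpy as np

# Elman-style RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The output depends on the current input AND the previous hidden state,
    # which is what makes the network "stateful".
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

sequence = rng.normal(size=(4, input_size))   # 4 time steps
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h)
print("final hidden state:", h)
```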

Elman, Jeffrey L. "Finding structure in time." Cognitive Science 14.2 (1990): 179-211.

 

Long short-term memory (LSTM) networks try to fight the gradient vanishing/exploding problem by introducing gates and an explicitly defined memory cell; the inspiration comes more from circuit design than from biology. Each neuron has a memory cell and three gates: input, output and forget. The job of these gates is to protect the information by blocking or allowing its flow. The input gate determines how much information from the previous layer gets stored in the cell; the output gate, at the other end, determines how much of the cell the next layer gets to see; the forget gate seems odd at first, but forgetting is sometimes the right thing to do: if the network is learning a book and a new chapter begins, it may need to forget some of the characters from the previous chapter. LSTMs can learn complex sequences, for example writing like Shakespeare or composing new music. Since each gate has a weight on the cell of the preceding neuron, running the network takes more resources.
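For reference, one LSTM step with the three gates and the memory cell written out explicitly; the sizes are arbitrary and the weights random, just to make the sketch runnable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM step with explicit input (i), forget (f) and output (o) gates.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 6
W = {g: rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for g in "ifoc"}
b = {g: np.zeros(n_hid) for g in "ifoc"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate: how much new info to store
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate: how much old memory to keep
    o = sigmoid(W["o"] @ z + b["o"])   # output gate: how much of the cell to expose
    c_tilde = np.tanh(W["c"] @ z + b["c"])
    c = f * c_prev + i * c_tilde       # memory cell update
    h = o * np.tanh(c)                 # visible state passed to the next layer/step
    return h, c

h = np.zeros(n_hid); c = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # a 5-step sequence
    h, c = lstm_step(x_t, h, c)
print(h)
```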

Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.

 

Gated recurrent units (GRU) are a slight variation on LSTMs. They have one gate fewer and are wired a little differently: an update gate replaces the input, output and forget gates. The update gate determines how much information to keep from the previous state and how much to let in from the previous layer, while the reset gate works much like the LSTM's forget gate, just placed slightly differently. A GRU always passes out its full state; there is no extra output gate. In general GRUs behave much like LSTMs; the biggest difference is that a GRU is somewhat faster and cheaper to run (but slightly less expressive). In practice, speed and expressiveness tend to trade off against each other: when a larger network is run to gain expressive power, the runtime advantage is suppressed, and when the extra expressiveness is not needed, GRUs outperform LSTMs.
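The reduced gating is easiest to see by writing one GRU step next to the LSTM step above; again the sizes and random weights are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6
W = {g: rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for g in "zrh"}

def gru_step(x_t, h_prev):
    z_in = np.concatenate([x_t, h_prev])
    z = sigmoid(W["z"] @ z_in)    # update gate
    r = sigmoid(W["r"] @ z_in)    # reset gate
    h_tilde = np.tanh(W["h"] @ np.concatenate([x_t, r * h_prev]))
    # The update gate alone decides how much old state to keep vs. new candidate;
    # the whole state is passed on -- there is no separate output gate.
    return (1 - z) * h_prev + z * h_tilde

h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h = gru_step(x_t, h)
print(h)
```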

Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).

 

A neural Turing machine (NTM) can be understood as an abstraction of the LSTM, and as an attempt to take neural networks out of the black box so that we can partly see what is going on inside. Instead of encoding the memory cells directly into neurons, the NTM keeps the memory separate. It aims to combine the efficiency and permanence of ordinary digital storage with the efficiency and expressive power of neural networks; the vision is a content-addressable memory bank together with a neural network that can read from and write to it. The "Turing" in neural Turing machine refers to Turing completeness: being able to read, write and change state based on what it reads, which means it can express everything a universal Turing machine can express.
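The "content-addressable" part can be illustrated by the read operation: the controller emits a key, and memory rows are weighted by how similar they are to that key. A NumPy sketch of this addressing (the memory size, key and sharpness parameter are invented for the example):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A tiny external memory: 8 slots of width 4.
memory = np.random.default_rng(0).normal(size=(8, 4))

def content_read(memory, key, beta=5.0):
    """Content-based addressing: weight each memory row by cosine similarity to the key."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = softmax(beta * sims)   # sharper focus for larger beta
    return w @ memory, w       # the read vector is a weighted sum of memory rows

key = np.array([1.0, 0.0, -1.0, 0.5])   # in an NTM this key comes from the controller network
read_vec, weights = content_read(memory, key)
print("read weights:", np.round(weights, 3))
print("read vector:", read_vec)
```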

Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing machines." arXiv preprint arXiv:1410.5401 (2014).

 

Bidirectional recurrent networks, bidirectional long short-term memory networks and bidirectional gated recurrent units (BiRNN, BiLSTM and BiGRU) look exactly like their unidirectional counterparts, so they are not drawn separately. The difference is that these networks are connected not only to past states but also to future states. For example, a unidirectional LSTM can be trained to predict the word "fish" by feeding in the letters one by one, with the recurrent connections along the time axis remembering the previous values; a bidirectional LSTM is also fed the following letters of the sequence on the backward pass, that is, it receives information from the future. This teaches the network to fill in gaps rather than to extrapolate: instead of extending the edge of an image, it can fill a hole in the middle of one.
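In PyTorch the bidirectional variants are literally the unidirectional module with `bidirectional=True`; the output at each step then concatenates a forward (past-aware) and a backward (future-aware) state. The sizes below are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

# A sequence of length 4 (think of 4 letters while predicting a gap),
# embedded into 8-dimensional vectors; batch size 1.
x = torch.randn(1, 4, 8)

bilstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
out, (h_n, c_n) = bilstm(x)

# Each time step sees 16 forward features (from the past) and 16 backward
# features (from the future), hence 32 outputs per step.
print(out.shape)   # torch.Size([1, 4, 32])
```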

Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.

 

A deep residual network (DRN) is a very deep FFNN with extra connections that skip ahead (usually by two to five layers) on top of the ordinary layer-to-layer connections. Instead of trying to find a mapping from the input to the output after passing through many layers, a DRN adds a bit of identity to the solution: the input of a shallow layer is fed directly to a much deeper unit. Experiments show that DRNs can efficiently learn networks up to 150 layers deep, performing far better than ordinary two-to-five-layer networks. However, it has been shown that a DRN is in effect an RNN without an explicit time construction, so it is often compared to an LSTM without gate units.
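The "bit of identity added to the solution" is just a skip connection in code. A minimal residual block sketch in PyTorch (channel counts and kernel sizes are arbitrary, not those of the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers whose output is added back to the input (the skip connection)."""
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # the shallow input is fed directly to the deeper unit

block = ResidualBlock()
y = block(torch.randn(1, 16, 8, 8))
print(y.shape)   # torch.Size([1, 16, 8, 8])
```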

He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).

 

An echo state network (ESN) is another kind of recurrent network. It differs from the usual networks in that the connections between its neurons are random (there is no neat layer-by-layer structure), and the training procedure differs accordingly. Instead of feeding data forward and propagating the error backward, you feed the data in, let the units update for a while, and then observe the output over time. Compared with ordinary neural networks, the roles of the input and output layers change quite a bit: the input layer fills the network with information, and the output layer watches how the activation pattern unfolds over time. During training, only the connections between the output layer and (some of) the hidden units are changed.
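A compact NumPy sketch of this "random reservoir, train only the readout" idea; the reservoir size, spectral scaling, washout length and the toy sine-prediction task are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 100, 1

# Random, fixed reservoir and input weights (never trained).
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # scale spectral radius below 1
W_in = rng.normal(scale=0.5, size=(n_res, n_in))

# Toy task: predict the next value of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 1000)).reshape(-1, 1)
states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t in range(len(u) - 1):
    x = np.tanh(W_res @ x + W_in @ u[t])   # let the reservoir evolve
    states[t] = x

# Only the readout (output) weights are fitted, by least squares.
washout = 100   # discard the initial transient
W_out, *_ = np.linalg.lstsq(states[washout:-1], u[washout + 1:], rcond=None)
pred = states[washout:-1] @ W_out
print("readout MSE:", float(np.mean((pred - u[washout + 1:]) ** 2)))
```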

Jaeger, Herbert, and Harald Haas. "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication." Science 304.5667 (2004): 78-80.

 

An extreme learning machine (ELM) is basically an FFNN with random connections. It looks like an LSM or an ESN, but it is neither recurrent nor spiking. An ELM does not use backpropagation at all: the input-to-hidden weights are initialized randomly and left alone, and the output weights are then trained in a single step by least-squares fitting (the fit with the smallest squared error over all samples). The result is a network with weaker expressive power, but one that trains far faster than backpropagation.
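A minimal NumPy sketch of that one-step least-squares training; the hidden size and the toy regression problem are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: learn y = sin(x) on [0, 2*pi].
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X)

# 1) Random, untrained input-to-hidden weights.
n_hidden = 50
W = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)   # hidden layer activations

# 2) A single least-squares step for the output weights (no backpropagation).
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

# Prediction on new points.
X_test = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)
y_pred = np.tanh(X_test @ W + b) @ beta
print(np.round(np.column_stack([np.sin(X_test), y_pred]), 3))
```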

Cambria, Erik, et al. "Extreme learning machines [Trends & Controversies]." IEEE Intelligent Systems 28.6 (2013): 30-59.

 

A liquid state machine (LSM) looks similar to an ESN but is a spiking neural network: the sigmoid activation function is replaced by a threshold function, and each neuron is an accumulating memory cell. When a neuron is updated, its value is not replaced by the sum of its connected neurons; instead it accumulates into itself, and only once the threshold is reached does it release its energy to the other neurons. This produces a pulse-like, spiking network whose state only changes when a threshold is crossed.
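The accumulate-until-threshold behaviour can be sketched with a single integrate-and-fire style neuron; the threshold, leak factor and input stream are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

threshold = 1.0   # the neuron fires once its accumulated value crosses this
leak = 0.95       # a little leakage each step (optional in the simplest model)
v = 0.0           # the accumulating "memory" of the neuron

inputs = rng.uniform(0.0, 0.3, size=30)   # incoming activity over 30 time steps
spikes = []
for t, inp in enumerate(inputs):
    v = leak * v + inp       # accumulate rather than overwrite
    if v >= threshold:       # the state only matters once the threshold is reached
        spikes.append(t)     # release a pulse to downstream neurons...
        v = 0.0              # ...and reset
print("spike times:", spikes)
```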

Maass, Wolfgang, Thomas Natschläger, and Henry Markram. "Real-time computing without stable states: A new framework for neural computation based on perturbations." Neural Computation 14.11 (2002): 2531-2560.

 

Support vector machines (SVM) find optimal solutions to classification problems. Originally an SVM could only handle linearly separable data, for example deciding which pictures show Garfield and which show Snoopy, with no other outcomes allowed. Training can be pictured like this: plot all the data (the Garfields and the Snoopys) on a (2D) graph and draw a line between the two classes, so that all the Snoopys end up on one side and all the Garfields on the other. The optimal line is the one that maximizes the margin between the data points on either side and the line itself. Classifying new data then just means plotting the point and checking which side of the line it falls on. Using the kernel trick, SVMs can classify n-dimensional data; conceptually, the points are drawn into a 3D (or higher-dimensional) space, which lets the SVM distinguish Snoopy, Garfield and, say, Simon's Cat, and in still higher dimensions even more cartoon characters. SVMs are not always counted as neural networks.
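With scikit-learn the kernel trick is one argument away. A toy sketch with made-up 2-D clusters (class 0 standing in for "Snoopy", class 1 for "Garfield"):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two toy 2-D clusters standing in for the two cartoon classes.
snoopy = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
garfield = rng.normal(loc=[2, 2], scale=0.5, size=(50, 2))
X = np.vstack([snoopy, garfield])
y = np.array([0] * 50 + [1] * 50)

# Linear kernel: find the maximum-margin separating line.
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.1, 0.0], [2.1, 1.9]]))   # -> [0 1]

# A non-linear (RBF) kernel handles data that no straight line can separate.
clf_rbf = SVC(kernel="rbf").fit(X, y)
print(clf_rbf.predict([[0.1, 0.0], [2.1, 1.9]]))
```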

Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.

 

Kohonen networks (KN, also known as self-organizing (feature) maps, SOM, SOFM) use competitive learning to classify data without supervision. When an input is presented, the network evaluates which neurons match it best, then fine-tunes those neurons to match even better, gradually pulling their neighbours along with them. How much the neighbours change depends on their distance from the best-matching unit. Kohonen networks are sometimes not counted as neural networks either.
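A minimal NumPy sketch of that competitive update on a small 2-D map; the map size, learning rate and neighbourhood radius are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 10x10 map of neurons, each with a 3-dimensional weight vector (e.g. RGB colours).
map_h, map_w, dim = 10, 10, 3
weights = rng.uniform(size=(map_h, map_w, dim))
grid_y, grid_x = np.mgrid[0:map_h, 0:map_w]

def train_step(x, lr=0.5, radius=3.0):
    # 1) Find the best-matching unit (highest match = smallest distance).
    dists = np.linalg.norm(weights - x, axis=2)
    by, bx = np.unravel_index(np.argmin(dists), dists.shape)
    # 2) Pull the winner and its neighbours toward the input; the pull
    #    decays with the grid distance from the best-matching unit.
    grid_dist2 = (grid_y - by) ** 2 + (grid_x - bx) ** 2
    influence = np.exp(-grid_dist2 / (2 * radius ** 2))
    weights[:] = weights + lr * influence[..., None] * (x - weights)

for sample in rng.uniform(size=(500, dim)):   # unsupervised: no labels anywhere
    train_step(sample)
print(weights[0, 0])
```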

Kohonen, Teuvo. "Self-organized formation of topologically correct feature maps." Biological Cybernetics 43.1 (1982): 59-69.

 

Reprinted from: https://www.leiphone.com/news/201710/0hKyVawQLqAuKIm6.html
