Learn Python for Machine Learning Algorithms

Tags: machine learning, convolutional neural network, learning parameters, transfer learning, neural network

4-Dropout

Deep neural networks with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, which makes it difficult to deal with overfitting by combining the predictions of many different large networks at test time. Dropout is a technique that addresses this problem.

The key idea is to randomly drop units (along with their connections) from the neural network during training, which prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, the effect of averaging the predictions of all these thinned networks can be approximated simply by using a single unthinned network with smaller weights. This significantly reduces overfitting and performs better than other regularization methods. Dropout has been shown to improve the performance of neural networks on supervised learning tasks in computer vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark datasets.
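As a rough illustration, here is a minimal NumPy sketch of inverted dropout (the layer sizes and drop probability below are illustrative, not taken from the text):

import numpy as np

def dropout_forward(activations, p_drop=0.5, training=True):
    # Inverted dropout: randomly zero units during training and rescale the
    # survivors, so the single unthinned network can be used as-is at test time.
    if not training or p_drop == 0.0:
        return activations  # test time: use the full network unchanged
    keep_prob = 1.0 - p_drop
    mask = (np.random.rand(*activations.shape) < keep_prob) / keep_prob
    return activations * mask

# Example: a mini-batch of 4 samples, each with 6 hidden activations
h = np.random.randn(4, 6)
print(dropout_forward(h, p_drop=0.5, training=True))

Each training forward pass effectively samples a different thinned sub-network; dividing by keep_prob keeps the expected activation unchanged, which is why no extra rescaling is needed at test time.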

5-Max pooling

Max pooling is a sample-based discretization process. The goal is to down-sample an input representation (an image, a hidden-layer output matrix, etc.), reducing its dimensionality and allowing assumptions to be made about the features contained in the sub-regions.

This approach helps reduce overfitting to some extent by providing an abstracted form of the representation. It also reduces the computational cost by reducing the number of parameters to learn, and it gives the internal representation a basic invariance to small translations. Max pooling is done by applying a max filter to sub-regions of the initial representation, which typically do not overlap.
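A minimal NumPy sketch of 2x2 max pooling with stride 2 (the window size and input are illustrative):

import numpy as np

def max_pool_2d(x, size=2, stride=2):
    # Apply a max filter over (usually non-overlapping) size x size
    # sub-regions of a 2-D input such as an image channel.
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2d(x))  # 4x4 input down-sampled to a 2x2 output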

6-Batch normalization

Neural networks, and deep networks in particular, require careful tuning of weight initialization and learning parameters. Batch normalization helps make this process a bit simpler.

Weight problem:

● Whether the weights are initialized randomly or chosen empirically, they start out far from the learned weights. Considering a mini-batch of data, there will be many outlier feature activations during the initial epochs.

● Deep neural networks are inherently fragile, i.e., small perturbations in the early layers can cause large changes in the later layers.

During backpropagation, these phenomena distract the gradients: the gradients have to compensate for the outliers before they can learn weights that produce the desired output. This requires extra epochs to converge.

Batch normalization brings these distracted gradients back toward normal values and makes them flow toward a common goal (by normalizing the activations) within each mini-batch.

Learning rate problem: The learning rate is generally kept small so that only a small fraction of the gradient corrects the weights, because the gradients from outlier activations should not affect weights that have already been learned. With batch normalization, these outlier activations become less likely, so higher learning rates can be used to speed up learning.
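As a sketch, the forward pass of batch normalization over a mini-batch can be written in a few lines of NumPy (the batch size and feature count below are illustrative; a real layer also tracks running statistics for use at test time):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then scale and shift
    # with the learnable parameters gamma and beta.
    mu = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)            # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 8) * 10 + 5     # 32 samples, 8 features, far from zero
out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0 and 1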

7-Long Short-Term Memory (LSTM):

The LSTM network has three aspects that distinguish it from conventional neurons in a recurrent neural network:

1. It can control when the input enters the neuron.

2. It can control when to remember what was calculated in the previous time step.

3. It can control when the output is passed to the next time step.

The advantage of the LSTM is that it makes all of these decisions based on the current input itself, as described below.

The input signal x(t) at the current time step determines all three of the points above. The input gate makes the decision for point 1, the forget gate for point 2, and the output gate for point 3. The input alone is able to make all three decisions. This is inspired by how our brains work and can handle sudden context switches.
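A minimal NumPy sketch of a single LSTM step may make the three gates concrete (the parameter layout and toy sizes are illustrative, not a definitive implementation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b stack the parameters for the input (i), forget (f) and
    # output (o) gates plus the candidate cell content (g).
    z = W @ x_t + U @ h_prev + b
    n = h_prev.shape[0]
    i = sigmoid(z[0:n])        # point 1: how much of the input to let in
    f = sigmoid(z[n:2*n])      # point 2: how much of the previous cell to remember
    o = sigmoid(z[2*n:3*n])    # point 3: how much of the cell to pass on as output
    g = np.tanh(z[3*n:4*n])    # candidate cell content
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Toy dimensions: input size 3, hidden size 4
d, n = 3, 4
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4 * n, d)), rng.normal(size=(4 * n, n)), np.zeros(4 * n)
h_t, c_t = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)

All three gate values are computed from the current input x_t (together with the previous hidden state), which is how the cell decides, at every time step, what to let in, what to remember and what to output.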

8-Skip-gram:

The goal of word-embedding models is to learn a dense, high-dimensional representation for each vocabulary item, such that similarity between embedding vectors reflects semantic or syntactic similarity between the corresponding words. Skip-gram is one model for learning word embeddings.

The main idea behind the skip-gram model (and many other word embedding models) is as follows: If two words have similar contexts, they are similar.

In other words, suppose you have a sentence such as "cat is a mammal." If you replace "cat" with "dog", the sentence is still meaningful. So in this example, "dog" and "cat" share the same context (i.e., "is a mammal").

Based on the above assumption, you can consider a context window (a window containing k consecutive terms). You then skip one of the words and try to train a neural network that takes all the terms except the skipped one and predicts the skipped term. If two words repeatedly share similar contexts in a large corpus, their embedding vectors will end up close to each other.
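As a sketch of the data such a model trains on, the hypothetical helper below builds (center word, context word) pairs from a context window; a neural network trained on these pairs pushes words that share contexts toward similar embedding vectors:

def skipgram_pairs(tokens, window=2):
    # Pair each word with every word inside `window` tokens on either side.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("cat is a mammal".split(), window=2))
# [('cat', 'is'), ('cat', 'a'), ('is', 'cat'), ('is', 'a'), ...]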

9-Continuous Bag of Words:

In natural language processing problems, we want to learn to represent each word in a document as a vector of numbers, so that words appearing in similar contexts have vectors that are close to each other. In the continuous bag-of-words model, the goal is to use the context surrounding a particular word to predict that word.

We do this by taking a large number of sentences from a large corpus; every time we see a word, we also take its surrounding context words. We then feed the context words into a neural network and predict the word at the center of that context.

When we have thousands of such context-word/center-word pairs, we have a dataset for the neural network. We train the network, and the output of the encoded hidden layer represents the embedding of a particular word. It turns out that when we train over a large number of sentences, words in similar contexts get similar vectors.
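A small, hypothetical helper shows the shape of those training examples (the sentence and window size are illustrative): the network's input is the context and its target is the center word.

def cbow_examples(tokens, window=2):
    # For each position, collect the surrounding words as the input context
    # and keep the word itself as the prediction target.
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            examples.append((context, center))
    return examples

print(cbow_examples("the cat is a mammal".split(), window=2))
# [(['cat', 'is'], 'the'), (['the', 'is', 'a'], 'cat'), ...]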

10-Transfer learning:

Consider how an image passes through a convolutional neural network. Suppose you have an image; you apply convolution and get combinations of pixels as output, which might be edges. Apply convolution again and the output is now combinations of edges, i.e., lines. Apply convolution once more and the output is combinations of lines, and so on. You can think of each layer as looking for a specific pattern. The last layer of the network tends to become very specialized: if you are working on ImageNet, the last layer of your network will be looking for children, dogs, airplanes or whatever. A couple of layers back, you might see the network looking for eyes, ears, mouths or wheels.

Each layer in a deep CNN progressively builds higher- and higher-level representations. The last few layers tend to specialize on whatever data you fed into the model. The early layers, on the other hand, are much more generic: they find simple patterns that occur across much larger classes of images.

Transfer learning means taking a CNN trained on one dataset, chopping off the last layer, and retraining that last layer on a different dataset. Intuitively, you are retraining the model to recognize different high-level features. As a result, training time drops significantly, which makes transfer learning a useful tool when you do not have enough data, or when training from scratch would require too many resources.
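A brief Keras sketch of this workflow, assuming the tensorflow package is available (the base network, input size and number of classes are illustrative): load a network pretrained on ImageNet, freeze its generic layers, and retrain only a new final layer on the new dataset.

import tensorflow as tf

# Pretrained convolutional base (ImageNet weights) with its classifier removed
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the generic early and middle layers

# New head, retrained on the target dataset (here: 5 hypothetical classes)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_dataset, epochs=5)  # train only the new last layer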
