Keras (3): The Embedding layer in detail


Embedding layer

keras.layers.embeddings.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None)

The Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

The embedding layer can only be used as the first layer of the model

Parameters

input_dim: Integer >= 0. Size of the vocabulary, i.e. the maximum integer index in the input data + 1.
output_dim: Integer > 0. Dimension of the dense embedding.
embeddings_initializer: Initializer for the embedding matrix; either the name of a predefined initialization method (string) or an initializer object used to initialize the weights. See initializers.
embeddings_regularizer: Regularizer applied to the embedding matrix; a Regularizer object.
embeddings_constraint: Constraint applied to the embedding matrix; a constraints object.
mask_zero: Boolean, whether the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers to process variable-length input. If set to True, all subsequent layers in the model must support masking, otherwise an exception will be raised. If this value is True, index 0 cannot be used in the vocabulary and input_dim should be set to |vocabulary| + 2.
input_length: Length of input sequences, when it is constant. This argument is required if you want to connect a Flatten layer and then a Dense layer after this layer; without it, the output shape of the Dense layer cannot be inferred.
Input shape
2D tensor with shape (samples, sequence_length)
Output shape
3D tensor with shape (samples, sequence_length, output_dim)
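
To make the signature, the shapes, and the input_length requirement concrete, here is a minimal sketch (the vocabulary size, embedding dimension, and sequence length are made up for illustration):

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

# Hypothetical sizes: vocabulary of 1000 words, 64-dimensional embeddings,
# sequences padded to length 10.
model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=64, input_length=10))
# input:  2D tensor (samples, 10) of integer indices
# output: 3D tensor (samples, 10, 64)
model.add(Flatten())          # input_length must be fixed so Flatten -> Dense works
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()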

The trickiest part is the first sentence:
The Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

So what exactly is going on here?
This is about word vectors. For details, see the article on the Word2vec Skip-gram model; below is only a brief description.

The figure above shows the process of representing the words of an article with word vectors.
(1) Extract all the words in the article and number them by frequency of occurrence (here only the top 50,000 are kept). For example, if the word 'network' appears most often, it gets id 0, and so on.

(2) Each id can then be represented as a 50,000-dimensional binary (one-hot) vector.

(3) Finally, we produce a matrix M whose number of rows is the number of words (50,000) and whose number of columns is the dimension of the word vector (usually 128 or 300). For example, the first row of the matrix is the word vector for id=0, i.e. for 'network'.
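
A quick numeric sketch of this idea (toy sizes, not the 50,000-word vocabulary above): multiplying a one-hot vector by M simply selects the corresponding row of M, which is exactly what an embedding lookup does.

import numpy as np

vocab_size, embed_dim = 5, 3
M = np.random.rand(vocab_size, embed_dim)    # the embedding matrix M

word_id = 0                                  # e.g. 'network' has id 0
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# one_hot @ M picks out row `word_id` of M
print(np.allclose(one_hot @ M, M[word_id]))  # True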

So how is this matrix M obtained? In the Skip-gram model, we initialize it randomly and then train this weight matrix with a neural network.

So what are our input data and labels? As shown in the figure below, the input data is the one-hot code of the blue word in the middle, and the labels are the one-hot codes of the words near it (here window_size=2, so 2 words are taken on each side).
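
Keras also ships a helper, keras.preprocessing.sequence.skipgrams, that builds (target, context) pairs for a given window_size. It uses negative sampling (binary labels) rather than one-hot context labels, so it is not exactly the setup in the figure, but it illustrates the same windowing idea. A small sketch with made-up word ids:

from keras.preprocessing.sequence import skipgrams

sequence = [2, 7, 1, 5, 3]          # hypothetical ids of the words in one sentence
vocab_size = 10

# window_size=2: for each target word, take up to 2 words on each side as context
pairs, labels = skipgrams(sequence, vocabulary_size=vocab_size,
                          window_size=2, negative_samples=1.0)
for (target, context), label in zip(pairs, labels):
    # label = 1 for a real (target, context) pair, 0 for a negative sample
    print(target, context, label)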

In the Word2vec demo above, the vocabulary size is 10,000 and the word vector dimension is 300, so the Embedding parameters are input_dim=10000, output_dim=300.

Back to the original question: the Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

Here is an example: if the vocabulary size is 1000 and the word vector dimension is 2, then after counting word frequencies Tom gets id=4 and Jerry gets id=20. After conversion we obtain a 1000x2 matrix M, and Tom corresponds to row 4 of that matrix; the data taken from that row is [0.25, 0.1].
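
The lookup can be seen directly on an Embedding layer's weight matrix. A sketch of the example above (the weights here are randomly initialized, so the actual numbers will not be [0.25, 0.1] unless the layer has been trained):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=2, input_length=1))
model.compile('rmsprop', 'mse')

M = model.layers[0].get_weights()[0]          # the 1000 x 2 matrix M
print(M.shape)                                # (1000, 2)
print(M[4], M[20])                            # rows for Tom (id=4) and Jerry (id=20)

# Feeding the ids through the layer returns exactly those rows
print(model.predict(np.array([[4], [20]])))   # shape (2, 1, 2)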

If the input data does not need the semantic features of the words, simply using the Embedding layer is enough to obtain a corresponding word-vector matrix. But if semantic features are needed, we can load pre-trained word-vector weights directly into the Embedding layer; see the example provided by Keras: Using pre-trained word vectors in the Keras model.
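
A sketch of that pre-trained setup: the embedding_matrix below is a stand-in (random values); in practice you would fill it row by row from GloVe or word2vec vectors according to your own word index, as the Keras example does.

import numpy as np
from keras.layers import Embedding

vocab_size, embed_dim = 1000, 100
embedding_matrix = np.random.rand(vocab_size, embed_dim)  # placeholder for pre-trained vectors

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embed_dim,
                            weights=[embedding_matrix],   # load the pre-trained weights
                            input_length=10,
                            trainable=False)              # freeze them during training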
