Keras (3): The Embedding layer in detail


Embedding layer

keras.layers.embeddings.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None)

The Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

The embedding layer can only be used as the first layer of the model

Parameters

input_dim: Integer >= 0. Size of the vocabulary, i.e. the maximum integer index in the input data + 1.
output_dim: Integer > 0. Dimension of the dense embedding.
embeddings_initializer: Initializer for the embedding matrix; either the name of a predefined initialization method (string) or an initializer object used to initialize the weights. See initializers.
embeddings_regularizer: Regularizer applied to the embedding matrix; a Regularizer object.
embeddings_constraint: Constraint applied to the embedding matrix; a constraints object.
mask_zero: Boolean, whether the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers to process variable-length input. If set to True, all subsequent layers in the model must support masking, otherwise an exception will be raised. If this value is True, index 0 cannot be used in the vocabulary and input_dim should be set to |vocabulary| + 2.
input_length: Length of input sequences, when it is constant. This argument is required if you want to connect a Flatten layer and then a Dense layer after this layer; without it, the output shape of the Dense layer cannot be inferred.
Input shape
2D tensor with shape (samples, sequence_length)
Output shape
3D tensor with shape (samples, sequence_length, output_dim)
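
To make the signature, the shapes, and the input_length requirement concrete, here is a minimal sketch (the vocabulary size, embedding dimension, and sequence length are made up for illustration):

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

# Hypothetical sizes: vocabulary of 1000 words, 64-dimensional embeddings,
# sequences padded to length 10.
model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=64, input_length=10))
# input:  2D tensor (samples, 10) of integer indices
# output: 3D tensor (samples, 10, 64)
model.add(Flatten())          # input_length must be fixed so Flatten -> Dense works
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.summary()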

The trickiest part is the first sentence:
The Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

So what exactly is going on here?
This is about word vectors. For details, see the article on the Word2vec Skip-gram model; below is only a brief description.

The figure above shows the process of representing the words of an article with word vectors.
(1) Extract all the words in the article and number them by frequency of occurrence (here only the top 50,000 are kept). For example, if the word 'network' appears most often, it gets id 0, and so on.

(2) Each id can then be represented as a 50,000-dimensional binary (one-hot) vector.

(3) Finally, we produce a matrix M whose number of rows is the number of words (50,000) and whose number of columns is the dimension of the word vector (usually 128 or 300). For example, the first row of the matrix is the word vector for id=0, i.e. for 'network'.
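
A quick numeric sketch of this idea (toy sizes, not the 50,000-word vocabulary above): multiplying a one-hot vector by M simply selects the corresponding row of M, which is exactly what an embedding lookup does.

import numpy as np

vocab_size, embed_dim = 5, 3
M = np.random.rand(vocab_size, embed_dim)    # the embedding matrix M

word_id = 0                                  # e.g. 'network' has id 0
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# one_hot @ M picks out row `word_id` of M
print(np.allclose(one_hot @ M, M[word_id]))  # True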

So how is this matrix M obtained? In the Skip-gram model, we initialize it randomly and then train this weight matrix with a neural network.

So what are our input data and labels? As shown in the figure below, the input data is the one-hot code of the blue word in the middle, and the labels are the one-hot codes of the words near it (here window_size=2, so 2 words are taken on each side).
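
Keras also ships a helper, keras.preprocessing.sequence.skipgrams, that builds (target, context) pairs for a given window_size. It uses negative sampling (binary labels) rather than one-hot context labels, so it is not exactly the setup in the figure, but it illustrates the same windowing idea. A small sketch with made-up word ids:

from keras.preprocessing.sequence import skipgrams

sequence = [2, 7, 1, 5, 3]          # hypothetical ids of the words in one sentence
vocab_size = 10

# window_size=2: for each target word, take up to 2 words on each side as context
pairs, labels = skipgrams(sequence, vocabulary_size=vocab_size,
                          window_size=2, negative_samples=1.0)
for (target, context), label in zip(pairs, labels):
    # label = 1 for a real (target, context) pair, 0 for a negative sample
    print(target, context, label)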

In the Word2vec demo above, the vocabulary size is 10,000 and the word vector dimension is 300, so the Embedding parameters are input_dim=10000, output_dim=300.

Back to the original question: the Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

Here is an example: if the vocabulary size is 1000 and the word vector dimension is 2, then after counting word frequencies Tom gets id=4 and Jerry gets id=20. After conversion we obtain a 1000x2 matrix M, and Tom corresponds to row 4 of that matrix; the data taken from that row is [0.25, 0.1].
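
The lookup can be seen directly on an Embedding layer's weight matrix. A sketch of the example above (the weights here are randomly initialized, so the actual numbers will not be [0.25, 0.1] unless the layer has been trained):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=2, input_length=1))
model.compile('rmsprop', 'mse')

M = model.layers[0].get_weights()[0]          # the 1000 x 2 matrix M
print(M.shape)                                # (1000, 2)
print(M[4], M[20])                            # rows for Tom (id=4) and Jerry (id=20)

# Feeding the ids through the layer returns exactly those rows
print(model.predict(np.array([[4], [20]])))   # shape (2, 1, 2)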

If the input data does not need the semantic features of the words, simply using the Embedding layer is enough to obtain a corresponding word-vector matrix. But if semantic features are needed, we can load pre-trained word-vector weights directly into the Embedding layer; see the example provided by Keras: Using pre-trained word vectors in the Keras model.
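
A sketch of that pre-trained setup: the embedding_matrix below is a stand-in (random values); in practice you would fill it row by row from GloVe or word2vec vectors according to your own word index, as the Keras example does.

import numpy as np
from keras.layers import Embedding

vocab_size, embed_dim = 1000, 100
embedding_matrix = np.random.rand(vocab_size, embed_dim)  # placeholder for pre-trained vectors

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embed_dim,
                            weights=[embedding_matrix],   # load the pre-trained weights
                            input_length=10,
                            trainable=False)              # freeze them during training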
