Distributional Vector vs. Distributed Vector
Similarities and differences
For natural language, the common point is that both are based on the distributional hypothesis and are trained on the same kind of corpus.

Distributional models (BoW, LSI, LDA)
Words that appear in the same text region (such as the same sentence) are related, and the more often two words co-occur in the corpus, the more related they are. Co-occurrence counts are used to build a word-context PMI/PPMI matrix (a high-dimensional sparse matrix), and SVD is then applied to obtain a low-dimensional dense vector (latent vector) for each word.
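A minimal sketch of that pipeline, assuming Python with NumPy; the toy corpus, the window size of 2, and the 4-dimensional SVD truncation are illustrative choices, not from the original text.

```python
# Distributional pipeline: co-occurrence counts -> PPMI -> SVD.
import numpy as np

corpus = [
    "there are many stars in the sky tonight".split(),
    "tonight the sky has the moon".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/-2 word window (assumed window size).
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# PPMI: positive pointwise mutual information of word/context pairs.
total = counts.sum()
pw = counts.sum(axis=1, keepdims=True) / total   # P(word)
pc = counts.sum(axis=0, keepdims=True) / total   # P(context)
pmi = np.log((counts / total) / (pw * pc) + 1e-12)
ppmi = np.maximum(pmi, 0)

# Truncated SVD gives each word a low-dimensional dense vector.
U, S, Vt = np.linalg.svd(ppmi)
dim = 4
vectors = U[:, :dim] * S[:dim]
print(vectors[idx["sky"]])
```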
Distributed models (NPLM, LBL, word2vec, GloVe)
Words that appear in similar contexts are related, and the more similar contexts two words share in the corpus, the more related they are; the words themselves are not required to co-occur. Inspired by deep learning, these models use prediction instead of co-occurrence counts.
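A minimal prediction-based sketch, using gensim's word2vec as a stand-in for the models listed above and the two sentences from the example below; gensim and all parameter values are assumptions, not from the original text.

```python
# Distributed approach: learn vectors by predicting context words
# (skip-gram) instead of counting co-occurrences.
from gensim.models import Word2Vec

corpus = [
    "a dog is in the class".split(),
    "a cat is in the class".split(),
]

# sg=1 selects skip-gram; hyperparameters are illustrative.
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

# "dog" and "cat" never co-occur, but their shared contexts
# push their vectors together.
print(model.wv.similarity("dog", "cat"))
```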
Example
A dog is in the class.
A cat is in the class.
"dog" being related to "class" is distributional thinking.
"dog" being related to "cat" is distributed thinking.
Distributional thinking
Words that appear in the same context are related.
This is a horizontal (syntagmatic) relation: in the sentence "There are many stars in the sky tonight", "sky" and "stars" are horizontally related.
Distributed thinking
Words that appear in similar contexts are related.
A similar context can be the same sentence, or it can be a different sentence.
This is a vertical (paradigmatic) relation: given the sentences "There are many stars in the sky tonight" and "Tonight the sky has the moon", "stars" and "moon" are vertically related.

Method differences
Distributional models use implicit matrix factorization; distributed models use neural word embeddings.
Distributional models build the original matrix from co-occurrence counts; distributed models predict context words with a neural network.
For relational networks (graphs)

Distributional
Points on the same path are related (depending on the path length; the simplest case considers only 1-hop neighbors). If the network is built from word co-occurrence, then points connected by an edge are related. Alternatively, take the graph's adjacency matrix as the original matrix and apply matrix factorization, as sketched below.
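A minimal sketch of that matrix view, assuming NumPy and a hypothetical 5-node graph; the edge list and the 2-dimensional truncation are illustrative.

```python
# Distributional approach on a graph: adjacency matrix -> SVD.
import numpy as np

# Adjacency matrix of a small undirected graph:
# edges 0-1, 0-2, 1-2, 2-3, 3-4.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Factorize the adjacency matrix; each node gets a low-dimensional vector.
U, S, Vt = np.linalg.svd(A)
dim = 2
node_vectors = U[:, :dim] * S[:dim]
print(node_vectors)
```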
Distributed
Considers not only the relatedness of adjacent points but also the relatedness of unconnected points that share common neighbors. That is, if two points are not directly connected but have the same neighbors (and the more neighbors they share, the stronger the similarity), they are also similar. DeepWalk can be used to treat sequences of graph nodes, generated by random walks, the way word2vec treats sentences (word sequences), as in the sketch below.
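A minimal DeepWalk-style sketch, assuming gensim for the skip-gram step and the same hypothetical 5-node edge list as above; the walk length, walk count, and all hyperparameters are illustrative.

```python
# DeepWalk sketch: random walks over the graph play the role of
# sentences, and skip-gram word2vec learns node vectors from them.
import random
from gensim.models import Word2Vec

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

def random_walk(start, length):
    """Uniform random walk; returns a node sequence as strings."""
    walk = [start]
    while len(walk) < length:
        walk.append(random.choice(neighbors[walk[-1]]))
    return [str(n) for n in walk]

# Generate many walks per node, then train skip-gram on them.
walks = [random_walk(node, 10) for node in neighbors for _ in range(20)]
model = Word2Vec(walks, vector_size=16, window=3, min_count=1, sg=1, epochs=20)

# Nodes 1 and 3 are not directly connected but share neighbor 2,
# so their vectors still end up similar.
print(model.wv.similarity("1", "3"))
```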