Multilingual Semantic Models
In this post I'll discuss a model for learning word embeddings such that they end up in the same space across different languages. This means we can find the similarity between some English and German words, or even compare the meaning of sentences in different languages. It is a summary and analysis of the paper by Karl Moritz Hermann and Phil Blunsom, titled "Multilingual Models for Compositional Distributed Semantics", published at ACL 2014.

The Task
The goal is to extend the distributional hypothesis to multilingual data and joint-space embeddings. This would give us the ability to compare words and sentences in different languages, and also to make use of labelled training data from languages other than the target language. For example, below is an illustration of English words and their Estonian translations in the same semantic space.
This actually turns out to be a very difficult task, because the distributional hypothesis stops working across different languages. While "fish" is an important feature of "cat", because the two occur together often, "kass" (Estonian for "cat") never occurs with "fish", because the words belong to different languages and are therefore used in separate sets of documents.
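To make this failure concrete, here is a minimal sketch using a hypothetical toy corpus (the documents and counting scheme are illustrative assumptions, not from the paper). Within one language, "cat" and "fish" share documents; across languages, "kass" and "fish" never do, so any co-occurrence-based similarity is zero.

```python
from collections import Counter

# Hypothetical toy corpora: a few English and Estonian "documents"
english_docs = [
    "the cat ate the fish",
    "a cat chased a fish",
]
estonian_docs = [
    "kass soi kala",       # "the cat ate the fish"
    "kass ajas kala taga", # "the cat chased the fish"
]

def cooccurrence_counts(docs):
    """Count how often each pair of distinct words appears in the same document."""
    counts = Counter()
    for doc in docs:
        words = set(doc.split())
        for w1 in words:
            for w2 in words:
                if w1 != w2:
                    counts[(w1, w2)] += 1
    return counts

counts = cooccurrence_counts(english_docs + estonian_docs)

# Within one language, related words co-occur:
print(counts[("cat", "fish")])   # 2
# Across languages, the count is zero -- "kass" never appears with "fish":
print(counts[("kass", "fish")])  # 0
```

Since the monolingual corpora never mix, no amount of counting will ever connect "kass" to "fish".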
In order to learn these representations in the same space, the authors construct a neural network that learns from parallel sentences (pairs of the same sentence in different languages). The model is then evaluated on the task of topic classification, training on one language and testing on another.
A bit of a spoiler, but here is a visualisation of some words from the final model, mapped into 2 dimensions.
The words from English, German and French are successfully mapped into clusters based on meaning. The colours indicate gender (blue = male, red = female, green = neutral).

The Multilingual Model
The main idea is as follows: we have sentence a in one language, and a function f(a) which maps that sentence into a vector representation (we'll come back to that function). We then have sentence b, which is the same sentence in a different language, and a function g(b) for mapping it into a vector representation. Our goal is for f(a) and g(b) to be identical, because both of these sentences have the same meaning. So during training, we show the model a series of parallel sentences a and b, and each time we adjust the functions f(a) and g(b) so that they produce more similar vectors.
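A minimal sketch of this training loop, assuming additive composition (a sentence vector is the sum of its word vectors, in the spirit of the paper's ADD model) and a noise-contrastive hinge loss that pulls a parallel pair together while pushing a non-parallel "noise" sentence at least a margin away. The toy vocabularies, learning rate and margin below are my own illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # embedding dimensionality (tiny, for illustration only)

# Hypothetical toy vocabularies and embedding tables for the two languages
vocab_en = {"the": 0, "cat": 1, "sat": 2}
vocab_et = {"kass": 0, "istus": 1, "auto": 2}
E_en = rng.normal(scale=0.1, size=(len(vocab_en), dim))
E_et = rng.normal(scale=0.1, size=(len(vocab_et), dim))

def compose(sentence, E, vocab):
    """Additive composition: a sentence vector is the sum of its word vectors."""
    return sum(E[vocab[w]] for w in sentence.split())

def distance(u, v):
    """Squared Euclidean distance between two sentence vectors."""
    return float(np.sum((u - v) ** 2))

# A parallel pair (a, b) and a non-parallel "noise" sentence n
a, b, n = "the cat sat", "kass istus", "auto"
margin, lr = 1.0, 0.05

for step in range(200):
    fa = compose(a, E_en, vocab_en)
    gb = compose(b, E_et, vocab_et)
    gn = compose(n, E_et, vocab_et)
    # Hinge loss: the parallel pair should be closer than the noise pair
    # by at least the margin
    loss = margin + distance(fa, gb) - distance(fa, gn)
    if loss <= 0.0:
        break
    # Gradients of the loss w.r.t. the composed sentence vectors
    d_fa = 2 * (gn - gb)
    d_gb = -2 * (fa - gb)
    d_gn = 2 * (fa - gn)
    # With additive composition, every word in a sentence receives
    # that sentence's gradient
    for w in a.split():
        E_en[vocab_en[w]] -= lr * d_fa
    for w in b.split():
        E_et[vocab_et[w]] -= lr * d_gb
    for w in n.split():
        E_et[vocab_et[w]] -= lr * d_gn

# After training, the parallel pair ends up closer than the noise pair:
fa = compose(a, E_en, vocab_en)
gb = compose(b, E_et, vocab_et)
gn = compose(n, E_et, vocab_et)
print(distance(fa, gb) < distance(fa, gn))  # True
```

Note that the loss never requires the vectors of a and b to be exactly identical, only that matched pairs are closer than mismatched ones by a margin; the contrastive noise term prevents the degenerate solution of mapping every sentence to the same point.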