The algorithm was open-sourced by Facebook in 2016; its typical application scenario is supervised text classification.
Model
The optimization objective of the model is as follows:
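From the fastText paper (Joulin et al., 2016), this is the negative log-likelihood over the labels, minimized over the weight matrices $A$ and $B$:

$$-\frac{1}{N}\sum_{n=1}^{N} y_n \log\big(f(BAx_n)\big)$$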
Here $\langle x_n, y_n \rangle$ is a training sample: $y_n$ is the label and $x_n$ is the normalized bag of features of the $n$-th document. The matrix parameter $A$ is the word look-up table, i.e., each row of $A$ is the embedding vector of one word. The matrix operation $Ax_n$ therefore looks up the embeddings of the document's words and sums (or averages) them into a hidden vector. The matrix parameter $B$ belongs to the function $f$: since this is a multi-class problem, $f$ is the softmax, so $f(BAx_n)$ is a linear multi-class classifier producing a distribution over labels. The optimization goal is to make the likelihood of the correct labels under this distribution as large as possible.
Represented as a graph model, the optimization target is: input word (and n-gram) features $x_n$ → embedding look-up in $A$ and averaging into a hidden vector → linear layer $B$ → softmax over the labels.
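As an illustration, here is a minimal NumPy sketch of this forward pass. The shapes, vocabulary size, and variable names are assumptions made for the example, not fastText's actual implementation:

```python
import numpy as np

vocab_size, embed_dim, num_classes = 10_000, 100, 5

# Hypothetical parameters: A is the word look-up table, B the classifier weights.
A = np.random.randn(vocab_size, embed_dim) * 0.01   # embedding matrix
B = np.random.randn(num_classes, embed_dim) * 0.01  # linear classifier

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(word_ids):
    """Average the word embeddings (Ax_n), then apply the linear layer and softmax."""
    hidden = A[word_ids].mean(axis=0)   # Ax_n: look up embeddings and average
    return softmax(B @ hidden)          # f(BAx_n): distribution over labels

probs = forward([12, 847, 3021])        # a toy document of three word ids
print(probs, probs.sum())               # class probabilities summing to 1
```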
Comparison with Word2vec
This model has a lot in common with Word2vec, and also differs from it in important ways.
Similarities:
- The network structure is similar: both learn an embedding matrix whose rows serve as the hidden vector representation of words.
- Both use similar optimization techniques, such as hierarchical softmax to speed up training and prediction scoring (see the sketch after this list).
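For intuition, here is a minimal sketch of the hierarchical-softmax idea: a label's probability is a product of binary (sigmoid) decisions along its path in a tree, so scoring one label costs $O(\log K)$ instead of $O(K)$ for $K$ classes. The tree layout, paths, and names below are hypothetical, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_prob(hidden, path, node_vecs):
    """P(label) as the product of branch probabilities along the label's tree path.

    path: list of (node_id, go_left) pairs from the root down to the label's leaf.
    node_vecs: one parameter vector per internal tree node (hypothetical layout).
    """
    prob = 1.0
    for node_id, go_left in path:
        p_left = sigmoid(node_vecs[node_id] @ hidden)
        prob *= p_left if go_left else 1.0 - p_left
    return prob

# Toy example: a tree with 2 internal nodes, hidden vector of dimension 3.
node_vecs = np.random.randn(2, 3)
hidden = np.random.randn(3)
print(label_prob(hidden, [(0, True), (1, False)], node_vecs))
```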
Differences:
- Word2vec is an unsupervised algorithm, while fastText is supervised. Word2vec's prediction target is a nearby word (as in skip-gram), while fastText's target is the manually labeled class.
- Word2vec relies on word order within a context window, while fastText treats text as an unordered bag of words, adding n-gram features to recover some local word order (see the hashing sketch after this list).
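fastText keeps the n-gram feature space bounded with the hashing trick: each n-gram is hashed into one of a fixed number of buckets, which index extra rows of the embedding table. A minimal sketch, with the bucket count and hash function chosen here only for illustration (fastText uses its own internal hash):

```python
def ngram_features(tokens, n=2, num_buckets=2_000_000):
    """Map each word n-gram to a bucket id via the hashing trick (illustrative)."""
    features = []
    for i in range(len(tokens) - n + 1):
        ngram = " ".join(tokens[i:i + n])
        features.append(hash(ngram) % num_buckets)  # bucket id indexes an embedding row
    return features

print(ngram_features("the movie was surprisingly good".split()))
```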
Vs. Deep Neural Networks
fastText has only a single hidden layer, so it counts as so-called shallow learning, yet its results are strong and it is fast to both train and run at prediction time, which matters a great deal in industry.
- Accuracy is often on par with, and sometimes higher than, ordinary deep neural network models.
- Training and evaluation are orders of magnitude faster: more than 1 billion words can be trained on a standard multi-core CPU in under 10 minutes, and half a million sentences can be classified among roughly 312,000 classes in under a minute.
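To try this in practice, the open-source library ships Python bindings. A minimal usage sketch: the file names and hyperparameter values below are placeholders, and the training file must contain one example per line in the `__label__<class> text` format:

```python
import fasttext

# train.txt / test.txt are hypothetical files, one example per line, e.g.:
# __label__positive this movie was great
model = fasttext.train_supervised(
    input="train.txt",
    epoch=5,        # hyperparameters chosen arbitrarily for the example
    lr=0.5,
    wordNgrams=2,   # use a bag of bigrams, as discussed above
)

print(model.predict("this movie was great"))   # (labels, probabilities)
print(model.test("test.txt"))                  # (N, precision@1, recall@1)
```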
Literature
- Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2016). Bag of Tricks for Efficient Text Classification. Facebook AI Research.
- Open source code: https://github.com/facebookresearch/fastText
Author: Easonzhao
Link: http://www.jianshu.com/p/b7ede4e842f1
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.