Latent Semantic Analysis (LSA), also called Latent Semantic Indexing (LSI), analyzes a collection of documents to discover the latent meanings and concepts they contain. "Latent" here means establishing the relationship between terms and the underlying meaning of documents: LSA maps both words and documents into a 'concept' space and compares them within that space. (Note: it is a dimensionality reduction technique.)
Latent Semantic Analysis is a new branch of semantics. Traditional semantics usually studies the meaning of words and the relationships between words, such as near-synonymy, synonymy, and antonymy. Latent semantic analysis instead explores a relationship hidden behind the words, one that is not based on dictionary definitions but takes the context in which a word is used as its most basic reference. This idea comes from psychologists, who believe that the hundreds of languages in the world should share a common, simple mechanism that allows anyone to master a language as long as they grow up in that language environment. Guided by this idea, people found a simple mathematical model: its input is a corpus of documents written in any language, and its output is a mathematical representation (a vector) of each word in that language. Comparisons between words, relationships between words, and even the meaning of any passage of text are then derived from operations on these vectors.
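To make the idea of comparing words through vector operations concrete, here is a minimal sketch. It assumes a hypothetical three-document toy corpus and uses raw document counts as the word vectors, with cosine similarity as the comparison; none of these choices come from the original text, and a real LSA pipeline would further reduce these vectors rather than compare them directly.

```python
import math
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration); one string per document.
docs = [
    "ship boat ocean",
    "boat ocean voyage",
    "tree wood forest",
]

# Per-document word counts; together they form a term-document matrix.
counts = [Counter(d.split()) for d in docs]

def word_vector(word):
    """Return the word's row of the term-document matrix (one count per document)."""
    return [c[word] for c in counts]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Words that occur in the same documents end up with similar vectors.
sim_boat_ocean = cosine(word_vector("boat"), word_vector("ocean"))
sim_boat_tree = cosine(word_vector("boat"), word_vector("tree"))
print(sim_boat_ocean, sim_boat_tree)
```

In this toy corpus "boat" and "ocean" appear in exactly the same documents, so their similarity is close to 1, while "boat" and "tree" never co-occur and score 0.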
The concept of latent semantics is also applied to information retrieval, so latent semantic analysis is sometimes also called Latent Semantic Indexing (LSI).
By applying singular value decomposition (SVD) to the term-document matrix, the data can be reduced in dimensionality, which reveals how strongly terms, documents, and topics are related to one another.
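The SVD step can be sketched with NumPy as follows. The matrix `A`, the term list, and the choice `k = 2` are hypothetical values picked for illustration, and scaling the coordinates by the singular values is one common convention rather than the only one.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are documents.
terms = ["ship", "boat", "ocean", "tree", "wood"]
A = np.array([
    [1, 0, 1, 0],   # ship
    [0, 1, 1, 0],   # boat
    [1, 1, 0, 0],   # ocean
    [0, 0, 0, 1],   # tree
    [0, 0, 0, 1],   # wood
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (k is an assumed, illustrative choice).
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Coordinates of terms and documents in the k-dimensional concept space.
term_coords = U_k * s_k        # shape (number of terms, k)
doc_coords = Vt_k.T * s_k      # shape (number of documents, k)

# Rank-k approximation of the original matrix.
A_k = U_k @ np.diag(s_k) @ Vt_k

print(np.round(term_coords, 3))
print(np.round(doc_coords, 3))
```

Each row of `term_coords` or `doc_coords` is a point in the reduced concept space, so terms and documents can be compared there, for example with cosine similarity, instead of in the original high-dimensional count space.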
References:
http://blog.csdn.net/bob007/article/details/30496559
http://www.csdn.net/article/2015-02-05/2823865