Inverted index. I first came across the inverted index in Elasticsearch, which uses it to build its indexes. In fact, ES is built on top of Lucene, and the inverted index is Lucene's core algorithm.
A common description online is that the inverted index is the best way to implement a mapping from words to the documents that contain them.
Why is it called an "inverted index"? Honestly, I don't think the Chinese name for it, 倒排索引, is a good translation. (More generally, many programming terms are poorly translated into Chinese, and this is a major obstacle for programmers trying to understand what they are learning. But since everyone uses these names, you have to use them too. Often, once you truly understand a concept, you realize how bad its Chinese name is. So my advice is: whenever you see a term, look up the English original right away, and read the English documentation.) In this case, the character 排 ("row" or "arrange") is very misleading; I think 反向索引 ("reverse index") would be a better translation.
The reason is that an inverted index means "use content to look up locations" rather than the usual "use a location to look up content".
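To make the "location to content" versus "content to location" distinction concrete, here is a minimal sketch in Python. The three documents are made-up example data, and splitting on whitespace is a simplifying assumption (real engines run an analysis step first). The forward index maps a document ID to its words; the inverted index flips that into word to document IDs.

```python
from collections import defaultdict

# Toy corpus: document ID -> content (made-up example data).
docs = {
    1: "the cat sat",
    2: "the dog sat",
    3: "a cat ran",
}

# Forward index: location (doc ID) -> content (the words in that doc).
forward = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: content (a word) -> locations (IDs of docs containing it).
inverted = defaultdict(set)
for doc_id, words in forward.items():
    for word in words:
        inverted[word].add(doc_id)

print(sorted(inverted["cat"]))  # [1, 3]
print(sorted(inverted["sat"]))  # [1, 2]
```

Answering "which documents contain `cat`?" is a single dictionary lookup in the inverted index, whereas the forward index would require scanning every document.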
Next, here is my personal understanding:
In Lucene, a document is first analyzed. For English text, this means removing stop words, normalizing case, and stripping inflections (reducing every word to its base form). ES also emphasizes this analysis step, and it lets users specify the analyzer to use (a particular language uses a particular analyzer).
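The analysis steps above can be sketched as a tiny pipeline: tokenize, lowercase, drop stop words, stem. This is only an illustration under clear assumptions: the stop-word list is a hypothetical, deliberately tiny one, and `naive_stem` is crude suffix stripping standing in for a real stemmer such as Porter. It is not how Lucene or ES implement their analyzers, which are far more sophisticated.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real analyzers use larger lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "in", "to"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping, standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    # 1. lowercase, 2. tokenize on runs of letters, 3. drop stop words, 4. stem.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("The Cats are jumping in the gardens"))  # ['cat', 'jump', 'garden']
```

The output terms, not the raw words, are what get indexed; that is why a query for "cat" can match a document containing "Cats", provided the query is analyzed the same way.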
Next comes the process of building the index:
The detailed process is described in this blog post, which is very easy to understand:
http://www.cnblogs.com/fly1988happy/archive/2012/04/01/2429000.html
The basic idea is that, for each word (i.e., each term produced by the analysis step above), you record how many documents it appears in, how often it occurs, and its positions within each document.
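A minimal sketch of that counting step, assuming the documents have already been analyzed into term lists (the two documents here are made-up data). For each term it records, per document, the list of positions; the in-document frequency is just the length of that list, and the number of documents containing the term is the number of entries.

```python
from collections import defaultdict


def build_index(docs: dict[int, list[str]]) -> dict[str, dict[int, list[int]]]:
    """docs maps doc ID -> already-analyzed terms.

    Returns term -> {doc_id: [positions]}.
    The term's frequency in a doc is len(positions).
    """
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, terms in docs.items():
        for pos, term in enumerate(terms):
            index[term].setdefault(doc_id, []).append(pos)
    return index


docs = {
    1: ["cat", "sat", "cat"],
    2: ["dog", "sat"],
}
index = build_index(docs)

for term in sorted(index):
    postings = index[term]
    doc_freq = len(postings)  # number of documents containing the term
    for doc_id, positions in sorted(postings.items()):
        print(f"{term}: docs={doc_freq} doc={doc_id} freq={len(positions)} pos={positions}")
```

For example, `cat` appears in one document (doc 1), twice, at positions 0 and 2; `sat` appears in two documents, once each at position 1.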
This produces a term dictionary file (term dictionary), a frequency file (frequencies), and a position file (positions).
The dictionary file also stores extra information: for each term, pointers into the frequency file and the position file, plus field information (the field the term belongs to).
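The three-file layout can be illustrated with a simplified in-memory model. This is a sketch under stated assumptions, not Lucene's actual on-disk format (which differs across versions and uses compressed encodings): the "files" are plain Python lists, the dictionary is keyed by a hypothetical `(field, term)` pair so the field information travels with each entry, and each `TermInfo` holds the document count plus offsets (pointers) into the frequency and position lists.

```python
from dataclasses import dataclass


@dataclass
class TermInfo:
    doc_freq: int      # number of documents containing the term
    freq_pointer: int  # offset of the term's first entry in freq_file
    pos_pointer: int   # offset of the term's first position in pos_file


def write_index(index):
    """index: (field, term) -> {doc_id: [positions]}.

    Returns (dictionary, freq_file, pos_file); the 'files' are plain lists.
    The field is recorded as part of each dictionary key.
    """
    dictionary = {}
    freq_file = []  # entries: (doc_id, term_freq_in_doc)
    pos_file = []   # flat positions, grouped per (term, doc)
    for key in sorted(index):
        postings = index[key]
        dictionary[key] = TermInfo(len(postings), len(freq_file), len(pos_file))
        for doc_id in sorted(postings):
            positions = postings[doc_id]
            freq_file.append((doc_id, len(positions)))
            pos_file.extend(positions)
    return dictionary, freq_file, pos_file


def read_postings(key, dictionary, freq_file, pos_file):
    """Follow the dictionary's pointers to rebuild {doc_id: [positions]}."""
    info = dictionary[key]
    result = {}
    pos_ptr = info.pos_pointer
    for doc_id, tf in freq_file[info.freq_pointer : info.freq_pointer + info.doc_freq]:
        result[doc_id] = pos_file[pos_ptr : pos_ptr + tf]
        pos_ptr += tf
    return result


# Made-up example: the same term "cat" in two different fields.
index = {
    ("title", "cat"): {1: [0], 3: [2]},
    ("body", "cat"): {2: [1, 4]},
    ("body", "dog"): {1: [0]},
}
dictionary, freq_file, pos_file = write_index(index)
print(read_postings(("title", "cat"), dictionary, freq_file, pos_file))  # {1: [0], 3: [2]}
```

Because the dictionary only stores small fixed-size entries with pointers, it can be searched quickly, and the (much larger) frequency and position data is read only for the terms a query actually needs.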
One algorithm per week (1)---inverted index