Original link http://www.cnblogs.com/surgewong/p/3351863.html
An inverted index, as the name implies, is an index turned around.
First, consider what an index is: it is like a book's table of contents, which lets you quickly find the chapter you want.
An inverted index works in the opposite direction: given some content from a chapter, you can find which table-of-contents entry it belongs to.
This analogy may not be entirely clear, so here is a simple example.
Suppose we have three sentences:
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"
Here, the index is a mapping between positions and words.
A regular (forward) index finds a word from its position. For example, the first word of T[0] is "it", which can be written as (0,0): "it"; likewise (2,1): "is".
An inverted index goes the other way and finds positions from a word. For example, the word "it" appears at (0,0), (0,3), (1,2), and (2,0),
which can be written as "it": {(0,0) (0,3) (1,2) (2,0)}.
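The two directions can be sketched in a few lines of Python. This is only an illustration: it assumes the three sentences are lowercase and split on whitespace, and the names `forward` and `inverted` are made up for this example.

```python
from collections import defaultdict

# Assumed input: the three example sentences, lowercased.
sentences = ["it is what it is", "what is it", "it is a banana"]

# Forward index: (sentence_id, word_position) -> word
forward = {
    (sent_id, pos): word
    for sent_id, sentence in enumerate(sentences)
    for pos, word in enumerate(sentence.split())
}

# Inverted index: word -> list of (sentence_id, word_position), in sorted order
inverted = defaultdict(list)
for location, word in sorted(forward.items()):
    inverted[word].append(location)

print(forward[(0, 0)])  # it
print(forward[(2, 1)])  # is
print(inverted["it"])   # [(0, 0), (0, 3), (1, 2), (2, 0)]
```

Looking up a position in `forward` answers "which word is here?", while looking up a word in `inverted` answers "where does this word occur?".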
Building an inverted index over the three sentences above gives:
"a": {(2,2)}
"banana": {(2,3)}
"is": {(0,1) (0,4) (1,1) (2,1)}
"it": {(0,0) (0,3) (1,2) (2,0)}
"what": {(0,2) (1,0)}
With the inverted index built, retrieving sentences becomes easy.
For example, to find the sentences that contain all three words "what", "is", and "it", ignore the second number in each entry (the position of the word within its sentence) and keep only the sentence numbers.
This gives {0 1} ∩ {0 1 2} ∩ {0 1 2} = {0 1}, so T[0] and T[1] satisfy the query.
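This intersection can be sketched in Python as follows. The function name `search_all` is hypothetical, and the index is written out by hand for the three example sentences so that the snippet stands on its own.

```python
def search_all(index, words):
    """Return the ids of sentences that contain every word in `words`."""
    # Drop the word position and keep only the sentence id of each entry.
    id_sets = [{sent_id for sent_id, _ in index.get(w, [])} for w in words]
    return set.intersection(*id_sets) if id_sets else set()

# Inverted index for the example: word -> [(sentence_id, word_position), ...]
index = {
    "a": [(2, 2)],
    "banana": [(2, 3)],
    "is": [(0, 1), (0, 4), (1, 1), (2, 1)],
    "it": [(0, 0), (0, 3), (1, 2), (2, 0)],
    "what": [(0, 2), (1, 0)],
}

print(sorted(search_all(index, ["what", "is", "it"])))  # [0, 1]
```

A word that is missing from the index contributes an empty set, so the whole query correctly returns no sentences.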
To retrieve the exact phrase "what is it", the positions of the words must also be taken into account: the words have to appear consecutively and in order.
Only T[1] satisfies this, at positions {(1,0) (1,1) (1,2)}.
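A phrase query can be sketched by checking that each following word sits exactly one position after the previous one. This is a simple illustration, not an optimized algorithm; `search_phrase` is a hypothetical name and the index is again written out by hand.

```python
def search_phrase(index, phrase):
    """Return (sentence_id, start_position) pairs where the phrase occurs."""
    words = phrase.lower().split()
    if not words:
        return []
    # One set of (sentence_id, position) pairs per word of the phrase.
    postings = [set(index.get(w, [])) for w in words]
    hits = []
    for sent_id, start in sorted(postings[0]):
        # The k-th word must appear k positions after the first word.
        if all((sent_id, start + k) in postings[k] for k in range(1, len(words))):
            hits.append((sent_id, start))
    return hits

index = {
    "a": [(2, 2)],
    "banana": [(2, 3)],
    "is": [(0, 1), (0, 4), (1, 1), (2, 1)],
    "it": [(0, 0), (0, 3), (1, 2), (2, 0)],
    "what": [(0, 2), (1, 0)],
}

print(search_phrase(index, "what is it"))  # [(1, 0)]
```

T[0] also contains all three words, but there "what" at position 2 is followed by "it", not "is", so it is rejected.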
In short, once the inverted index is built, retrieving words or phrases reduces to set operations.
There is no need to scan every sentence word by word, which makes retrieval far more efficient; this is why inverted indexes are so important in search.
One remaining drawback is that building an inverted index is time-consuming. Fortunately, this can be done offline.
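One way to picture the offline step is to build the index once, save it to disk, and only load it at query time. The sketch below uses JSON and a temporary directory purely for illustration; the file name `inverted_index.json` and the storage format are assumptions, not something the original post specifies.

```python
import json
import os
import tempfile

# A small prebuilt index: word -> [(sentence_id, word_position), ...]
index = {"it": [(0, 0), (0, 3), (1, 2), (2, 0)], "what": [(0, 2), (1, 0)]}

# Offline step: write the index to disk once (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "inverted_index.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(index, f)

# Query time: load the prebuilt index. JSON stores tuples as lists,
# so each entry is converted back to a tuple.
with open(path, encoding="utf-8") as f:
    loaded = {
        word: [tuple(entry) for entry in entries]
        for word, entries in json.load(f).items()
    }

print(loaded == index)  # True
```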
For more information, see Wikipedia, Baidu Baike, and related papers.