Indexing principle
Full-text search technology has a long history. The vast majority of implementations are based on an inverted index, although other approaches, such as file fingerprints, have also been used. An inverted index, as the name implies, reverses the usual view in which an article is described by the words it contains: it starts from the word and records which documents that word appears in. It consists of two parts: a dictionary and inverted lists.
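The two parts described above can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene stores its index: the function name build_inverted_index, the whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (assumption): lowercase, split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # The keys play the role of the dictionary; each value is an inverted list.
    return {term: sorted(ids) for term, ids in postings.items()}

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "lucene builds an inverted index",
    2: "an inverted index maps words to documents",
    3: "lucene stores the dictionary as an fst",
}

index = build_inverted_index(docs)
print(index["inverted"])  # [1, 2]
print(index["lucene"])    # [1, 3]
```

Looking up a word is then a dictionary lookup followed by reading its inverted list, which is why the choice of dictionary structure matters so much.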
The dictionary structure is particularly important. There are many kinds of dictionary structure, each with its own advantages and disadvantages. The simplest is a sorted array, searched with binary search; a hash table is faster for lookups; for lookups on disk there are the B-tree and the B+ tree. An inverted index that can support terabytes of data, however, needs a dictionary that balances time and space. The pros and cons of some common dictionary structures are listed below:
FST (finite state transducer): the index structure that Lucene uses now
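As a rough illustration of the simplest option mentioned above, the sorted array with binary search, here is a minimal Python sketch. It is not an FST and not Lucene's implementation; the class name SortedArrayDictionary, its methods, and the toy data are hypothetical names chosen for the example.

```python
from bisect import bisect_left

class SortedArrayDictionary:
    """Term dictionary kept as a sorted array and searched by binary search."""

    def __init__(self, postings):
        # Terms in sorted order, with the inverted lists in a parallel array.
        self.terms = sorted(postings)
        self.lists = [postings[t] for t in self.terms]

    def lookup(self, term):
        """Return the inverted list for term, or [] if the term is absent."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.lists[i]
        return []

    def prefix(self, prefix):
        """Return all terms starting with prefix, in sorted order."""
        i = bisect_left(self.terms, prefix)
        matches = []
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            matches.append(self.terms[i])
            i += 1
        return matches

# Hypothetical toy data: term -> list of document IDs.
toy = {"index": [1, 2], "inverted": [1, 2], "lucene": [1, 3], "fst": [3]}
d = SortedArrayDictionary(toy)
print(d.lookup("lucene"))  # [1, 3]
print(d.lookup("solr"))    # []
print(d.prefix("in"))      # ['index', 'inverted']
```

A hash table would answer the exact lookups faster, but it keeps no term order, so a prefix scan like the one above would require visiting every key. That is one example of the time and space trade-off between dictionary structures.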
.NET interview question series (13): Lucene underlying principles