We all know that a search engine returns results for a word very quickly, but have you ever wondered how it can find what you want among hundreds of millions of pages so fast? One important reason is that modern search engines are essentially built on reverse-index technology (more commonly called an inverted index).
Without a reverse index, the search engine would have to traverse every page on each search to find out whether the page contains the keywords you specified. The workload is huge, for two main reasons:
- The number of web pages on the internet is very large;
- Finding a specified keyword within a single page is not trivial either, because it requires scanning every character of the page.
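To make the cost concrete, here is a minimal sketch of the search without a reverse index. The `PAGES` dictionary and the `naive_search` function are hypothetical names invented for illustration; a real engine would face billions of pages rather than three.

```python
def naive_search(pages, keyword):
    """Return the ids of pages containing keyword by scanning every page in full."""
    hits = []
    for page_id, text in pages.items():
        # The `in` operator performs a substring scan over the page's characters.
        if keyword in text:
            hits.append(page_id)
    return hits


# Hypothetical miniature "web" of three pages.
PAGES = {
    1: "search engines crawl the web",
    2: "an index maps keywords to pages",
    3: "spiders crawl pages and extract keywords",
}

print(naive_search(PAGES, "crawl"))
```

Every query touches every page, so the work grows with the total size of the collection rather than with the number of matching pages.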
The reverse index was created to establish a direct mapping from the keywords being searched to the pages that contain them. Put simply, "reverse" means that the index leads from a keyword to its sources, rather than from a source to the keywords it contains.
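The keyword-to-page mapping can be sketched as follows. This is only an illustration under simple assumptions: keywords are taken to be lowercase, whitespace-separated words, and `PAGES` and `build_reverse_index` are hypothetical names, not part of any particular search engine.

```python
from collections import defaultdict


def build_reverse_index(pages):
    """Map each keyword to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Assumption: a keyword is a lowercase, whitespace-separated word.
        for word in text.lower().split():
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical miniature collection of pages.
PAGES = {
    1: "search engines crawl the web",
    2: "an index maps keywords to pages",
    3: "spiders crawl pages and extract keywords",
}

INDEX = build_reverse_index(PAGES)
print(INDEX["crawl"])
print(INDEX["keywords"])
```

The pages are read once, when the index is built; afterwards each query is a lookup by keyword instead of a scan of the sources.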
For example, to retrieve the keyword A, first look up A in the index table of the reverse index, and then read off the pages where A is located. Since the index table is sorted, a keyword can be found in it by binary search. Combined with distributed data, server clusters, multithreading and similar techniques, this is highly efficient, so finding the pages that contain a keyword becomes very simple.
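A minimal sketch of that lookup, assuming the index table is stored as a sorted list of keywords with a parallel list of page ids. The names `TERMS`, `POSTINGS` and `lookup` are hypothetical; the binary search itself uses `bisect_left` from Python's standard library.

```python
from bisect import bisect_left

# Hypothetical index table: keywords in sorted order, with the page ids
# for TERMS[i] stored at POSTINGS[i].
TERMS = ["crawl", "index", "keywords", "pages", "spiders"]
POSTINGS = [[1, 3], [2], [2, 3], [2, 3], [3]]


def lookup(keyword):
    """Binary-search the sorted keyword table and return the matching page ids."""
    i = bisect_left(TERMS, keyword)
    if i < len(TERMS) and TERMS[i] == keyword:
        return POSTINGS[i]
    return []  # keyword is not in the index


print(lookup("keywords"))
print(lookup("engine"))
```

Binary search needs roughly log2(n) comparisons for n keywords, so even a table of a million keywords is resolved in about 20 steps.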
Suppose the database contains 1 million records, of which 10 meet the search criteria. With a reverse index, you can quickly find the keywords and locate the 10 records that contain them; without one, you have to traverse all 1 million records. The difference in efficiency is easy to imagine.
So a reverse index is like a large dictionary of provenance: every word in it can tell you all the places it comes from.
The keywords in the reverse index are generally produced by the word segmentation that spiders perform on web pages as they crawl. Chinese word segmentation is a particularly troublesome problem. For word segmentation technology, please refer to other related articles.
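As a rough illustration of why Chinese segmentation is harder than splitting on spaces, here is a sketch of forward maximum matching, one simple dictionary-based approach. It is not necessarily what any real spider uses; the `DICTIONARY` contents, the `max_len` parameter and the function name are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, preferring the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real one would hold many thousands of words.
DICTIONARY = {"搜索", "引擎", "搜索引擎", "倒排", "索引"}

print(forward_max_match("搜索引擎使用倒排索引", DICTIONARY))
```

The result depends entirely on the dictionary: words missing from it fall apart into single characters, which is one reason segmentation is treated as a topic of its own.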
The principle of reverse index and its application in full-text search