This article was written by larrylgq. When reposting, please credit the source: http://blog.csdn.net/larrylgq/article/details/7399237
Author: Lu guiqiang
Email: larry.lv.word@gmail.com
When data grows to the point where an RDBMS can no longer serve the queries efficiently, the usual approach is an RPC/HTTP + index server + database architecture.
The common practice is as follows.
A cron job periodically pulls data from the database and sends it to the index server, which builds the indexes; business machines then query the index server over RPC/HTTP. Fuzzy matching directly on the keywords takes an enormous amount of time, so
keyword prefixes are usually stored redundantly (precomputed) instead. The usual way to implement this keyword prefix redundancy is a trie, the data structure that the Aho-Corasick algorithm is built on.
This approach has two benefits:
1: Keywords with a common prefix share the same nodes, so no space is wasted on duplicated prefixes.
2: The worst-case cost of a lookup is the breadth of the tree (the number of children per node) multiplied by the length of the query.
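To make the trie lookup concrete, here is a minimal sketch in Python. The class and method names (`TrieNode`, `Trie`, `starts_with`) are hypothetical and not taken from the original article; it is an illustration of prefix search on a trie, not the author's implementation.

```python
from typing import Dict, List


class TrieNode:
    """One node per character; keywords that share a prefix share these nodes."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_keyword = False  # True if a keyword ends at this node


class Trie:
    # Hypothetical illustration of prefix search on a trie.
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Reuse the existing child if another keyword already created it.
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.is_keyword = True

    def starts_with(self, prefix: str) -> List[str]:
        """Return all stored keywords beginning with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:  # one step per character of the query
            child = node.children.get(ch)
            if child is None:
                return []
            node = child
        results: List[str] = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node: TrieNode, path: str, out: List[str]) -> None:
        # Depth-first walk of the subtree; visiting children in sorted order
        # yields the keywords in lexicographic order.
        if node.is_keyword:
            out.append(path)
        for ch in sorted(node.children):
            self._collect(node.children[ch], path + ch, out)


trie = Trie()
for kw in ["ab", "abc", "abd", "b"]:
    trie.insert(kw)
print(trie.starts_with("ab"))  # ['ab', 'abc', 'abd']
```

In this sketch the children are kept in a dict, so each step down the tree is a hash lookup. If the children were stored in a plain list and scanned linearly, each step could touch up to the node's breadth, which is where the breadth × query length bound in point 2 comes from.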
However, walking the trie on every request can still be too slow for real-time lookup, so we need a faster way to find matching keywords: a hash table plus a linked list.
When storing keywords, every prefix of each keyword is stored redundantly, and the keywords and prefixes are kept in sorted order in a linked list, with a hash table pointing from each entry to its position in the list.
This way, hashing the user's input leads directly to its position in the list, and the matching keywords are found by scanning down from there.
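The hash table + sorted linked list idea can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `Node`, `PrefixIndex` and `suggest`, and the `limit` parameter, are hypothetical, and the exact layout (one list entry per distinct prefix, flagged when it is a complete keyword) is an assumption about how the article's structure works, not a copy of it.

```python
from typing import Dict, List, Optional


class Node:
    """A linked-list entry holding either a keyword or one of its prefixes."""

    def __init__(self, text: str, is_keyword: bool) -> None:
        self.text = text
        self.is_keyword = is_keyword
        self.next: Optional["Node"] = None


class PrefixIndex:
    # Hypothetical layout: sorted linked list of all prefixes + hash table into it.
    def __init__(self, keywords: List[str]) -> None:
        # Prefix redundancy: record every non-empty prefix of every keyword,
        # remembering which entries are complete keywords.
        entries: Dict[str, bool] = {}
        for kw in keywords:
            for i in range(1, len(kw) + 1):
                p = kw[:i]
                entries[p] = entries.get(p, False) or p == kw
        # Link the entries in sorted order and hash each entry's text to its node.
        self.by_text: Dict[str, Node] = {}
        prev: Optional[Node] = None
        for text in sorted(entries):
            node = Node(text, entries[text])
            self.by_text[text] = node
            if prev is not None:
                prev.next = node
            prev = node

    def suggest(self, query: str, limit: int = 10) -> List[str]:
        node = self.by_text.get(query)  # jump straight to the query's position
        results: List[str] = []
        # In sorted order, every entry that starts with `query` follows it
        # contiguously, so scan down until the first entry that does not match.
        while node is not None and node.text.startswith(query) and len(results) < limit:
            if node.is_keyword:
                results.append(node.text)
            node = node.next
        return results


idx = PrefixIndex(["apple", "app", "apply", "banana"])
print(idx.suggest("app"))   # ['app', 'apple', 'apply']
print(idx.suggest("appl"))  # ['apple', 'apply']
```

Because the list is sorted, the scan stops at the first entry that no longer starts with the query, so its cost depends on how many entries are visited rather than on the total number of keywords. A Python list with integer positions would work just as well as the linked list here; the linked list simply mirrors the article's description.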
* An introduction to hashing: http://blog.csdn.net/larrylgq/article/details/7383527
Compared with a trie, however, this approach has one drawback: it needs more storage space, because prefixes are stored redundantly instead of being shared.
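A rough way to see the extra storage is to count characters. The sketch below is a simplified model, assuming a trie stores one character per node (one node per distinct prefix) while the hash + list layout stores each distinct prefix as a full string; it deliberately ignores pointer and hash-table overhead, and the function names are hypothetical.

```python
from typing import List, Set


def distinct_prefixes(keywords: List[str]) -> Set[str]:
    # Every non-empty prefix of every keyword, without duplicates.
    return {kw[:i] for kw in keywords for i in range(1, len(kw) + 1)}


def trie_chars(keywords: List[str]) -> int:
    # Simplified model: a trie has one node (one character) per distinct prefix.
    return len(distinct_prefixes(keywords))


def redundant_chars(keywords: List[str]) -> int:
    # Simplified model: the hash + list layout stores each distinct prefix in full.
    return sum(len(p) for p in distinct_prefixes(keywords))


print(trie_chars(["apple", "apply"]), redundant_chars(["apple", "apply"]))  # 6 20
```

Under this model a single keyword of length L costs L characters in a trie but L(L+1)/2 characters when all of its prefixes are stored, which is why the faster lookup is paid for with storage.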