Topic:
Design a system that handles word collocation. For example, if "China" and "people" can be paired, then a phrase built from them, such as "Chinese people", is valid. Requirements:
* The system may receive thousands of queries per second;
* The vocabulary is on the order of 100,000 words;
* Each word can be paired with at most 10,000 other words.
When the user enters "Chinese people", the system must return the information related to this collocation.
Analysis:
Performance requirements: thousands of queries per second means the system must sustain a QPS above 1,000.
The search side uses multithreading. Servers today are multi-core, so this takes full advantage of the machine's resources.
Data:
Create one large table of all the words and assign an ID to each word. The storage structure is as follows:
id1,word1,id2,word2,...,idn,wordn
Look words up by hash. With a well-designed hash function for words, a word lookup takes O(1) time.
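The word table above can be sketched with a hash map. This is only a minimal illustration: the sample words and the names `word_to_id`, `id_to_word`, and `lookup_id` are assumptions and do not come from the original design, and Python's built-in `dict` stands in for a hand-written hash function.

```python
# Hypothetical sample vocabulary; a real system would load about 100,000 words.
words = ["China", "people", "republic", "river"]

# word -> ID, backed by Python's built-in hash table (average O(1) lookup).
word_to_id = {word: idx for idx, word in enumerate(words, start=1)}

# ID -> word, used to turn IDs back into text when returning results.
id_to_word = {idx: word for word, idx in word_to_id.items()}


def lookup_id(word):
    """Return the ID of a word, or None if it is not in the vocabulary."""
    return word_to_id.get(word)
```

For example, `lookup_id("people")` returns the ID assigned to "people", and a word missing from the vocabulary returns `None`.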
Then build a collocation table between words: each entry is a word ID plus the list of IDs of the words it can be paired with.
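One possible shape for the collocation table is a map from a word ID to a set of partner IDs. The sketch below assumes that pairing is symmetric (if A pairs with B, then B pairs with A), which the original text does not state. The names `collocations`, `add_pair`, and `can_pair` are hypothetical.

```python
from collections import defaultdict

# word ID -> set of word IDs it can be paired with (at most 10,000 per word).
collocations = defaultdict(set)


def add_pair(id_a, id_b):
    """Record that two words can be paired (assumed symmetric)."""
    collocations[id_a].add(id_b)
    collocations[id_b].add(id_a)


def can_pair(id_a, id_b):
    """Check whether id_b is a valid partner of id_a."""
    # .get() avoids creating empty entries for unknown IDs.
    return id_b in collocations.get(id_a, ())
```

Storing the partners in a set rather than a list makes each membership check a hash lookup instead of a scan over up to 10,000 IDs.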
Retrieval algorithm:
Search query ---> use word segmentation to find all possible ways of splitting the query into words
---> use the hash to look up each word ---> check whether the words can be paired with each other. If they can, return the phrase; if they cannot, return an empty result.
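The retrieval flow above could look roughly like the sketch below. It rests on several assumptions: the tiny `VOCAB` and `COLLOCATIONS` tables are made up, segmentation is done by brute-force enumeration of every split into known words, and a phrase counts as valid when every pair of adjacent words can be paired. A production system would use a proper word segmenter.

```python
# Hypothetical toy data: word -> ID, and ID -> set of partner IDs.
VOCAB = {"china": 1, "people": 2, "river": 3}
COLLOCATIONS = {1: {2}, 2: {1}}


def segment(query):
    """Return every way to split the query into vocabulary words."""
    if not query:
        return [[]]
    results = []
    for end in range(1, len(query) + 1):
        head = query[:end]
        if head in VOCAB:  # hash lookup of the candidate word
            for rest in segment(query[end:]):
                results.append([head] + rest)
    return results


def search(query):
    """Return the segmentations whose adjacent words are all valid pairs."""
    matches = []
    for words in segment(query):
        if len(words) < 2:
            continue  # a single word is not a collocation
        ids = [VOCAB[w] for w in words]
        if all(b in COLLOCATIONS.get(a, ()) for a, b in zip(ids, ids[1:])):
            matches.append(words)
    return matches
```

With this toy data, `search("chinapeople")` finds the split into "china" and "people", while `search("chinariver")` returns an empty list because those two words are not paired.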
And with that, a small search engine is designed.
The most time-consuming part of this system is checking the possible phrases for valid collocations. Each word has up to 10,000 possible partners, so checking whether two words can be paired takes at most 10,000 comparisons.
If the query yields m possible phrases and each phrase contains n words on average, the maximum number of comparisons is n*m*10000.
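The bound above can be written out as a small calculation. The function names are hypothetical. The second function shows a variation that is not part of the original analysis: if each word's partners are kept in a hash set, each pair check is one lookup on average instead of up to 10,000 comparisons.

```python
MAX_PARTNERS = 10_000  # each word pairs with at most 10,000 words


def worst_case_comparisons(m, n, partners=MAX_PARTNERS):
    """Upper bound n*m*10000 when each pair check scans a partner list."""
    return n * m * partners


def hashed_lookups(m, n):
    """Approximate number of checks if partners are stored in hash sets."""
    return n * m


# Example: 3 candidate phrases of 2 words each.
print(worst_case_comparisons(3, 2))  # 60000
print(hashed_lookups(3, 2))          # 6
```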
The cost therefore stays at O(n). On an ordinary server at a QPS of 1,000, response times of a few milliseconds should be no problem.
Author: csdn Blog hhh3h