Baidu R&D written-test question analysis: designing a system for word collocation



Design a system that handles word collocation. For example, if 中国 (China) and 人民 (people) can be paired, then both 中国人民 ("the Chinese people") and 人民中国 ("People's China") are valid phrases. Requirements:

* The system may receive thousands of queries per second;

* The vocabulary contains on the order of 100,000 (10W) words;

* Each word can be paired with at most 10,000 (1W) other words.

When the user enters a phrase such as "Chinese people" (中国人民), the system must return the information associated with that collocation.
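As a rough sizing check on these requirements, here is a back-of-envelope estimate of the worst-case collocation data. It is our own arithmetic, not the author's, and it assumes each word ID is stored as a 4-byte integer and that every word hits the 10,000-partner maximum.

```python
# Back-of-envelope sizing from the stated requirements.
# Assumptions (not from the original post): IDs are 4-byte integers,
# and every word reaches the 10,000-partner maximum (worst case).
VOCAB_SIZE = 100_000        # "10W" words
MAX_PARTNERS = 10_000       # "1W" partners per word at most
BYTES_PER_ID = 4            # assumed 32-bit word IDs

def worst_case_entries(vocab: int = VOCAB_SIZE, partners: int = MAX_PARTNERS) -> int:
    """Upper bound on stored partner IDs across all words."""
    return vocab * partners

def worst_case_bytes(bytes_per_id: int = BYTES_PER_ID) -> int:
    """Upper bound on raw storage for all partner lists, ignoring container overhead."""
    return worst_case_entries() * bytes_per_id

if __name__ == "__main__":
    print(worst_case_entries())          # 1000000000
    print(worst_case_bytes() / 2**30)    # about 3.73 GiB
```

Even this pessimistic bound is a few gigabytes, which suggests the tables can be held in memory on a single server.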


Performance requirements: thousands of queries per second means the system must sustain a QPS above 1,000.

The query side is multithreaded; since servers today are multi-core, this makes full use of the machine's resources.
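A minimal sketch of the multithreaded query side, assuming the word table and collocation table are built once at startup and afterwards only read, so worker threads can share them without locks. The names `WORD_IDS`, `PARTNERS`, `handle_query`, and `serve` are hypothetical. In CPython the global interpreter lock keeps pure-Python lookups from running truly in parallel, so this shows the structure only; a production service would use a language whose threads run on all cores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only tables, built once before serving starts.
WORD_IDS = {"中国": 1, "人民": 2}
PARTNERS = {1: {2}, 2: {1}}

def handle_query(pair):
    """Return True if both words exist and can be paired (read-only access)."""
    a, b = pair
    ida, idb = WORD_IDS.get(a), WORD_IDS.get(b)
    if ida is None or idb is None:
        return False
    return idb in PARTNERS.get(ida, set())

def serve(queries, workers=8):
    """Fan queries out to a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_query, queries))

if __name__ == "__main__":
    print(serve([("中国", "人民"), ("人民", "中国"), ("中国", "北京")]))
```

Because the shared tables are never written after startup, no per-query locking is needed; only the pool size has to be tuned to the number of cores.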


Build one large table of all the words and assign each word an ID. The storage layout is:

id1, word1, id2, word2, ..., idn, wordn
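The word table can be sketched as two structures kept in sync: a list indexed by ID for ID-to-word lookups, and a dictionary for word-to-ID lookups. Assigning IDs in insertion order starting at 1 is our assumption; the post does not say how IDs are chosen, and the class name `WordTable` is hypothetical.

```python
class WordTable:
    """Assigns a dense integer ID to each distinct word (IDs start at 1)."""

    def __init__(self):
        self._words = [None]   # index 0 unused so the first ID is 1
        self._ids = {}         # word -> ID

    def add(self, word: str) -> int:
        """Return the word's ID, assigning a new one if the word is unseen."""
        if word in self._ids:
            return self._ids[word]
        self._words.append(word)
        self._ids[word] = len(self._words) - 1
        return self._ids[word]

    def id_of(self, word: str):
        """ID of the word, or None if it is not in the table."""
        return self._ids.get(word)

    def word_of(self, word_id: int) -> str:
        """Word stored under the given ID."""
        return self._words[word_id]

if __name__ == "__main__":
    table = WordTable()
    for w in ["中国", "人民", "中国"]:
        table.add(w)
    print(table.id_of("人民"), table.word_of(1))   # 2 中国
```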

Look words up through a hash table: with a well-designed string hash function, finding a word takes O(1) time on average.
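Python's built-in `dict` already gives O(1) average lookups, but since the post calls for designing a word hash, here is a hand-rolled alternative: a separate-chaining hash table using the 32-bit FNV-1a hash over each word's UTF-8 bytes. FNV-1a is our choice for illustration, not necessarily the author's, and the bucket count is an assumed value chosen to exceed the 100,000-word vocabulary.

```python
FNV_OFFSET = 2166136261   # 32-bit FNV offset basis
FNV_PRIME = 16777619      # 32-bit FNV prime

def fnv1a_32(word: str) -> int:
    """32-bit FNV-1a hash of the word's UTF-8 bytes."""
    h = FNV_OFFSET
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF   # keep the value within 32 bits
    return h

class WordHash:
    """Separate-chaining hash table mapping word -> ID."""

    def __init__(self, buckets: int = 1 << 17):   # assumed size, > 100,000 words
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, word: str) -> list:
        return self._buckets[fnv1a_32(word) % len(self._buckets)]

    def put(self, word: str, word_id: int) -> None:
        bucket = self._bucket(word)
        for i, (w, _) in enumerate(bucket):
            if w == word:
                bucket[i] = (word, word_id)   # overwrite an existing entry
                return
        bucket.append((word, word_id))

    def get(self, word: str):
        """ID of the word, or None if absent."""
        for w, word_id in self._bucket(word):
            if w == word:
                return word_id
        return None
```

With more buckets than words, chains stay short and lookups are O(1) on average; performance only degrades if many words happen to collide in the same bucket.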

Then build a collocation table between words, keyed by word ID: each entry maps a word ID to the set of IDs of the words it can be paired with.
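A sketch of the collocation table, assuming collocation is symmetric (the 中国/人民 example treats both orders as valid) and storing each word's partners as a hash set. With a set, checking one pair costs O(1) on average rather than a scan through up to 10,000 IDs as a plain list would need. The class name `CollocationTable` is hypothetical.

```python
from collections import defaultdict

MAX_PARTNERS = 10_000   # requirement: each word pairs with at most 1W words

class CollocationTable:
    """word ID -> set of partner word IDs; pairs are stored in both directions."""

    def __init__(self):
        self._partners = defaultdict(set)

    def add_pair(self, a: int, b: int) -> None:
        # Validate both directions before mutating, so a failure leaves no half-pair.
        for x, y in ((a, b), (b, a)):
            if y not in self._partners[x] and len(self._partners[x]) >= MAX_PARTNERS:
                raise ValueError(f"word {x} already has {MAX_PARTNERS} partners")
        self._partners[a].add(b)
        self._partners[b].add(a)

    def can_pair(self, a: int, b: int) -> bool:
        partners = self._partners.get(a)   # .get avoids creating empty sets
        return partners is not None and b in partners
```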

Retrieval algorithm:

Search query ---> word segmentation, which finds every possible way to split the query into dictionary words

---> hash lookup of each word to get its ID ---> check whether those words can be paired with each other: if they can, return the phrase; if not, return an empty result.
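The retrieval flow can be sketched end to end. Segmentation here is a simple exhaustive dictionary split (every way to cut the query into known words, with an assumed maximum word length of 4 characters), standing in for whatever segmenter the author had in mind. We also assume a multi-word phrase is valid when every adjacent pair of words can be paired. Plain dicts and sets stand in for the hash tables, and the function names are hypothetical.

```python
from functools import lru_cache

def all_segmentations(query, vocab, max_word_len=4):
    """Every way to split `query` into words found in `vocab` (exhaustive)."""
    @lru_cache(maxsize=None)
    def split(start):
        if start == len(query):
            return [[]]                      # one way to split the empty tail
        results = []
        for end in range(start + 1, min(len(query), start + max_word_len) + 1):
            word = query[start:end]
            if word in vocab:                # hash lookup of the candidate word
                for rest in split(end):
                    results.append([word] + rest)
        return results
    return split(0)

def lookup(query, word_ids, partners):
    """Return the first segmentation whose adjacent words all collocate, else []."""
    for words in all_segmentations(query, word_ids):
        if len(words) < 2:
            continue                         # a single word is not a collocation
        ids = [word_ids[w] for w in words]
        if all(b in partners.get(a, ()) for a, b in zip(ids, ids[1:])):
            return words
    return []                                # no match: empty result

if __name__ == "__main__":
    word_ids = {"中国": 1, "人民": 2, "中": 3, "国": 4}   # hypothetical dictionary
    partners = {1: {2}, 2: {1}}                          # 中国 <-> 人民
    print(lookup("中国人民", word_ids, partners))  # ['中国', '人民']
    print(lookup("中国北京", word_ids, partners))  # []
```

Exhaustive segmentation is fine for short queries, but long queries can yield many candidates; a real system would likely use a dedicated segmenter and cap the number of candidates.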

With that, the design of this small search engine is complete.

The most expensive part of this system is checking the possible collocations between words. Each word has at most 10,000 possible partners, so checking whether two given words can be paired takes at most 10,000 comparisons.

If segmentation yields m candidate phrases, each with n words on average, the number of comparisons is at most n*m*10000.

The cost therefore stays linear, O(n), in the number of words, and on a typical multi-core server, answering each query within a few milliseconds at 1,000 QPS should be no problem.
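To make the n*m*10000 bound concrete, here is a sketch of the linear-scan pair check that the estimate implies (partner IDs kept in a plain list), instrumented to count comparisons. The helper names are hypothetical; only the bound itself comes from the text.

```python
MAX_PARTNERS = 10_000   # "1W" partners per word at most

def scan_pair(partner_list, target):
    """Linear scan of one word's partner list; returns (found, comparisons)."""
    comparisons = 0
    for pid in partner_list:
        comparisons += 1
        if pid == target:
            return True, comparisons
    return False, comparisons

def worst_case_comparisons(m, n, max_partners=MAX_PARTNERS):
    """Upper bound from the text: m candidate phrases x n words x 10,000 comparisons."""
    return n * m * max_partners

if __name__ == "__main__":
    partners = list(range(MAX_PARTNERS))        # a word at the 1W maximum
    print(scan_pair(partners, -1))              # (False, 10000): full scan on a miss
    print(worst_case_comparisons(m=3, n=2))     # 60000
```

For short queries the worst case is tens of thousands of integer comparisons, which is consistent with the millisecond-level estimate; keeping partners in a hash set, or in a sorted array searched by binary search, would lower the per-pair cost further.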

Author: hhh3h (CSDN blog)

