Roughly how does Baidu's pinyin search suggestion feature work? Thank you

Source: Internet
Author: User
When I type the pinyin guangzhou into Baidu, the suggestions below the search box show Guangzhou (广州) and Guangzhou News (广州新闻). My guess is that Baidu keeps a table of hot keywords with an extra field that stores the pinyin of each keyword, and searches that table directly. If the input is pinyin, it does a fuzzy match against the pinyin column and returns the results once matching is done. This is only my guess, and there also seems to be some kind of keyword weighting mechanism. I could not find anything relevant on Google. Is there another way to implement this that I have not thought of? Any help from the experts would be appreciated. Thank you!
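To make the guess above concrete, here is a minimal sketch of a hot-keyword table with a pinyin column, matched by prefix and ordered by weight. The table contents, the weights, and the names `Keyword`, `HOT_KEYWORDS`, and `suggest` are all made up for illustration; nothing here is known to be how Baidu does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    text: str    # keyword as shown to the user (Chinese characters)
    pinyin: str  # toneless pinyin without spaces (the extra "pinyin field")
    weight: int  # hypothetical popularity score


# Hypothetical hot-keyword table; entries and weights are invented.
HOT_KEYWORDS = [
    Keyword("广州", "guangzhou", 100),
    Keyword("广州新闻", "guangzhouxinwen", 80),
    Keyword("广州地铁", "guangzhouditie", 60),
    Keyword("广西", "guangxi", 90),
]


def suggest(query, limit=10):
    """Return keywords whose pinyin or text starts with the query, heaviest first."""
    q = query.strip().lower().replace(" ", "")
    if not q:
        return []
    hits = [k for k in HOT_KEYWORDS
            if k.pinyin.startswith(q) or k.text.startswith(q)]
    hits.sort(key=lambda k: k.weight, reverse=True)
    return [k.text for k in hits[:limit]]


print(suggest("guangzhou"))
```

A linear scan is only good enough for a toy table; a real system would more likely use a prefix index (for example a trie or a sorted key range), but that is a separate design choice from the table layout sketched here.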


------Solution--------------------
I used to work a lot with the people in my company's search department, so I picked up the general working principles of a search engine.

There are many word lists inside search engines:

Stop-word list, near-synonym list, thesaurus, Chinese character-pinyin dictionary, suggest list.

When you enter a Chinese phrase into a search engine, it first performs word segmentation and then looks up each resulting word in the lists mentioned above to see whether there is any related information. As you guessed, it checks the pinyin-Chinese character dictionary. When it finds a mapping such as guangzhou = 广州 (Guangzhou), it converts the pinyin automatically and searches for 广州 first.
When you type a word incorrectly, the search engine's suggest feature may correct it and prompt you: Did you mean XXX?
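The dictionary lookup and the "Did you mean" prompt can be sketched roughly as below. This is only an illustration under assumptions: the dictionary `PINYIN_TO_WORD`, the function `rewrite_query`, and the 0.8 similarity cutoff are hypothetical, and the standard-library `difflib.get_close_matches` stands in for whatever spelling-correction method a real engine uses.

```python
import difflib

# Hypothetical pinyin -> Chinese word dictionary; entries are invented.
PINYIN_TO_WORD = {
    "guangzhou": "广州",
    "beijing": "北京",
    "shanghai": "上海",
}


def rewrite_query(token):
    """Return (term_to_search, did_you_mean).

    An exact pinyin hit is converted to the Chinese word.
    A near miss keeps the original token and offers a correction.
    """
    key = token.strip().lower()
    if key in PINYIN_TO_WORD:
        return PINYIN_TO_WORD[key], None
    # Assumed cutoff of 0.8; a stand-in for a real spelling corrector.
    close = difflib.get_close_matches(key, list(PINYIN_TO_WORD), n=1, cutoff=0.8)
    if close:
        return token, PINYIN_TO_WORD[close[0]]
    return token, None


print(rewrite_query("guangzhou"))
print(rewrite_query("guangzhuo"))
```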

In fact, this is only one of the branches a search engine uses to process a query; a single search issues many requests in parallel.
For example, suppose you enter a short phrase into the search engine.

The search engine first decides what forms of the query to search for:
1 The whole sentence
2 Standard word segmentation (can be understood as segmentation based on Chinese grammar)
3 Natural word segmentation (splitting on words, spaces, and punctuation)
...
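The three query forms listed above might be produced along these lines. This is a sketch, not a real segmenter: the tiny vocabulary `VOCAB` is invented, and forward maximum matching against it is used as a simple stand-in for "standard" segmentation, which in practice is far more sophisticated.

```python
import re

# Hypothetical mini-vocabulary used by the dictionary-based segmenter.
VOCAB = {"广州", "新闻", "地铁"}
MAX_WORD_LEN = max(len(w) for w in VOCAB)


def natural_segments(query):
    """Split on whitespace and punctuation only."""
    return [t for t in re.split(r"[\s\W_]+", query) if t]


def dictionary_segments(text):
    """Forward maximum matching: take the longest vocabulary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in VOCAB matches.
            if size == 1 or piece in VOCAB:
                words.append(piece)
                i += size
                break
    return words


def query_branches(query):
    """Build the whole-sentence, standard, and natural forms of one query."""
    natural = natural_segments(query)
    standard = []
    for token in natural:
        # Leave ASCII tokens (for example "nginx") unsplit.
        standard.extend([token] if token.isascii() else dictionary_segments(token))
    return {"whole": [query.strip()], "standard": standard, "natural": natural}


print(query_branches("广州新闻 nginx"))
```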

Then each branch separately consults the auxiliary word lists mentioned above to optimize the content it will search for.
The branches send their requests at the same time and get back multiple result sets.
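Sending the branches at the same time and collecting one result set per branch could look like the sketch below. The in-memory `DOCS` "index" and the functions `search_branch` and `search_all` are hypothetical; a real engine would send each branch to its index servers rather than scan a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "index": document id -> text.
DOCS = {
    1: "nginx official site",
    2: "nginx reverse proxy tutorial",
    3: "apache vs nginx",
}


def search_branch(terms):
    """Return the ids of documents that contain every term of one branch."""
    return {doc_id for doc_id, text in DOCS.items()
            if all(term in text for term in terms)}


def search_all(branches):
    """Run every branch concurrently and return {branch name: result set}."""
    with ThreadPoolExecutor(max_workers=len(branches) or 1) as pool:
        futures = {name: pool.submit(search_branch, terms)
                   for name, terms in branches.items()}
        return {name: future.result() for name, future in futures.items()}


results = search_all({
    "whole": ["nginx official"],
    "natural": ["nginx"],
})
print(results)
```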
Next comes sorting. In general, whole-sentence matches are the most relevant, so they get the highest weight and should be ranked first. A real-world search engine may also have to consider promoted positions, and whether the query has a more official result (for example, if you search for Nginx, the official nginx website should be ranked first). Paid results such as Baidu Promotion may also be placed at the top.

That is roughly how it works, although the actual sorting logic is very complex. The ranking is determined along several dimensions, which they call "curves." Adjusting the parameters of any dimension changes the sort results.
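As a very rough illustration of ranking along several adjustable dimensions, the sketch below merges per-branch result sets with a weight for each branch plus boosts for official and promoted results. Every number and name in it (`BRANCH_WEIGHT`, `OFFICIAL_BOOST`, `PROMOTED_BOOST`, `rank`) is invented; the real "curves" are not described in this thread and are certainly more complex than a weighted sum.

```python
# Hypothetical tuning parameters, one per ranking dimension.
BRANCH_WEIGHT = {"whole": 3.0, "standard": 2.0, "natural": 1.0}
OFFICIAL_BOOST = 5.0
PROMOTED_BOOST = 10.0


def rank(results, official=frozenset(), promoted=frozenset()):
    """Merge {branch: set of doc ids} into one list, highest score first."""
    scores = {}
    for branch, doc_ids in results.items():
        for doc_id in doc_ids:
            scores[doc_id] = scores.get(doc_id, 0.0) + BRANCH_WEIGHT.get(branch, 0.0)
    for doc_id in scores:
        if doc_id in official:
            scores[doc_id] += OFFICIAL_BOOST
        if doc_id in promoted:
            scores[doc_id] += PROMOTED_BOOST
    # Ties are broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


results = {"whole": {2}, "natural": {1, 2, 3}}
print(rank(results))
print(rank(results, official={1}))
print(rank(results, official={1}, promoted={3}))
```

Changing any of the three parameters reorders the output, which is the effect the answer describes when a dimension is tuned.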

