Site builders should pay attention to Baidu's word segmentation technology

Source: Internet
Author: User
Baidu claims to be the world's largest Chinese search engine and the one that best understands the search habits of Chinese netizens. As the leader among China's search engines, it has long been studied by grassroots webmasters trying to work out its search technology and ranking algorithm, and many webmasters depend on its traffic for their living. Grassroots webmasters therefore both love and hate it: Baidu can bring a lot of traffic, but it can also ruthlessly de-index ("K") a site or lower its ranking weight.

OK, let's talk about Baidu's word segmentation technology. I may well have gotten some things wrong; if so, please point them out, and I hope we can make progress together. Before studying Baidu's word segmentation, I did not understand why Baidu is stronger than Google. It comes down to how the query is cut into words. Does Baidu take a Chinese string and cut it arbitrarily? Of course not. So what conditions trigger a cut?

The following experiment suggests that if the string contains no more than 3 Chinese characters and is a complete word, such as "tractor" (three characters in Chinese), Baidu leaves it intact. When the string is longer than 4 Chinese characters, Baidu's word segmentation program kicks in. Take "tractor price": looking at the terms highlighted in red in the results, it is not hard to see that the query was cut into two words, "tractor" and "price".

Now try a different query. Submit "Hengshui traction" to Baidu, and the highlighted keywords are "traction" and "Hengshui", appearing as separate terms rather than only as one continuous phrase, so the query was evidently cut into two words. If we search for "Hengshui Help Kang Medical Devices", no title contains the full phrase, although some page content does contain the complete "Hengshui Help Kang Medical Devices"; the reason may be that the site has not been indexed yet. This long-tail term was cut into "Hengshui" / "Kang Medical Equipment".
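The experiment above infers the cut from the terms highlighted in the result snippets. A minimal sketch of that inspection step follows. It assumes the highlighted terms are wrapped in `<em>` tags, which is an assumption about the result markup and not something the article states; the snippet string and the function name `highlighted_terms` are made up for illustration, and the Chinese strings are my own rendering of "tractor" and "price".

```python
import re


def highlighted_terms(snippet_html):
    """Return the distinct highlighted terms, in order of first appearance.

    Assumes highlights are marked up as <em>...</em> (hypothetical markup).
    """
    seen = []
    for term in re.findall(r"<em>(.*?)</em>", snippet_html):
        if term not in seen:
            seen.append(term)
    return seen


# Hypothetical result snippet for the query "拖拉机价格" ("tractor price").
# This is illustrative text, not a real search engine response.
snippet = "<em>拖拉机</em>最新<em>价格</em>行情，<em>拖拉机</em>报价"

print(highlighted_terms(snippet))
```

If the query had been treated as a single unit, one would expect a single highlighted term covering the whole query; two separate terms suggest the query was cut in two.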

Next, let's look at Baidu's word segmentation algorithm. After several years of development it has become quite mature, and SEOers deserve part of the credit for that. One SEO veteran said: "Baidu's algorithms range from simple to complex: forward maximum matching, reverse maximum matching, bidirectional maximum matching, the language model method, the shortest path algorithm, and so on. If you are interested, search for them on Google to learn more."

To summarize, with some speculation of my own: Baidu has a powerful dictionary containing all kinds of words in common use, such as personal names, place names, and product names, for example "Wang Yuanjie", "Hengshui", and "tractor". During segmentation, these proper names are cut out first. A word such as "Help Kang" is not in the dictionary, so what remains is handled by the bidirectional maximum matching algorithm. If forward and reverse matching produce the same segmentation, all the better, and the result can be output directly. If the two disagree, there is one result from forward matching and another from reverse matching.
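To make the bidirectional maximum matching idea concrete, here is a minimal sketch of the textbook technique. It is not Baidu's actual implementation, which is not public. The dictionary is a tiny hypothetical one, the Chinese strings are my own renderings, and the second query is a classic ambiguity example from segmentation tutorials rather than one from this article. Because the text above does not say how a disagreement is resolved, the sketch simply returns both results with a flag.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted as a fallback for unknown text.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the string
    return tokens


def bidirectional_max_match(text, dictionary):
    """Return (forward result, backward result, whether they agree)."""
    max_len = max(len(word) for word in dictionary)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    return fwd, bwd, fwd == bwd


# Hypothetical mini-dictionary for illustration only.
DICTIONARY = {"拖拉机", "价格", "衡水", "研究", "研究生", "生命", "起源"}

# "tractor price": both directions agree, so the result can be output directly.
print(bidirectional_max_match("拖拉机价格", DICTIONARY))

# Classic ambiguous string: the two directions disagree.
print(bidirectional_max_match("研究生命起源", DICTIONARY))
```

A production segmenter would need a rule for choosing between the two results when they differ, for example preferring the one with fewer tokens, but that choice is an assumption and is left out here.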

Baidu has always claimed to be the world's largest Chinese search engine, yet its word segmentation technology is nothing special. Its edge probably comes from a very large proprietary dictionary, which records product names (such as "filter press"), manufacturer names (such as "filter press factory"), place names (such as Fucheng), and website addresses (such as http://www.xiaoshuo8.net/). The dictionary also keeps expanding as new words are continually added. This is where Baidu is more powerful than Google: Google seems to cut character by character, and its dictionary is significantly smaller than Baidu's, which is something Google itself needs to improve.
