An Analysis of Chinese Word Segmentation Algorithms to Help Webmasters Optimize Better

Source: Internet
Author: User


I had picked up bits and pieces about word segmentation algorithms before, but never understood them thoroughly. Recently I read a number of related books and studied some material online, and I now have a general understanding. Knowing how word segmentation works is very helpful for individual webmasters and for small and medium-sized businesses: by seeing how search engines split words, we can choose our keywords more accurately. Now for today's main text. If anything here is wrong, I hope readers will correct me.

Word segmentation algorithms are usually discussed in connection with Chinese search engines and matter far less for Google. Search for the same keyword or phrase on Baidu and on Google and the results differ. This is not only because the two use different algorithms or technology; it is largely because of word segmentation. Baidu splits the user's query into words, while Google tends to return results for the query more directly.

Whether the language is English or Chinese, search engines index pages by word. Chinese, however, is rich and subtle, and it differs greatly from English in how words are formed. The same Chinese sentence can mean something completely different depending on where the punctuation falls or on the tone in which it is read. English largely avoids this problem, because its words are already separated by spaces. Below is my understanding of Chinese word segmentation algorithms.

Chinese word segmentation generally falls into two categories: dictionary-based matching and statistics-based methods. In practice the two are rarely used alone; they are usually combined.

The first is dictionary-based matching. The search engine compares the user's query against the entries in its own dictionary, and whenever a match succeeds it cuts out a word. Depending on the scanning direction, matching is either forward or reverse. Depending on which word length is preferred, forward matching is further divided into maximum matching and minimum matching. Dictionary-based matching depends heavily on how complete and how up to date the dictionary is.
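To make the difference between forward and reverse matching concrete, here is a minimal sketch of maximum matching in both directions. The function names, the `max_len` parameter, and the four-word toy dictionary are all assumptions made for illustration; a real search engine's dictionary and matching rules are far larger and are not public.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted as a fallback when nothing matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len=4):
    """Scan right to left, always taking the longest dictionary word."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)  # keep the words in reading order
                j -= size
                break
    return words


# Hypothetical toy dictionary, not taken from any real search engine.
TOY_DICT = {"研究", "研究生", "生命", "起源"}
SAMPLE = "研究生命起源"  # "research the origin of life"

print(forward_max_match(SAMPLE, TOY_DICT))   # ['研究生', '命', '起源']
print(backward_max_match(SAMPLE, TOY_DICT))  # ['研究', '生命', '起源']
```

With this toy dictionary the forward scan greedily takes 研究生 ("graduate student") and leaves a stray 命, while the reverse scan recovers the intended 研究 / 生命 / 起源. Both results depend entirely on which words the dictionary contains, which is why an incomplete or outdated dictionary hurts this method.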

Based on this, as webmasters we should follow the same principle whether we are choosing the target keyword for the home page or long-tail keywords for content pages: do not invent words. If a word is not something the public often searches for, and it is not in the search engine's dictionary either, searches will not return it. Keyword selection therefore cannot be taken for granted; it requires accurate judgment.

The second is statistics-based segmentation. The search engine carries out a large amount of computation, including the probability that characters or words appear next to each other, where a phrase appears most often, and what kind of content users go on to look at after searching for a phrase or word. All of these serve as the basis for the search engine's judgment. The obvious advantage of this method is that it responds faster to new words. For example, when a news story breaks and everyone searches for a new word, users will lose confidence in the search engine if Baidu cannot recognize the word and return correct results.
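The adjacency-probability idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the four-sentence corpus, the function names, and the 0.5 threshold are all hypothetical, and a real engine would combine many more signals (query logs, click behavior, far larger corpora) than this single statistic.

```python
from collections import Counter


def adjacency_scores(corpus):
    """Estimate P(next character | character) for every adjacent pair."""
    singles = Counter()
    pairs = Counter()
    for sentence in corpus:
        singles.update(sentence[:-1])              # characters that have a successor
        pairs.update(zip(sentence, sentence[1:]))  # adjacent character pairs
    return {pair: count / singles[pair[0]] for pair, count in pairs.items()}


def segment(text, scores, threshold=0.5):
    """Join neighbors whose adjacency score reaches the threshold, else cut."""
    if not text:
        return []
    words = [text[0]]
    for a, b in zip(text, text[1:]):
        if scores.get((a, b), 0.0) >= threshold:
            words[-1] += b   # strong pair: extend the current word
        else:
            words.append(b)  # weak pair: start a new word
    return words


# Hypothetical toy corpus; no dictionary is supplied anywhere.
TOY_CORPUS = ["站长网站优化", "站长工具", "网站优化方法", "站长优化网站"]
scores = adjacency_scores(TOY_CORPUS)

print(segment("站长优化", scores))  # ['站长', '优化']
print(segment("网站优化", scores))  # ['网站', '优化']
```

No word list is involved here: 站长 ("webmaster"), 网站 ("website"), and 优化 ("optimization") emerge as units only because their characters keep appearing side by side in the corpus. That is the property that lets a statistical method pick up a new word once enough text and searches contain it.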

This ties in with an important point in SEO: relevance. Take the word A5. We all know it stands for the Admin5 webmaster network, but at first the search engine may not have known that. If enough people search for it and it is mentioned in enough different places, the search engine will conclude that A5 has some connection with webmasters. So when we optimize a site we should also pay attention to relevance. Exchanging links with relevant sites and publishing relevant content can strengthen the site's authority in its field, so that it ranks higher when users search and has more chances to be shown by the search engine.

That is all for this article. Chinese word segmentation is a characteristic feature of Chinese search engines, and it depends on the engine's own dictionary and on how quickly that dictionary is updated. Besides the page relevance mentioned above and choosing keywords that follow public search habits, you should also give important words special treatment, such as bold text or H tags. I hope this article gives webmasters a deeper understanding of Chinese word segmentation. If anything is wrong, corrections are welcome. This article is from Niu niu games (http://www.niuniuxiaoyouxi.com). Reprints are welcome; please retain the copyright notice. Thank you!
