Recently a friend asked why the Forward SEO blog, besides reaching the first page for "Hefei SEO" and "Hefei website promotion", also ranks on Baidu's first page for "Anhui SEO", "Anhui website promotion", and "Anhui website optimization". I'll take this opportunity to talk about Baidu's Chinese word segmentation technology. This is only my own understanding, so if you have other ideas, you are welcome to share them.

SEO practitioners know that a search engine's workflow includes a preprocessing step: crawled pages pass through several processing stages to prepare for the ranking stage that follows. Word segmentation is one of those stages. English uses the word as its basic unit, and words are separated by spaces. Chinese uses the character as its basic unit, and characters run together into words with no separator between them. Chinese word segmentation is therefore very different from English word segmentation. Judging from existing search engines, and Baidu in particular, research on Chinese word segmentation points to two main approaches: dictionary matching and statistics.

1, Dictionary-based matching compares the target phrase or sentence against the entries of an existing dictionary. After scanning, the matched text is broken into units such as sentences, words, and characters.

(1) By scanning direction, it can be divided into forward matching and reverse matching.

(2) By length priority, it can be divided into maximum matching and minimum matching.

(3) Combining scanning direction with length priority gives mixed methods such as forward maximum matching and reverse maximum matching.
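
As a rough illustration of forward and reverse maximum matching, here is a minimal Python sketch. The tiny dictionary, the `max_len` limit, and the function names are my own assumptions for the example; nothing here is Baidu's actual implementation. The sample string 合肥网站推广 means "Hefei website promotion".

```python
# Hypothetical mini-dictionary; a real search engine dictionary is far larger.
DICTIONARY = {"合肥", "网站", "推广", "合肥网站", "网站推广"}


def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, always taking the longest dictionary entry."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, always taking the longest dictionary entry."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


print(forward_max_match("合肥网站推广", DICTIONARY))  # ['合肥网站', '推广']
print(reverse_max_match("合肥网站推广", DICTIONARY))  # ['合肥', '网站推广']
```

With this dictionary the two scanning directions give different results for the same string, which is why direction matters when the dictionary contains overlapping entries.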

2, Statistics-based segmentation analyzes a large number of data samples and computes the probability with which characters appear together. The more often a few characters appear next to each other, the more likely they are to form a word, and the same reasoning can be extended to phrases.
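
One common way to turn "characters that often appear next to each other" into a number is pointwise mutual information (PMI) over adjacent character pairs. The sketch below is only a toy under that assumption: the five-line corpus and the function name `pmi_scores` are made up for illustration, and I am not claiming that Baidu uses this particular measure.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real statistics need a very large sample.
CORPUS = ["合肥网站推广", "安徽网站优化", "合肥推广", "安徽优化", "网站"]


def pmi_scores(corpus):
    """Return the PMI of every adjacent character pair seen in the corpus."""
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        # zip pairs each character with its right-hand neighbor.
        pair_counts.update(zip(sentence, sentence[1:]))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (a, b), count in pair_counts.items():
        p_pair = count / total_pairs
        p_a = char_counts[a] / total_chars
        p_b = char_counts[b] / total_chars
        # High PMI: the pair occurs together more often than chance predicts.
        scores[a + b] = math.log(p_pair / (p_a * p_b))
    return scores


scores = pmi_scores(CORPUS)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy corpus the real words (合肥, 安徽, 网站, 推广, 优化) all score higher than pairs that straddle a word boundary, such as 肥网 or 站推, so a simple threshold would separate them. On real data the picture is much noisier.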

The accuracy of dictionary matching depends on how complete and up to date the dictionary is, and the method is very fast. Statistics-based segmentation is more flexible and helps resolve ambiguity, but it is slower. Search engines usually combine the two techniques to get both speed and accuracy. That is what we usually mean by Chinese word segmentation technology. Since we want to make the most of these techniques, let's return to the Hefei SEO blog and look at how the Forward SEO blog applies Chinese word segmentation, along with some speculation.

1, Start with the keywords: "Hefei SEO" is segmented into "Hefei" + "seo", and "Hefei website promotion" is segmented into "Hefei" + "website" + "promotion", "Hefei" + "website promotion", "Hefei website" + "promotion", and similar forms. This kind of segmentation is easy to understand and is already widely used on our sites. In a Baidu search for "Hefei seo", only one of the first 20 results is not an exact match. For "Hefei website promotion", only four of the first 20 results are not exact matches.
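
The alternative splits listed above can be reproduced by enumerating every way to cut a string into dictionary entries. This is a minimal sketch with a made-up dictionary and function name, using 合肥网站推广 ("Hefei website promotion") as the input; it only shows which segmentations are possible, not which one a search engine would prefer.

```python
# Hypothetical dictionary covering the forms discussed in the text.
DICTIONARY = {"合肥", "网站", "推广", "网站推广", "合肥网站"}


def all_segmentations(text, dictionary):
    """Return every way to split text into consecutive dictionary entries."""
    if not text:
        return [[]]  # one way to segment the empty string: no tokens
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            # Keep this prefix and segment the remainder recursively.
            for rest in all_segmentations(text[end:], dictionary):
                results.append([head] + rest)
    return results


for tokens in all_segmentations("合肥网站推广", DICTIONARY):
    print(" + ".join(tokens))
# 合肥 + 网站 + 推广
# 合肥 + 网站推广
# 合肥网站 + 推广
```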

2, The keyword "Anhui Hefei seo" can be segmented into "Anhui" + "Hefei" + "seo", "Anhui Hefei" + "seo", or "Anhui" + "Hefei seo", but here it can also be split as "Anhui seo" + "Hefei". In the same way, the Forward SEO blog's title can be split into "Anhui" + "website promotion" and "Anhui" + "website optimization". Judging from the results, several factors are at work:

(1) Distinguish primary terms from secondary ones: keep the highly competitive words together, and let the less competitive terms be formed by splitting and recombining. The Forward SEO blog mainly targets "Hefei seo", with "Anhui seo" as the split term.

(2) The split terms, such as "Anhui seo", should actually appear on the site, to signal to the search engine that this segmentation exists. They can also be emphasized with bold text, reverse display, anchor text, and similar treatments.

(3) When building external links, use the split terms as well as the main keyword.

3, Another view is that search engines are gradually learning to model Chinese grammar and understand sentences. Under that view, "Anhui Hefei seo-Hefei website promotion-Hefei website optimization" can be split into "Anhui seo" + "Hefei seo" + "Anhui website promotion" + "Hefei website promotion" + "Anhui website optimization" + "Hefei website optimization". Of course, the Chinese language is broad and deep, and it is still difficult for machines today to segment words on the basis of real understanding.
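
If that speculation holds, the recombination amounts to pairing every region word in the title with every service word. The sketch below only illustrates that idea. The region and service lists are typed in by hand, and the function name `recombine` is hypothetical. How a search engine would actually recognize and pair such terms is unknown to me.

```python
# Hand-labeled terms taken from the example title; an assumption for the demo.
REGIONS = ["Anhui", "Hefei"]
SERVICES = ["seo", "website promotion", "website optimization"]


def recombine(regions, services):
    """Pair every region with every service, grouped by service."""
    return [f"{region} {service}" for service in services for region in regions]


for phrase in recombine(REGIONS, SERVICES):
    print(phrase)
# Anhui seo
# Hefei seo
# Anhui website promotion
# Hefei website promotion
# Anhui website optimization
# Hefei website optimization
```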

One last point: part of the reason the Forward SEO blog gets such good results from Chinese word segmentation is its Anhuiseo domain name. Of course, this approach suits the Baidu search engine well. On Google the effect seems much less pronounced. This article is original work by the webmaster of Entrepreneurial Information. If you repost it, please credit the source. Thank you!

