In search engine technology, Chinese word segmentation has a major influence on how search results are ranked. In practical search engine optimization, we also use Chinese word segmentation to avoid competing head-on for heavily contested primary keywords.
To give a simple example, suppose we need to optimize a page for the keyword "bearing". It is hard to rank well for this keyword, because "bearing" is so popular that getting a page onto the first page of results through SEO alone is very difficult. In such cases we usually target long-tail keywords instead, such as "Beijing bearing vendors" or "Beijing import bearings". To push those keywords to the top of the search results, Chinese word segmentation and keyword layout matter a great deal.
The Chinese language is rich and subtle: different punctuation and different sentence breaks produce different meanings. A Google scientist once said: "If we can build a good Chinese search engine, then we need not fear search in any other language."
So what does Chinese word segmentation mean for search engine optimization? Segmentation affects SEO in several ways, and the most important is its effect on long-tail traffic. We often have many long-tail phrases that we want to target, for example "Guangzhou import bearing sales" and "Shanghai import bearing sales". But as earlier articles on SEO explained, a page should not target more than three keywords: beyond three, the weight of each keyword is diluted and none of them does well. What if we want more than three? Then we use Chinese word segmentation to combine the keywords, for example: "Import bearing sales - Shanghai - Guangzhou". Such a title may not match "Guangzhou import bearing sales" or "Shanghai import bearing sales" exactly, but segmentation lets many word combinations achieve good results. Several phrases ranking near the top of the results page always give broader coverage than a single keyword in first place. Over time, because pages matching "Guangzhou" + "import bearing sales" and "Shanghai" + "import bearing sales" tell the search engine that your page is highly relevant to "import bearing sales", the ranking of that main keyword improves as well.
Of course, the example above does not fully split the keywords. Below is a rough discussion of Chinese word segmentation.
The earliest Chinese word segmentation method, based on dictionary lookup, was proposed by Professor Liang Nanyuan of Beijing Aerospace University. Take this sentence as an example: "Famous director Zhang Yimou said the national day evening will arrange 100,000 people to the * * * *."
Using the dictionary method, we scan the whole sentence and mark off every word that appears in the dictionary, taking the longest match when we encounter a compound word (such as "Peking University"). A string that is not recognized is split into single characters. With this method, the sentence above is cut into:
"Famous | director | Zhang Yimou | said | National Day | evening | will | arrange | 100,000 people | to | *** | gala"
This method can handle many sentences, but because it splits text so finely, a real search engine cannot tell which word is the focus, so the results do not reach maximum relevance. In the 1980s, Dr. Wang Xiaolong, a doctoral supervisor in computer science at the Harbin Institute of Technology, proposed the "minimum number of words" segmentation theory: a sentence should be cut into the smallest possible number of word strings, which helps the search engine understand what the sentence means. This approach is better, but it brings new problems. With ambiguous keyword groups, we cannot say that the longest match is always the best result. For example, for "Geely University Town Bookstore" the correct segmentation is "Geely | University Town | Bookstore", not the dictionary-driven "Geely University | Town | Bookstore".
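The "minimum number of words" idea can be sketched with a small dynamic program. This is only an illustration under stated assumptions, not the method as originally published: the function name `min_word_segment`, the toy dictionary, and the sample string are all hypothetical, and any character missing from the dictionary is treated as a one-character word.

```python
def min_word_segment(text, dictionary):
    """Split text into the fewest possible tokens.

    A token is either a dictionary word or a single character
    (the fallback for anything the dictionary does not know).
    """
    n = len(text)
    # best[i] holds (token_count, tokens) for the prefix text[:i].
    best = [None] * (n + 1)
    best[0] = (0, [])
    for i in range(1, n + 1):
        for j in range(i):
            piece = text[j:i]
            if piece in dictionary or i - j == 1:
                candidate = (best[j][0] + 1, best[j][1] + [piece])
                # Keep the first segmentation with the smallest count.
                if best[i] is None or candidate[0] < best[i][0]:
                    best[i] = candidate
    return best[n][1]


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"北京", "大学", "北京大学", "图书", "图书馆"}
tokens = min_word_segment("北京大学图书馆", TOY_DICTIONARY)
print(" | ".join(tokens))
```

When two segmentations have the same number of words, as happens with ambiguous phrases, this sketch simply keeps the first one it finds, which is why the minimum-word rule alone cannot resolve ambiguity.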
At present there are two main approaches to word segmentation: text processing based on a statistical model, and the reverse maximum matching method based on string matching.
Text processing based on a statistical model
In terms of form, a word is a stable combination of characters, so the more often adjacent characters appear together in context, the more likely they are to form a word. The frequency or probability with which characters co-occur side by side is therefore a good indicator of how credible a candidate word is. We can count how often each combination of adjacent characters occurs in a corpus and compute their mutual information: define the mutual information of a two-character pair from the probability that Chinese characters X and Y appear next to each other. The mutual information reflects how tightly the characters are bound together. When the tightness exceeds a certain threshold, the character group can be assumed to form a word. This method only needs to count the frequencies of character groups in the corpus and does not need a segmentation dictionary, so it is also called dictionary-free segmentation or statistical word extraction. It has limitations, however. It often extracts character groups that co-occur frequently but are not words, such as "this one", "one", "some", "my" and "many". Its recognition accuracy for common words is poor, and its time and space overhead is large. Practical statistical segmentation systems use a basic dictionary of common words for string-matching segmentation and, at the same time, use statistical methods to identify new words. Combining string frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation while gaining the advantages of dictionary-free segmentation: recognizing new words from context and resolving ambiguity automatically.
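As a rough sketch of the statistic described above, the code below scores each adjacent character pair with pointwise mutual information, log2(P(X,Y) / (P(X) P(Y))), one common way to define the two-character mutual information. The function name `adjacent_mutual_information`, the tiny corpus, and the threshold value are assumptions made for illustration; a real system would use a large corpus and a tuned threshold.

```python
import math
from collections import Counter


def adjacent_mutual_information(corpus):
    """Return {two-character string: mutual information} for a corpus.

    corpus is an iterable of strings; only pairs of adjacent
    characters inside the same string are counted.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        pair_counts.update(zip(sentence, sentence[1:]))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus built around "轴承" (bearing).
TOY_CORPUS = ["轴承销售", "进口轴承", "轴承厂家", "销售进口"]
scores = adjacent_mutual_information(TOY_CORPUS)

THRESHOLD = 2.5  # assumed cut-off, chosen only for this toy corpus
candidates = sorted(pair for pair, mi in scores.items() if mi > THRESHOLD)
print(candidates)
```

Pairs that cross a word boundary, such as "承销" in the toy corpus, score lower than genuine words such as "轴承", which is the effect the threshold relies on.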
Text processing based on a statistical model is highly technical and is used only inside the search engine's segmentation algorithm. Studying it would help your SEO even more, and you can join my SEO training course for an in-depth discussion. Below is a little more about the reverse maximum matching method based on string matching.
Generally speaking, the segmentation method we use most in SEO is the reverse maximum matching method based on string matching. This approach is actually very simple. Let's illustrate it with a simple example.
"Rising has always relied on quality and service to open up the security market."
If we segment this sentence by forward dictionary lookup, reading from left to right, we get the following:
"Rui \ Star \ has always \ relied on \ quality \ kimono \ business \ open up \ security \ market"
The forward segmentation contains an obvious error: "kimono". A kimono is Japan's traditional garment and has nothing to do with the meaning of this sentence. The error arises because, in the Chinese text, the character for "and" (和) followed by the first character of "service" (服务) spells the word for kimono (和服). If the sentence were really segmented this way, then in the search engine's index a search for "kimono" would wrongly return this page.
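The forward error can be reproduced with a minimal forward maximum matching sketch. Everything here is an assumption made for illustration: the function name `forward_max_match`, the toy dictionary, and the Chinese sentence, which is a reconstruction of the example rather than text taken from the article.

```python
def forward_max_match(text, dictionary):
    """Segment text left to right, taking the longest dictionary match.

    Characters that start no dictionary word become single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    start = 0
    while start < len(text):
        # Try the longest window first and shrink until something matches.
        for size in range(min(max_len, len(text) - start), 0, -1):
            piece = text[start:start + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                start += size
                break
    return tokens


# Assumed reconstruction of the example sentence and a toy dictionary.
SENTENCE = "瑞星一直以质量和服务开拓安全市场"
TOY_DICTIONARY = {"瑞星", "一直", "质量", "和服", "服务", "开拓", "安全", "市场"}

forward_tokens = forward_max_match(SENTENCE, TOY_DICTIONARY)
print(" \\ ".join(forward_tokens))
```

Because the scan runs left to right, it reaches "和" first, finds "和服" (kimono) in the dictionary, and consumes it before "服务" (service) can be matched, leaving "务" stranded as a single character.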
So we turn to the reverse maximum matching method, which reads the sentence from the end toward the beginning (right to left):
"Rising \ has always \ relied on \ quality \ and \ service \ open up \ security \ market"
This segmentation is correct.
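A minimal sketch of reverse maximum matching follows. As before, the function name `reverse_max_match`, the toy dictionary, and the Chinese sentence (an assumed reconstruction of the example) are illustrative only and do not describe any search engine's actual implementation.

```python
def reverse_max_match(text, dictionary):
    """Segment text right to left, taking the longest dictionary match.

    Characters that end no dictionary word become single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest window ending at `end`, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


# Assumed reconstruction of the example sentence and a toy dictionary.
SENTENCE = "瑞星一直以质量和服务开拓安全市场"
TOY_DICTIONARY = {"瑞星", "一直", "质量", "和服", "服务", "开拓", "安全", "市场"}

reverse_tokens = reverse_max_match(SENTENCE, TOY_DICTIONARY)
print(" \\ ".join(reverse_tokens))
```

Scanning from the right, the matcher claims "服务" (service) before it ever considers "和服" (kimono), so "和" (and) is left as its own token and the spurious word never appears.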