What is Chinese word segmentation?
As we all know, English is written in units of words, and words are separated by spaces. Chinese is written in units of characters, and the characters of a sentence must be read together to convey a meaning. For example, the English sentence "I am a student" is "我是一个学生" in Chinese. A computer can easily tell from the spaces that "student" is a word, but it cannot easily tell that the two characters "学" and "生" together form a single word. Dividing a sequence of Chinese characters into meaningful words is called Chinese word segmentation; some people also call it word cutting. For example, segmenting "Shanghai SEO Service" gives: Shanghai / SEO / Service.
Current mainstream Chinese word segmentation algorithms fall into three categories:
1. Segmentation method based on string matching
This method is also called mechanical segmentation. Following a given strategy, it matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary; if a string is found in the dictionary, the match succeeds (a word is identified). By scanning direction, string-matching segmentation can be divided into forward matching and reverse matching. By length preference, it can be divided into maximum (longest) matching and minimum (shortest) matching. By whether it is combined with part-of-speech tagging, it can be divided into pure segmentation methods and integrated methods that combine segmentation with tagging. Several commonly used mechanical segmentation methods are as follows:
1) Forward maximum matching (scanning from left to right);
2) Reverse maximum matching (scanning from right to left);
3) Minimum segmentation (minimizing the number of words cut from each sentence).
These methods can also be combined with each other. For example, forward maximum matching and reverse maximum matching can be combined into a bidirectional matching method. Because most Chinese words are formed from several characters, forward minimum matching and reverse minimum matching are seldom used. Generally speaking, reverse matching is slightly more accurate than forward matching and produces fewer ambiguities. Statistics show that the error rate of forward maximum matching used alone is 1/169, while the error rate of reverse maximum matching used alone is 1/245. Even so, this precision falls far short of practical needs. Segmentation systems in actual use treat mechanical segmentation as a first pass and rely on various other kinds of language information to improve accuracy further.
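The two scanning directions can be sketched in a few lines of Python. This is only a minimal illustration, not a production segmenter: the function names, the `max_len` limit of 4 characters, and the tiny sample dictionary are all assumptions made for the example. Any single character that matches nothing is accepted as a one-character word.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # words were collected from the end of the string
    return words


# Hypothetical sample dictionary and sentence ("research the origin of life").
sample_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", sample_dictionary))  # ['研究生', '命', '起源']
print(reverse_max_match("研究生命起源", sample_dictionary))  # ['研究', '生命', '起源']
```

On this sample sentence the forward scan greedily takes "研究生" (graduate student) and strands "命", while the reverse scan recovers the intended reading. A bidirectional method would run both and flag the sentence as ambiguous because the results differ.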
One approach is to improve the scanning mode, known as feature scanning or marker segmentation. It first identifies and cuts out words with obvious features in the string to be analyzed and uses them as breakpoints, so that the original string is divided into shorter strings before mechanical segmentation is applied, which reduces the matching error rate. Another approach is to combine segmentation with part-of-speech tagging: rich part-of-speech information helps the segmentation decisions, and the tagging process in turn checks and adjusts the segmentation results, greatly improving accuracy.
A general model can be established for mechanical segmentation methods. Specialized academic papers cover this topic, so it is not elaborated here.
2. Segmentation method based on understanding
This method has the computer simulate a person's understanding of the sentence in order to recognize words. The basic idea is to perform syntactic and semantic analysis at the same time as segmentation and to use the syntactic and semantic information to resolve ambiguity. Such a system usually consists of three parts: a segmentation subsystem, a syntactic-semantic subsystem, and a general control part. Under the coordination of the general control part, the segmentation subsystem obtains syntactic and semantic information about words and sentences and uses it to judge segmentation ambiguities; in other words, it simulates the way a human understands a sentence. This kind of segmentation requires a large amount of linguistic knowledge and information. Because Chinese linguistic knowledge is so broad and complex, it is difficult to organize it into a form that machines can read directly, so understanding-based segmentation systems are still at the experimental stage.
3. Segmentation method based on statistics
In terms of form, a word is a stable combination of characters, so the more often adjacent characters appear together in context, the more likely they are to form a word. The frequency or probability with which characters co-occur adjacently is therefore a good indicator of how credible it is that they form a word. One can count how often each combination of adjacent characters occurs in a corpus and calculate their mutual information. The mutual information of two Chinese characters X and Y is defined as M(X, Y) = log(P(X, Y) / (P(X) P(Y))), where P(X, Y) is the probability that X and Y occur adjacently and P(X) and P(Y) are the probabilities that X and Y each occur in the corpus. Mutual information reflects how tightly the characters are bound to each other. When this tightness exceeds a certain threshold, the character group can be considered a possible word. The method only needs frequency counts of character groups in the corpus and requires no segmentation dictionary, so it is also called dictionary-free segmentation or the statistical word extraction method. It has limitations, however: it often extracts character groups that co-occur frequently but are not words, such as "this one", "one of", "some", "my", and "many"; its recognition accuracy for common words is poor; and its time and space overhead is large. Statistical segmentation systems in practical use rely on a basic dictionary of common words for string-matching segmentation and use statistical methods to identify new words. Combining string frequency statistics with string matching exploits both the speed and efficiency of matching-based segmentation and the ability of dictionary-free segmentation to recognize new words from context and resolve ambiguity automatically.
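As a rough sketch of the statistical idea, the following Python computes a mutual information score for every adjacent character pair in a corpus. It assumes the standard pointwise form log2(P(X, Y) / (P(X) P(Y))), estimates the probabilities from raw counts, and uses a three-sentence toy corpus; the function name and corpus are hypothetical, and a real system would need a far larger corpus and a tuned threshold.

```python
import math
from collections import Counter


def mutual_information(corpus):
    """Return a mutual information score for each adjacent character pair."""
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        # zip pairs each character with its right-hand neighbour.
        pair_counts.update(zip(sentence, sentence[1:]))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / total_pairs          # probability of X and Y being adjacent
        p_x = char_counts[x] / total_chars  # probability of X in the corpus
        p_y = char_counts[y] / total_chars  # probability of Y in the corpus
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (an assumption for illustration only).
scores = mutual_information(["学生学习", "学生上学", "我是学生"])
print(scores["学生"] > scores["生学"])  # True: "学生" binds more tightly than "生学"
```

Pairs whose score exceeds a chosen threshold would be kept as word candidates. On a corpus this small, pairs seen only once also score high, which illustrates why raw mutual information needs frequency cutoffs or a dictionary to be useful.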
So how does word segmentation technology help with SEO?
SEO work is absolutely inseparable from technology as a supporting tool!
Take Shanghai SEO as an example:
(Shanghai SEO Service Network provides enterprises and personal sites with quality website optimization services, search engine optimization services, and website planning. Seo-sh is a Shanghai SEO optimization service network built on SEO optimization services, with website planning and marketing at its core.)
With word segmentation, this text can be divided into: Shanghai SEO, Shanghai SEO services, SEO services, corporate website optimization services, personal website optimization services, search engine services, search engine optimization, and so on. SEO is therefore inseparable from word segmentation: using segmentation skillfully to understand how search engines read a page is a lesson every SEOer must learn!