The Chinese language is broad and profound: different punctuation and different sentence breaks can give the same characters different meanings. As one Google scientist reportedly put it: "If we can do Chinese search well, there is no language whose search engine research we need to fear."
Chinese word segmentation plays a vital role in how search engines rank results. In practical search engine optimization, we can also use our knowledge of Chinese word segmentation to avoid the fierce competition around many head keywords.
Currently, there are two mainstream word segmentation methods: text processing based on statistical models, and reverse maximum matching based on string matching.
Statistical model-based text processing
In terms of form, a word is a stable combination of characters. The more often adjacent characters appear together in context, the more likely they are to form a word, so the frequency or probability of their adjacent co-occurrence is a good indicator of how credible the combination is as a word. By counting how often each combination of adjacent characters occurs in a corpus, we can compute their co-occurrence information: the mutual information of two Chinese characters X and Y is defined from their probability of appearing next to each other, and it reflects how tightly the two characters are bound. When this value exceeds a threshold, the character group is likely to form a word. The method only needs character-group frequency statistics from a corpus and requires no segmentation dictionary, so it is also called dictionary-free word segmentation or statistical word extraction.
However, the method has limitations. It often extracts common character groups that co-occur frequently but are not words, such as the Chinese character groups translated here as "this", "one", "some", "my", and "many". Its accuracy on common words is also poor, and its time and space costs are high. In practice, a statistical segmentation system therefore uses a basic segmentation dictionary (a dictionary of common words) for string-matching segmentation and applies statistical methods only to identify new words. Combining string frequency statistics with string matching keeps the speed and efficiency of dictionary-based matching, while using dictionary-free, context-aware statistics to recognize new words and automatically resolve ambiguity.
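The mutual-information idea can be sketched in a few lines of Python. This sketch assumes the standard pointwise mutual information definition, MI(X, Y) = log2(P(XY) / (P(X) · P(Y))), where P(XY) is the probability that character X is immediately followed by Y; the article does not give its exact formula. The function name, the `threshold` and `min_count` values, and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information_words(sentences, threshold=1.0, min_count=2):
    """Return adjacent character pairs whose pointwise mutual information
    exceeds `threshold`, sorted from most to least tightly bound.
    `threshold` and `min_count` are hypothetical illustration values."""
    char_counts = Counter()
    pair_counts = Counter()
    for s in sentences:
        char_counts.update(s)  # single-character frequencies
        pair_counts.update(s[i:i + 2] for i in range(len(s) - 1))  # adjacent pairs
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    candidates = {}
    for pair, n_xy in pair_counts.items():
        if n_xy < min_count:  # ignore pairs too rare to judge
            continue
        p_xy = n_xy / total_pairs
        p_x = char_counts[pair[0]] / total_chars
        p_y = char_counts[pair[1]] / total_chars
        mi = math.log2(p_xy / (p_x * p_y))
        if mi > threshold:  # tightly bound enough to be a candidate word
            candidates[pair] = mi
    return dict(sorted(candidates.items(), key=lambda kv: -kv[1]))


# Hypothetical toy corpus (瑞星 is Rising's Chinese name).
corpus = ["瑞星杀毒", "瑞星安全", "安全市场", "瑞星质量"]
print(list(mutual_information_words(corpus)))  # ['安全', '瑞星']
```

On a real corpus the same scoring would also surface frequent non-word groups, which is exactly the limitation described above and why the statistics are usually combined with a dictionary.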
Statistics-based text processing is highly technical and mainly matters inside the search engines' own segmentation algorithms. Understanding it will help your SEO; if you want to go deeper, you can join my SEO training class for an in-depth discussion. Here we will focus on reverse maximum matching based on string matching.
In SEO practice, the most widely used segmentation method is reverse maximum matching based on string matching. The method itself is very simple. Here is a simple example.
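As background for the example that follows, here is a minimal Python sketch of dictionary-based maximum matching in both directions. It assumes a small hypothetical dictionary and a maximum word length equal to the longest dictionary entry; the function names are illustrative, not any search engine's actual implementation. The test sentence 研究生命起源 ("study the origin of life") is a common textbook example, not one taken from this article.

```python
def forward_max_match(text, dictionary, max_len):
    """Left to right: at each position take the longest dictionary word
    starting there; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len):
    """Right to left: at each end position take the longest dictionary word
    ending there; fall back to a single character."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()  # pieces were collected from the end of the sentence
    return words


# Hypothetical dictionary for illustration only.
dictionary = {"研究", "研究生", "生命", "起源"}
max_len = max(len(w) for w in dictionary)
print(forward_max_match("研究生命起源", dictionary, max_len))  # ['研究生', '命', '起源']
print(reverse_max_match("研究生命起源", dictionary, max_len))  # ['研究', '生命', '起源']
```

The forward pass greedily grabs 研究生 ("graduate student") and strands 命, while the reverse pass recovers 研究 / 生命 / 起源. This kind of case is often cited as the reason reverse matching tends to make fewer errors on Chinese text.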
"Rising has been developing the security market with quality and service."
If we segment this sentence from left to right (forward) using dictionary lookup,