Key to SEO: Chinese Word Segmentation

Source: Internet
Author: User

In search engine technology, Chinese word segmentation plays a vital role in how search results are ranked. In practical search engine optimization (SEO), we also use Chinese word segmentation to avoid competing head-on for heavily contested main keywords.

For example, suppose we want to optimize a web page about bearings. Because the keyword "bearing" is so popular, it is very difficult to get the page onto the first page of search results for it through SEO alone. In such cases we usually turn to long-tail keywords, such as "Beijing bearing distributor" and "Beijing imported bearing". To rank first for keywords like these, a good grasp of Chinese word segmentation and keyword layout is very important.

The Chinese language is broad and profound. Different punctuation and different sentence breaks produce different meanings. As a Google scientist once said: "If we can do Chinese search well, we need not fear search engine research in any other language."

So what does Chinese word segmentation mean for search engine optimization? Word segmentation affects SEO in many ways, and the most important is its effect on long-tail traffic. We often see many long-tail keywords that we would really like to target, such as "Guangzhou imported bearing sales" and "Shanghai imported bearing sales". But as the previous SEO article explained, a page should not target more than three keywords: with more than three, the weight is spread across them and none of them ranks well. What if we want more than three without that penalty? This is where Chinese word segmentation comes in: we combine the keywords, for example as "imported bearing sales-Shanghai-Guangzhou". The page may then not match "Guangzhou imported bearing sales" or "Shanghai imported bearing sales" exactly, but segmentation allows many of the combined phrases to rank well. Having several phrases at the top of the results page always gives broader coverage than having a single keyword there. Over time, the "Guangzhou + imported bearing sales" and "Shanghai + imported bearing sales" matches tell search engines that your page is highly relevant to the keyword "imported bearing sales", so the ranking for that main keyword rises as well.

Of course, the keywords in the example above have not been fully segmented. Next, we give a rough overview of Chinese word segmentation.

The earliest Chinese word segmentation method, a dictionary-based method, was proposed by Professor Liang Nanyuan of Beijing University of Aeronautics and Astronautics. Take the following sentence as an example: "The famous director Zhang Yimou said that 100,000 people will be arranged to join the gala on the National Day evening."

To segment with the dictionary method, we read the entire sentence and mark every word that appears in the dictionary. When we encounter a compound word (such as "Peking University"), we take the longest matching word. Any string that is not recognized is split into single characters. With this method, the sentence above can be segmented as:

"Famous | Director | Zhang Yimou | said | National Day | evening | Yes | arrangement | 100,000 people | arrival | *** | Lianhuan"

Although this method can cope with many sentences, it splits the text too finely, so it cannot indicate which words are the key ones when a search engine actually uses the result, and the search results cannot reach maximum relevance. Therefore, in the 1980s, Dr. Wang Xiaolong, a doctoral supervisor in computer science at Harbin Institute of Technology, proposed the fewest-words segmentation theory: a sentence should be split into the smallest possible number of words, which helps the search engine better understand its meaning. Although this approach is better, it brings new problems. For an ambiguous keyword phrase, the longest match is not necessarily the best result. For example, the correct segmentation of the keyword "Geely University City Bookstore" is "Geely | University City | Bookstore", not the "Geely University | City | Bookstore" that longest matching against the dictionary produces.
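The two ideas above, longest dictionary match and fewest words, can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation any search engine uses. The Chinese string 吉利大学城书店 is an assumed back-translation of "Geely University City Bookstore", the toy dictionary is made up for the example, and the function names are hypothetical.

```python
from functools import lru_cache


def forward_max_match(text, dictionary):
    """Scan left to right, always taking the longest dictionary word."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    start = 0
    while start < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - start), 0, -1):
            piece = text[start:start + size]
            # An unrecognized string falls back to a single character.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                start += size
                break
    return tokens


def fewest_word_segmentations(text, dictionary):
    """Return every segmentation that uses the minimum number of tokens."""

    @lru_cache(maxsize=None)
    def best(i):
        # Returns (token_count, segmentations) for the suffix text[i:].
        if i == len(text):
            return 0, ((),)
        best_count, best_segs = None, []
        for j in range(i + 1, len(text) + 1):
            piece = text[i:j]
            # Single characters are always allowed; longer pieces must be words.
            if j - i > 1 and piece not in dictionary:
                continue
            count, segs = best(j)
            count += 1
            if best_count is None or count < best_count:
                best_count, best_segs = count, [(piece,) + s for s in segs]
            elif count == best_count:
                best_segs.extend((piece,) + s for s in segs)
        return best_count, tuple(best_segs)

    return [list(seg) for seg in best(0)[1]]


# Toy dictionary (an assumption made for this example only).
toy_dictionary = {"吉利", "吉利大学", "大学", "大学城", "书店"}
phrase = "吉利大学城书店"

print(forward_max_match(phrase, toy_dictionary))
print(fewest_word_segmentations(phrase, toy_dictionary))
```

With this toy dictionary, longest matching commits to "Geely University" first, while the fewest-words search finds two three-word candidates and cannot choose between them on word count alone. That tie is the ambiguity the paragraph above describes.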

Currently, there are two mainstream word segmentation methods: text processing based on statistical models, and reverse maximum matching based on string matching.

Statistical Model-Based Text Processing

In terms of form, a word is a stable combination of characters. The more often adjacent characters appear together in context, the more likely they are to form a word. The frequency or probability with which characters co-occur next to each other is therefore a good indicator of how credible a candidate word is. We can count how often each combination of adjacent characters appears in a corpus and compute their mutual information: define the mutual information of two characters and calculate the adjacent co-occurrence probability of two Chinese characters X and Y. Mutual information reflects how tightly two characters are bound together. When it is higher than a threshold, the character group may constitute a word. This method only requires frequency statistics over character groups in the corpus and does not need a segmentation dictionary, so it is also called dictionary-free word segmentation or statistical word extraction.

However, this method has limitations. It often extracts character groups that co-occur frequently but are not words, such as "this", "one", "some", "my", and "many". Its recognition accuracy for common words is also poor, and its time and space overhead is large. In practice, a statistical word segmentation system uses a basic segmentation dictionary (a dictionary of common words) for string-matching segmentation and uses statistical methods to identify new words. Combining string frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation while using dictionary-free segmentation, together with context, to identify new words and resolve ambiguity automatically.
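As a rough illustration of the statistic described above, the sketch below scores every pair of adjacent characters in a tiny corpus. The article does not give a formula, so the sketch assumes the common pointwise mutual information definition, log2 of P(XY) / (P(X) P(Y)). The corpus, the threshold, and the function names are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(corpus):
    """Score each adjacent character pair with pointwise mutual information."""
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        # zip pairs every character with its right-hand neighbour.
        pairs.update(zip(sentence, sentence[1:]))
    total_chars = sum(chars.values())
    total_pairs = sum(pairs.values())
    scores = {}
    for (x, y), count in pairs.items():
        p_xy = count / total_pairs
        p_x = chars[x] / total_chars
        p_y = chars[y] / total_chars
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


def candidate_words(corpus, threshold):
    """Keep the pairs whose score is above the (assumed) threshold."""
    scores = mutual_information(corpus)
    return {pair for pair, score in scores.items() if score > threshold}


# Tiny made-up corpus: "bearing sales", "imported bearings", "bearing wholesale".
toy_corpus = ["轴承销售", "进口轴承", "轴承批发"]
for pair, score in sorted(mutual_information(toy_corpus).items()):
    print(pair, round(score, 3))
```

On a corpus this small the scores mean little, and pairs that occur only once can score as high as genuine words. That is one reason the statistical approach needs a large corpus and is combined with a dictionary in practice.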

Statistical model-based text processing is highly technical and is used only inside the search engines' own word segmentation algorithms. Learning it will help your SEO further; you can join my SEO training class for an in-depth discussion. Here we will say more about the reverse maximum matching method based on string matching.

In general, the word segmentation method most widely used in SEO is reverse maximum matching based on string matching. The method is actually very simple. Here is a simple example.

"Rising has been developing the security market with quality and service."

If we segment this sentence in the forward direction using the dictionary method, it is split as follows.

"Swiss/star/always/quality/kimono/service/development/security/Market"

We can see that forward segmentation produces a major mistake, "kimono". A kimono is a traditional Japanese garment and has nothing to do with the meaning of this sentence. The error arises because, in Chinese, "and" (和) followed by the first character of "service" (服务) spells "kimono" (和服). If the sentence is really segmented this way, the same error will appear when a search engine builds its index.
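For comparison, here is a minimal sketch of forward and reverse maximum matching run on this sentence. It assumes the original Chinese sentence is 瑞星一直以质量和服务开拓安全市场, which is a back-translation and may not match the author's exact wording. The toy dictionary and the function name max_match are also assumptions made for illustration.

```python
def max_match(text, dictionary, reverse=False):
    """Maximum matching against a dictionary, scanning forward or backward."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    if not reverse:
        start = 0
        while start < len(text):
            # Longest candidate first, starting at the left edge.
            for size in range(min(max_len, len(text) - start), 0, -1):
                piece = text[start:start + size]
                if size == 1 or piece in dictionary:
                    tokens.append(piece)
                    start += size
                    break
    else:
        end = len(text)
        while end > 0:
            # Longest candidate first, ending at the right edge.
            for size in range(min(max_len, end), 0, -1):
                piece = text[end - size:end]
                if size == 1 or piece in dictionary:
                    tokens.append(piece)
                    end -= size
                    break
        # Tokens were collected right to left, so restore reading order.
        tokens.reverse()
    return tokens


# Toy dictionary (assumed): always, quality, kimono, service,
# develop, security, market. "Rising" is deliberately left out.
toy_dictionary = {"一直", "质量", "和服", "服务", "开拓", "安全", "市场"}
sentence = "瑞星一直以质量和服务开拓安全市场"

print("/".join(max_match(sentence, toy_dictionary)))
print("/".join(max_match(sentence, toy_dictionary, reverse=True)))
```

Scanning from the right, the matcher reaches "service" (服务) before it can pair "and" (和) with the following character, so with this dictionary the "kimono" error does not occur in the reverse pass.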
