Zhongqiang: On Word Segmentation Technology in Website SEO

Source: Internet
Author: User

When I first started learning SEO, I knew that having the keyword appear in the title was very important. Later I learned about "word segmentation technology", which helps a great deal with arranging the keywords in a title. As a beginner, I piled all the important keywords into the title, for example: Webmaster Network, Personal Webmaster, Webmaster Tools, Webmaster Download, Webmaster Helper - China Webmaster Network. A title can be written this way, but when users look at a website, the best article title is a single sentence that states its point clearly rather than a bare list of keywords, so that users can read it as smoothly as possible.

Instead, it could be written as: China Webmaster Network - provides webmaster tools, downloads and webmaster information; a good helper for personal webmasters. This is where an understanding of word segmentation comes in.

Word segmentation is the technique a search engine uses when a user submits a keyword string as a query: the engine runs a series of matching steps on that string.

How a Search Engine Processes a Query

If the keyword is no longer than three Chinese characters, the engine looks it up directly in the index vocabulary of its database. A string longer than three characters is split at spaces, commas and similar separators: the keyword string the user submitted is divided into several words, and each of them is queried.
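The first step can be sketched in Python. This is a minimal sketch, assuming the three-character threshold from the paragraph above and a hypothetical separator set (whitespace, ASCII and full-width commas, the enumeration comma 、, and semicolons); `split_query` is an illustrative name, not any search engine's API.

```python
import re

# Threshold from the article: short queries go straight to the index.
DIRECT_LOOKUP_MAX_CHARS = 3

# Assumed separator set: whitespace, ASCII/full-width commas, 、 and semicolons.
SEPARATORS = re.compile(r"[\s,，、;；]+")


def split_query(query: str) -> list[str]:
    """Split a raw query into the pieces that will be looked up."""
    query = query.strip()
    if not query:
        return []
    if len(query) <= DIRECT_LOOKUP_MAX_CHARS:
        # Short query: look it up in the index vocabulary as one term.
        return [query]
    # Longer query: break it at separators and drop empty pieces.
    return [part for part in SEPARATORS.split(query) if part]


print(split_query("中国人"))              # ['中国人']
print(split_query("网站 优化，分词技术"))  # ['网站', '优化', '分词技术']
```

In practice each piece would still need to be segmented further, for example with the string-matching methods described later in the article.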

For example, for "Silk Lanka Wig Net - sales, fashion, non-mainstream wig brand", the search engine splits the string into: Silk Lanka, wig, wig net, sale, fashion, non-mainstream, brand. This segmentation method is called the reverse matching method.

The engine also checks whether the string contains repeated words, and repeats are omitted.

For example, in "China Webmaster Network - provides webmaster tools, downloads and webmaster information; a good helper for personal webmasters", the four occurrences of "webmaster" are matched as one word; by default they count as a single word. That is how a search engine processes a query.
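Dropping repeated words can be sketched as an order-preserving de-duplication. The term list below is a hand-made English rendering of the example title, and case-insensitive matching is an assumption; `dedupe_terms` is a hypothetical helper, and real engines normalize text in their own ways.

```python
def dedupe_terms(terms: list[str]) -> list[str]:
    """Keep only the first occurrence of each term, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for term in terms:
        key = term.lower()  # assumption: matching ignores letter case
        if key not in seen:
            seen.add(key)
            unique.append(term)
    return unique


title_terms = ["China", "Webmaster", "Network", "webmaster", "tools",
               "downloads", "webmaster", "information", "personal",
               "webmaster", "helper"]
print(dedupe_terms(title_terms))
# ['China', 'Webmaster', 'Network', 'tools', 'downloads', 'information', 'personal', 'helper']
```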

Word segmentation technology is now very mature. Google purchased its segmentation technology from a third-party company, while Baidu developed its own; in Chinese segmentation, Baidu is slightly ahead of Google. English words are separated by spaces, which makes segmentation easier: in "I am a Chinese", a search engine can tell that "Chinese" is one word. In Chinese, however, it is much harder to recognize that the three characters 中, 国 and 人 ("middle", "country", "person") combine into a single word, 中国人 ("Chinese person"). Some people also call word segmentation "word cutting".
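The contrast between English and Chinese shows up with Python's built-in `str.split()`, which splits only on whitespace. A minimal illustration; the Chinese sentence 我是中国人 ("I am Chinese") is our own example, not from the article.

```python
english = "I am a Chinese"
chinese = "我是中国人"  # "I am Chinese", written without spaces

# English: whitespace already marks the word boundaries.
english_words = english.split()   # ['I', 'am', 'a', 'Chinese']

# Chinese: split() finds no spaces, so the whole sentence stays one chunk.
chinese_chunks = chinese.split()  # ['我是中国人']

# Cutting into single characters loses the word 中国人 ("Chinese person").
chinese_chars = list(chinese)     # ['我', '是', '中', '国', '人']

print(english_words, chinese_chunks, chinese_chars)
```

Neither extreme gives the right answer for Chinese; knowing that 中国人 is one word requires a dictionary or statistics.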

Relevance Ranking and Search Engines

A search engine's job is to collect web pages and then rank them according to a set of rules. It is currently estimated that more than 10 billion pages have been indexed, and the number keeps growing. The search engine presents the pages most relevant to what the user submitted. The "76 pages" phenomenon we see on Baidu comes from this: not every page can rank at the top for a word, and the user only needs the most relevant portion. This can also be called relevance ranking. When we do keyword analysis, we try to target long-tail keywords with relatively high relevance, and that practice rests on the same idea.
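Relevance ranking can be illustrated with a deliberately simple score: the fraction of query terms a page contains. This is a toy sketch under a loud assumption; real engines combine many more signals, and the `relevance` and `rank_pages` helpers and the page data are hypothetical.

```python
def relevance(query_terms: list[str], page_terms: set[str]) -> float:
    """Fraction of the query terms that appear on the page (0.0 to 1.0)."""
    if not query_terms:
        return 0.0
    hits = sum(1 for term in query_terms if term in page_terms)
    return hits / len(query_terms)


def rank_pages(query_terms: list[str], pages: dict[str, set[str]]) -> list[str]:
    """Return page names ordered from most to least relevant."""
    return sorted(pages, key=lambda name: relevance(query_terms, pages[name]),
                  reverse=True)


# Hypothetical pages, each reduced to the set of words it contains.
pages = {
    "page_a": {"wig", "brand"},
    "page_b": {"wig", "fashion", "brand"},
    "page_c": {"webmaster", "tools"},
}
print(rank_pages(["wig", "fashion", "brand"], pages))  # ['page_b', 'page_a', 'page_c']
```

In this toy model, a longer and more specific query separates the fully matching page from the partial one, which loosely mirrors the intuition behind targeting relevant long-tail keywords.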

1. String-matching segmentation. Segmentation based on string matching can be divided into three major categories:

The first is forward maximum matching, which segments from left to right, following normal reading order.

The second is reverse maximum matching, the opposite of the first: it works from right to left.

The third is minimum word segmentation, which splits a sentence into as few keywords as possible. For example, "Silk Lanka wig net, sales, fashion, non-mainstream wig" would be divided into: Silk Lanka wig net, sales, fashion, non-mainstream wig.

Search engines usually combine the three methods above to minimize segmentation errors and return the web pages that best match what the user is looking for.
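The three string-matching methods, and one way of combining them, can be sketched in Python. This is a minimal sketch under stated assumptions: the five-word dictionary is a toy, characters not in the dictionary fall back to single-character tokens, and the tie-breaking rule in `combined` (fewest words, then fewest single-character words) is a commonly used heuristic, not necessarily what Baidu or Google do. The example 研究生命起源 ("research the origin of life") is a classic ambiguous string: forward matching wrongly grabs 研究生 ("graduate student").

```python
WORDS = {"研究", "研究生", "生命", "命", "起源"}  # hypothetical toy dictionary
MAX_LEN = max(len(w) for w in WORDS)


def forward_max_match(text, words=WORDS, max_len=MAX_LEN):
    """Left to right: at each position take the longest dictionary word."""
    result, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:  # fall back to one character
                result.append(piece)
                i += size
                break
    return result


def backward_max_match(text, words=WORDS, max_len=MAX_LEN):
    """Right to left: at each end position take the longest dictionary word."""
    result, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in words:
                result.append(piece)
                j -= size
                break
    result.reverse()
    return result


def fewest_words(text, words=WORDS, max_len=MAX_LEN):
    """Dynamic programming: split into the smallest possible number of tokens."""
    n = len(text)
    best = [0] + [n + 1] * n  # best[i] = fewest tokens covering text[:i]
    back = [0] * (n + 1)      # back[i] = start index of the last token
    for i in range(1, n + 1):
        for size in range(1, min(max_len, i) + 1):
            piece = text[i - size:i]
            if (size == 1 or piece in words) and best[i - size] + 1 < best[i]:
                best[i] = best[i - size] + 1
                back[i] = i - size
    result, i = [], n
    while i > 0:
        result.append(text[back[i]:i])
        i = back[i]
    result.reverse()
    return result


def combined(text):
    """Pick the candidate with fewest words, then fewest single characters."""
    candidates = [forward_max_match(text), backward_max_match(text),
                  fewest_words(text)]
    return min(candidates,
               key=lambda seg: (len(seg), sum(len(w) == 1 for w in seg)))


print(forward_max_match("研究生命起源"))   # ['研究生', '命', '起源']
print(backward_max_match("研究生命起源"))  # ['研究', '生命', '起源']
print(fewest_words("研究生命起源"))        # ['研究生', '命', '起源']
print(combined("研究生命起源"))            # ['研究', '生命', '起源']
```

Here all three candidates have three words, so the tie is broken by the number of single-character tokens, and the reverse-matching result wins.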

2. Understanding-based segmentation

This method tries to understand the whole sentence, using grammar, semantics and meaning to work out what the user wants and to resolve ambiguity; it is also called word-sense segmentation. The method is not yet mature and is still in the testing phase.

3. Statistical segmentation

This approach uses the engine's own database to find which two characters occur next to each other most frequently over a long period; the more often two characters appear adjacent, the more likely they form a word. However, this method also produces quite a few errors: it often picks up common combinations such as "some", "my" and "one", and its recognition of such words is relatively poor. This statistical method is quite helpful for Baidu's related-search feature.
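Counting adjacent character pairs can be sketched with `collections.Counter`. A minimal sketch, assuming a tiny hand-made corpus and a hypothetical `min_count` threshold; a production system would use a far larger corpus and usually a stronger association measure than raw frequency. Note how 我的 ("my") surfaces alongside the real word 网站 ("website"), which is the kind of error the paragraph describes.

```python
from collections import Counter


def adjacent_pair_counts(corpus: list[str]) -> Counter:
    """Count how often each pair of neighbouring characters occurs."""
    counts: Counter = Counter()
    for sentence in corpus:
        for first, second in zip(sentence, sentence[1:]):
            counts[first + second] += 1
    return counts


def candidate_words(corpus: list[str], min_count: int = 2) -> list[str]:
    """Pairs seen at least min_count times are treated as likely words."""
    counts = adjacent_pair_counts(corpus)
    return [pair for pair, n in counts.most_common() if n >= min_count]


# Hypothetical mini-corpus: "my website", "website optimization",
# "my blog", "website segmentation".
corpus = ["我的网站", "网站优化", "我的博客", "网站分词"]
print(candidate_words(corpus))  # ['网站', '我的']
```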

Author information: Silk Lanka Wig Net, http://www.silanka.net, QQ: 253354150

Reprints are welcome; please keep the author information when reprinting. Thank you.
