Chinese Word Segmentation Technology in Search Engines

Source: Internet
Author: User


Many friends have asked me to write an article about search engine word segmentation, especially Baidu's word segmentation, so I am sharing one today.

Moon explained word segmentation technology in the seowhy Thursday Q&A group on October 9. It is posted here today for everyone to learn from.

Word segmentation technology: what is word segmentation, and how does a search engine segment words? This is usually the first question friends ask. You have probably heard the term and are curious: what is word segmentation technology, and what is Baidu word segmentation? Put simply, word segmentation splits a passage of text into words at separator characters such as punctuation and spaces.

What is word segmentation technology? It is the technology a search engine applies, after query processing, to split the keyword string a user has submitted, using various matching methods. To understand word segmentation we first need to understand one concept: query processing. When a user submits a query, the search engine performs a series of processing steps on the information it receives. The first step is to look up the relevant information in the database index.

That is query processing. So how does query processing work? It is quite simple. If the submitted string has no more than 3 Chinese characters, the engine looks it up directly in the database's index vocabulary. If it has more than 4 Chinese characters, the engine first uses delimiters such as spaces and punctuation to split the query string into several subqueries. For example, "What is the Baidu word segmentation technology" is split into "what is, Baidu, word segmentation technology". This method is called the reverse matching method. The engine then checks the user-supplied words for repetition.

Any repeated word is discarded, so it counts only once. Next, the engine checks whether the submitted string contains letters or numbers. If so, the run of letters and numbers is treated as a single word. That is how a search engine processes a query.
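The query processing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Baidu's actual implementation: the function name `preprocess_query`, the `direct_lookup_limit` parameter, and the choice to treat only ASCII letters and digits as "letters and numbers" are all hypothetical.

```python
import re

def preprocess_query(query, direct_lookup_limit=3):
    """Split a raw query into deduplicated sub-queries (illustrative only)."""
    query = query.strip()
    # Very short queries go straight to the index lookup without splitting.
    if len(query) <= direct_lookup_limit:
        return [query] if query else []
    # Split on whitespace and punctuation (anything that is not a word character).
    parts = [p for p in re.split(r"[\W_]+", query) if p]
    tokens = []
    for part in parts:
        # A run of ASCII letters and digits is kept together as one word.
        tokens.extend(re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]+", part))
    # Drop repeated words while keeping first-seen order.
    return list(dict.fromkeys(tokens))

print(preprocess_query("baidu, seo 2008 seo"))  # ['baidu', 'seo', '2008']
```

The repeated `seo` is kept only once, and `2008` stays a single word.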

With query processing covered, we have a basic understanding of word segmentation technology, especially Chinese word segmentation.

What I am really describing here is how search engines work. Next I will explain the principles of word segmentation, using Baidu as the example.

How does Baidu segment words? Word segmentation technology is now quite mature, and it falls into 3 kinds.

1. String-matching segmentation method

2. Word-meaning (semantic) segmentation method

3. Statistical segmentation method

The first kind, string-matching segmentation, is the most commonly used, and it is the one Baidu uses. It is further divided into 3 methods.

1. Forward maximum matching method

What does that mean? The text is segmented word by word from left to right.

For example:

"I don't know what you're talking about."

How does forward maximum matching divide this sentence? It scans from left to right and, at each position, cuts off the longest string that matches a dictionary word. Its counterpart is reverse maximum matching, which is the second method.
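The left-to-right scan of forward maximum matching can be sketched like this. It is a minimal illustration, not any search engine's real code: the function name and the tiny dictionary are hypothetical, and an English string written without spaces stands in for Chinese text, which has no word delimiters. The result depends entirely on the dictionary used.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted as a fallback so the scan always advances.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical demo dictionary.
DEMO_WORDS = {"the", "theta", "table", "bled", "down", "own", "there"}
print(forward_max_match("thetabledownthere", DEMO_WORDS))
# ['theta', 'bled', 'own', 'there']
```

Because the scan is greedy, taking the long word `theta` first forces poor splits later in the string.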

2. Reverse maximum matching method

How does this method divide the example above? "I don't know what you're talking about." is divided into "No, know, you are, say, what". This produces more pieces, because reverse maximum matching scans from right to left.
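The right-to-left scan can be sketched in the same way. Again this is only an assumption-laden illustration: the function name and dictionary are hypothetical, and an unspaced English string stands in for Chinese text.

```python
def reverse_max_match(text, dictionary):
    """Greedy right-to-left segmentation: always take the longest dictionary word."""
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    j = len(text)
    while j > 0:
        # Try the longest candidate ending at position j, then shrink.
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            # A single character is accepted as a fallback so the scan always advances.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    # Tokens were collected from the end of the text, so restore reading order.
    tokens.reverse()
    return tokens

# Hypothetical demo dictionary.
DEMO_WORDS = {"the", "theta", "table", "bled", "down", "own", "there"}
print(reverse_max_match("thetabledownthere", DEMO_WORDS))
# ['the', 'table', 'down', 'there']
```

With this particular dictionary, scanning from the right never gets the chance to pick `theta`, so the split differs from what a left-to-right scan over the same words would give.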

3. Shortest path segmentation method

What does this mean? It requires that the number of words cut out be as small as possible. Take the same sentence again.

"I don't know what you're talking about." Shortest path segmentation means splitting the sentence into the fewest words: "Do not know, you are, say what". That is the shortest path method, and only 3 words come out. Of course, the three methods above can be combined to form further methods. For example, forward maximum matching combined with reverse maximum matching is called bidirectional maximum matching. That covers the first kind.

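The "fewest words" idea can be sketched with a small dynamic program. This is one possible way to realize it, not necessarily how any search engine does it: the function name and the dictionary are hypothetical, and an unspaced English string stands in for Chinese text.

```python
def fewest_words(text, dictionary):
    """Segment text into the smallest number of tokens.

    Characters not covered by any dictionary word count as one token each.
    """
    n = len(text)
    max_len = max(map(len, dictionary), default=1)
    # best[i] is the minimum number of tokens needed to cover text[:i].
    best = [0] + [n + 1] * n
    # back[i] is where the last token of that best split starts.
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            usable = piece in dictionary or end - start == 1
            if usable and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        tokens.append(text[back[end]:end])
        end = back[end]
    return tokens[::-1]

# Hypothetical demo dictionary.
print(fewest_words("carpetrol", {"car", "carpet", "pet", "petrol"}))
# ['car', 'petrol']
```

A greedy left-to-right scan would take `carpet` first and be left with `rol`, which is not in this dictionary. Minimizing the word count picks `car` and `petrol` instead.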
2. Word-meaning (semantic) segmentation method

This is essentially segmentation based on machine language understanding. The idea is simple: perform syntactic and semantic analysis, and use the syntactic and semantic information to resolve ambiguity while segmenting. This method is not yet mature and is still at the testing stage.

3. Statistical segmentation method

This one is also simple. It relies on phrase statistics: when two adjacent characters are found to appear together very frequently, that word is considered important and can be used as a delimiter within the user-supplied string, which is then segmented around it. Examples are "mine, yours, many, here, this, there" and so on. Such words occur often, so the string is split at them. That completes the overview of word segmentation technology.
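The counting step behind the statistical method can be sketched as follows. This is a simplified illustration under assumptions: it only counts adjacent character pairs in a small corpus and keeps the frequent ones, whereas real systems use much larger corpora and more refined statistics. The function name, the `min_count` threshold, and the toy corpus are hypothetical.

```python
from collections import Counter

def frequent_pairs(corpus, min_count=2):
    """Count adjacent character pairs and keep those seen at least min_count times."""
    counts = Counter()
    for line in corpus:
        # zip(line, line[1:]) yields every pair of neighbouring characters.
        counts.update(a + b for a, b in zip(line, line[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Toy corpus: "ab" is the only pair that keeps turning up.
print(frequent_pairs(["abxy", "abzz", "qab"]))  # {'ab': 3}
```

Pairs that pass the threshold are the candidates a statistical segmenter would treat as words.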

Now that we have learned about word segmentation technology, how can we use it to bring traffic to our sites?

1. We can use word segmentation to increase the number of long-tail keywords on our site, so that those keywords rank and bring in traffic.

Not only can these long-tail keywords achieve a certain ranking, they also help the site's target keyword rank well. The principle behind this is internal linking, which I will not go into here. Having said so much, let's look at an example.

For example, how is "Sanya hotel reservation" divided?

The methods are forward maximum matching, reverse maximum matching, bidirectional maximum matching, and shortest path matching.

1. Forward maximum matching

"Sanya, hotel reservation"

2. Reverse maximum matching

"Sanya hotel, reservation"

3. Bidirectional maximum matching

"Sanya, hotel, reservation"

4. Shortest path matching

"Sanya hotel reservation"

So the phrase has been split into the following words:

"Sanya, hotel reservation, reservation, Sanya hotel, hotel, Sanya hotel reservation"

Each of these words can be given its own topic page, with the word as that page's target keyword.
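The distinct words listed above appear to be all the contiguous combinations of the three basic segments, so a candidate list like this can be generated mechanically. The sketch below is only an illustration. The function name is hypothetical, and the three input segments are assumed to come from a segmentation step done beforehand.

```python
def contiguous_phrases(segments):
    """Return every contiguous run of segments, joined into a phrase."""
    return [
        " ".join(segments[i:j])
        for i in range(len(segments))
        for j in range(i + 1, len(segments) + 1)
    ]

for phrase in contiguous_phrases(["Sanya", "hotel", "reservation"]):
    print(phrase)
# Sanya
# Sanya hotel
# Sanya hotel reservation
# hotel
# hotel reservation
# reservation
```

Each phrase printed is a candidate target keyword for one topic page.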

Make each of these separate words the topic of a page on your site. Once the weight of the inbound links builds up, the site becomes much more competitive, because these pages are placed in the internal link structure: use anchor text on them to link to the home page, which targets the main keyword. This is the advantage of word segmentation. It makes the target keyword more competitive in the rankings and also brings the site a certain amount of traffic.

Word segmentation has another advantage: it helps inner pages rank. I will not elaborate here, because I have already written an article about it on seowhy that everyone can read. It is about how Baidu captures page descriptions. If an inner page has no description, Baidu will supply one or capture one from your page. If you know which passage it will capture, you can deliberately write that paragraph. Would your ranking not rise then?

The article I wrote is at the following address, and everyone is welcome to read it.

http://www.seowhy.com/bbs/thread-4451-1-1.html

Recently some friends have compiled and reposted this material. Reposting is fine, but please credit the author and the source (SEOWHY).

 

