An analysis of how search engines segment web page text

Source: Internet
Author: User


For SEO practitioners, the search engine is the main object of their work, so a thorough understanding of how it operates helps us optimize for it. It is like two countries at war: you must know the other side's real situation and analyze its strengths before you can move against it. If you know nothing about the other side while it knows you inside out, you are sure to fail. In analyzing search engines, understanding their operating mechanism and their word segmentation technology is one of the most important parts. Below I share my own modest views with fellow webmasters.

The first step of search engine work: extracting page text

The first step is to crawl the text of the page. In general, the search engine extracts the relevant words from several places: the meta tags, such as keywords and description; the alt attributes of images; and the text of the web page itself. This is why many Flash sites lose out badly in search engine optimization: they contain very little text, and search engines do not crawl Flash source code. Many people who optimize Flash sites therefore add a set of code that supplies text corresponding to the content, so that the search engine can recognize it.
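As a rough illustration of the extraction step described above, the following Python sketch collects the meta keywords and description, image alt text, and page text from a page. It is only a minimal sketch built on the standard library's `html.parser`; the class name `PageTextExtractor`, the sample page, and the choice to skip `script` and `style` content are assumptions made for this example, not a description of how any real search engine works.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Hypothetical extractor for the text sources named in the article."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: code and CSS are not page text

    def __init__(self):
        super().__init__()
        self.meta = {}         # meta keywords / description
        self.alt_texts = []    # alt attributes of images
        self.page_text = []    # remaining text of the page
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attributes.get("content") or ""
        elif tag == "img" and attributes.get("alt"):
            self.alt_texts.append(attributes["alt"])
        elif tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.page_text.append(text)


# Made-up sample page, used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<meta name="keywords" content="Sun Wukong, Journey to the West">
<meta name="description" content="Stories about the Monkey King.">
<style>body { color: red; }</style>
</head><body>
<h1>Welcome</h1>
<img src="wukong.png" alt="Sun Wukong portrait">
<p>The Great Sage causes havoc in heaven.</p>
<script>var x = 1;</script>
</body></html>
"""

extractor = PageTextExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.meta)
print(extractor.alt_texts)
print(extractor.page_text)
```

A page built mostly from Flash would leave `page_text` nearly empty in a sketch like this, which is the problem the paragraph above describes.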

The second step of search engine work: Chinese word segmentation technology

Once the search engine has crawled the text, the next job is to segment it, that is, to split a sentence into individual words and phrases. For example, the phrase "Qitian Dasheng Sun Wukong" (the Great Sage Equal to Heaven, Sun Wukong) is split into two words, "Great Sage" and "Monkey King". Another example is the phrase "willow as cold month", which Baidu and Google segment differently.

The search results from the two engines differ. Google is more inclined to treat "Liu is" as a noun, so the "Liu is" bar (forum) became the first match. Baidu, by contrast, split "willow as cold month" directly into "willow" and "cold month", so the "Liu is" forum did not appear on the first page. Why is there such an obvious difference? The key is that Google does not have a proprietary dictionary, so its matching method differs somewhat. We therefore have to optimize keywords for each search engine separately and keep the content as close to the keywords as possible. If the keywords and the content are separated, it is very hard for the keyword ranking to rise.

The third step of search engine work: matching technology

First: forward matching. The "willow as cold month" example above is a forward match. This way of matching helps eliminate ambiguity and makes the search results more accurate, so that "willow" does not become "Liu is".

Second: reverse matching, which matches backwards, from the end of the text toward the beginning.

Third: maximum matching. For example, for "the United States of America is free", maximum matching yields "the United States of America" and "free".

Fourth: minimum matching. Taking the same example, "the United States of America is free", minimum matching yields "the United States", "Lee", "the public", "the country" and "free".

In the actual segmentation process, a search engine combines these matching methods rather than relying on only one. The ultimate goal of a search engine's word segmentation comes down to two points, and optimizing with these two points in mind can help improve a site's ranking. First, use multiple matching techniques to eliminate ambiguity in the text, so that the content returned for a search term is more accurate and complete. Second, use multiple matching methods to gather statistics on personal names, place names, organization names and words that are not yet in the dictionary, such as catchphrases and popular expressions, and then match those statistics in different ways against what users want to know, so that users get what they are looking for.
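The four matching methods above, and one way of combining them, can be sketched in Python as follows. This is a toy illustration under stated assumptions: the dictionary `DICTIONARY`, the sample phrases, all function names, and the tie-breaking rule in `bidirectional_match` (prefer fewer words, then fewer single-character words) are hypothetical choices for the example. They are not taken from Baidu or Google, which rely on far larger dictionaries and on statistics.

```python
# Hypothetical toy dictionary; a real one would hold many thousands of entries.
DICTIONARY = {
    "研究", "研究生", "生命", "起源",
    "中华", "人民", "共和", "共和国", "中华人民共和国",
}


def forward_max_match(text, dictionary, max_len=7):
    """Scan left to right, taking the longest dictionary word each time."""
    words, start = [], 0
    while start < len(text):
        for size in range(min(max_len, len(text) - start), 0, -1):
            piece = text[start:start + size]
            # A single character is accepted as a fallback when nothing matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                start += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=7):
    """Scan right to left, taking the longest dictionary word each time."""
    words, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in dictionary:
                words.append(piece)
                end -= size
                break
    words.reverse()
    return words


def forward_min_match(text, dictionary, max_len=7):
    """Scan left to right, taking the shortest dictionary word each time."""
    words, start = [], 0
    while start < len(text):
        size = 1
        limit = min(max_len, len(text) - start)
        while size <= limit and text[start:start + size] not in dictionary:
            size += 1
        if size > limit:
            size = 1  # nothing matched: fall back to a single character
        words.append(text[start:start + size])
        start += size
    return words


def bidirectional_match(text, dictionary, max_len=7):
    """Combine forward and reverse matching with an assumed tie-breaking rule."""
    forward = forward_max_match(text, dictionary, max_len)
    reverse = reverse_max_match(text, dictionary, max_len)

    def cost(words):
        return (len(words), sum(len(word) == 1 for word in words))

    return forward if cost(forward) < cost(reverse) else reverse


ambiguous = "研究生命起源"    # "study the origin of life"
long_name = "中华人民共和国"  # a long proper name

print(forward_max_match(ambiguous, DICTIONARY))
print(reverse_max_match(ambiguous, DICTIONARY))
print(bidirectional_match(ambiguous, DICTIONARY))
print(forward_max_match(long_name, DICTIONARY))
print(forward_min_match(long_name, DICTIONARY))
```

In this sketch the forward scan of the first phrase greedily takes "研究生" (graduate student) and leaves a stray single character, while the reverse scan recovers "研究 / 生命 / 起源". This shows how comparing several matching directions can help remove ambiguity. The second phrase stays whole under maximum matching and is broken into short pieces under minimum matching.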

I hope this introduction helps you understand how search engines operate and gives you a preliminary understanding of Chinese word segmentation. There are in fact many more details to word segmentation technology. If we keep summarizing what we learn during optimization, we will be able to choose better keywords and move our own sites toward the top of the results sooner. Source: www.hhxjt.com (USB TV Stick). First published on A5; please keep this notice when reprinting.
