Understanding the relevance between keywords and articles from the perspective of algorithms

Source: Internet
Author: User
Keywords: articles, words


Generally speaking, whether a word or phrase can serve as a keyword of an article depends largely on how well it reflects the article's main idea. The relevance between a keyword and an article describes how much of the article's main idea or theme the chosen word or phrase captures. Keyword extraction is influenced by where a word appears in the article, how often it appears, and its semantic features. So how does a search engine actually judge the relevance between a keyword and an article? Here I offer some ideas from my own point of view, in the hope that they are a useful starting point. In my opinion, a search engine analyzes the relevance between keywords and an article in the following steps:

First: The search engine cleans the page to be analyzed

Web page cleaning mainly removes the noise in a page: the many useless ads, the navigation bar and other template elements, and content with no meaning for the reader, such as JavaScript and CSS. We do not know which algorithm search engines use for this. My guess is that the page is divided into blocks, the importance of each block is measured to identify the main content block, and the content of that block is then extracted. How a search engine judges the importance of a page block is another topic.
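The cleaning step can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` module and simply drops everything inside `<script>`, `<style>`, and `<nav>` elements. The class name `TextExtractor`, the function `clean_page`, and the choice of tags to skip are assumptions made for illustration; it does not attempt the block-importance analysis guessed at above, and it is not a description of what any real search engine does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping content inside script, style and nav tags."""

    SKIP = {"script", "style", "nav"}  # assumed set of "noise" tags

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while we are inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def clean_page(html):
    """Return the visible text of an HTML page with template noise removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Calling `clean_page` on a page whose body holds a navigation bar and one paragraph returns only the paragraph text.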

Second: Word segmentation of the extracted content

I think the search engine may use an algorithm along these lines. First, in a rough segmentation stage, it obtains the N segmentation results with the highest probability. Next, it uses role tagging to recognize unregistered (out-of-vocabulary) words and calculates their probabilities, adds these words to the segmentation word graph, and from then on treats them as ordinary words. Finally, it uses dynamic programming to select the best of the N maximum-probability segmentation and tagging results, and records it.
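The dynamic-programming part of this idea can be sketched as follows. This is a simplified illustration, not the algorithm of any particular search engine: it keeps only the single most probable segmentation rather than the N best, it replaces role tagging with a flat penalty for unknown single characters, and the lexicon `WORD_PROB` and the value `UNKNOWN_PROB` are invented for the example.

```python
import math

# Hypothetical unigram probabilities; a real lexicon would be far larger.
WORD_PROB = {
    "search": 0.03,
    "engine": 0.02,
    "searchengine": 0.0001,
    "rank": 0.01,
    "ranking": 0.015,
    "king": 0.005,
}
UNKNOWN_PROB = 1e-8  # assumed penalty for a single character not in the lexicon
MAX_WORD_LEN = max(len(w) for w in WORD_PROB)


def segment(text):
    """Return the segmentation with the highest product of word probabilities."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word on that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word in WORD_PROB:
                score = math.log(WORD_PROB[word])
            elif len(word) == 1:
                score = math.log(UNKNOWN_PROB)  # fallback for unknown characters
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

With this toy lexicon, `segment("searchengineranking")` prefers the three-word split over the rarer compound entry, because the product of the three word probabilities is larger.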

Third: Removing meaningless words from the preliminary segmentation result

The search engine analyzes the segmentation result from the second step and removes modal particles, adjectives, and other words that carry little meaning. Single characters that do not express a complete meaning should also be filtered out. Stop words are removed by building a stop-word list. What remains after these meaningless words are removed is the meaningful vocabulary to be analyzed.
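A minimal sketch of list-based filtering is shown below. The contents of `STOP_WORDS` and the `min_length` threshold (standing in for the single-character rule) are assumptions chosen for the example; a real stop-word list would be much longer and language-specific.

```python
# Assumed, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "is", "and", "very", "oh"}


def remove_stop_words(tokens, min_length=2):
    """Drop stop words and tokens shorter than min_length characters."""
    return [
        token
        for token in tokens
        if token.lower() not in STOP_WORDS and len(token) >= min_length
    ]
```

For example, filtering the tokens of "The ranking of a page x is high" leaves only `ranking`, `page`, and `high`.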

Fourth: Determining the weight of each keyword

After segmentation and cleaning are complete, all the keywords of the article are analyzed. My idea is that the search engine represents the text as an N-dimensional feature vector in which each component consists of a keyword and its weight. The weight of a keyword in the text is generally determined by three factors together: word frequency, position, and word meaning. The influence of frequency and position on a word or phrase can be computed with a definite algorithm, and the weight contributed by word meaning also has a fixed algorithm for its calculation. The search engine applies these algorithms to the keywords above to obtain the final result.
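As a rough sketch of such a vector, the example below combines only frequency and position: every occurrence of a term adds the weight of the field it appears in, and the totals are normalized to sum to 1. The field names, the values in `FIELD_WEIGHT`, and the function `keyword_vector` are assumptions for illustration; the word-meaning factor is left out because the article does not say how it would be computed.

```python
from collections import Counter

# Assumed position weights: an occurrence in the title counts three times
# as much as an occurrence in the body.
FIELD_WEIGHT = {"title": 3.0, "body": 1.0}


def keyword_vector(fields):
    """Map each term to a normalized weight built from frequency and position.

    fields: dict mapping a field name to the list of tokens found in it.
    """
    weights = Counter()
    for field, tokens in fields.items():
        factor = FIELD_WEIGHT.get(field, 1.0)
        for token in tokens:
            weights[token] += factor  # each occurrence adds its position weight
    total = sum(weights.values())
    if total == 0:
        return {}
    return {term: weight / total for term, weight in weights.items()}
```

Each key of the returned dictionary is one dimension of the vector, so a page with three distinct keywords is represented as a three-dimensional vector.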

I believe the search engine obtains its final results through the steps above. Below I discuss the specific methods of analysis in more detail; again, this is only my personal opinion:

First: Keyword weight based on position

The position of a keyword in a document matters when the search engine determines the weight of that keyword on the page. For example, the domain name is one of the most stable site factors a search engine sees: if the domain name contains the keyword DVD, the site has a built-in advantage when users search for DVD. The title is the most valuable resource of a website. It is shown in the browser's title bar, and because it is displayed to the user, the search engine treats it as the most important and most concise summary of the document. Giving keywords appropriate prominence in the title is very helpful for ranking.

Second: Keyword weight based on frequency

How often the different keywords occur in a web page is a very important aspect. I think that although position and frequency both affect keyword weight, a very high frequency alone does not prove that a word is suitable as a keyword. To give a simple example, suppose we optimize an article for "America". The word appears very frequently and in important positions, yet it cannot be given a high weight, because "America" is also common in other documents, where it likewise appears frequently and in important positions. Words that are frequent but unsuitable as keywords should therefore be given a lower weight.
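One standard way to express this idea is TF-IDF, which multiplies a term's frequency in the document by a factor that shrinks as the term appears in more documents. The article does not name this technique, so the sketch below is an assumed stand-in, and the smoothed formula `log((1 + N) / (1 + df))` is just one common variant.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each term of doc_tokens against a corpus of token lists.

    A term found in every document of the corpus gets a score of 0.
    """
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc_tokens)                       # frequency in this document
        df = sum(1 for other in corpus if term in other)   # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df))            # smoothed inverse document frequency
        scores[term] = tf * idf
    return scores
```

In a toy corpus where "america" occurs in every document and "dvd" in only one, "america" scores 0 even if it is the most frequent word in the article, while "dvd" keeps a positive weight.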

Third: The distance between important keywords in the document

In my analysis, the distance between important keywords in a document should also be an important measure of how relevant those keywords are to the article.
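A simple way to measure this, offered only as an assumed illustration, is the smallest number of token positions separating two keywords anywhere in the document. The function name `min_distance` is hypothetical, and how a search engine would turn such a distance into a score is not something the article specifies.

```python
def min_distance(tokens, first, second):
    """Return the smallest gap in token positions between two keywords.

    Returns None if either keyword never occurs in tokens.
    """
    last_seen = {first: None, second: None}  # most recent position of each keyword
    best = None
    for index, token in enumerate(tokens):
        if token not in last_seen:
            continue
        other = second if token == first else first
        if last_seen[other] is not None:
            gap = index - last_seen[other]
            if best is None or gap < best:
                best = gap
        last_seen[token] = index
    return best
```

A smaller result means the two keywords tend to appear close together, for example as an adjacent phrase when the distance is 1.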

I think the search engine gives each keyword a score for the article through the series of processing steps above. When a user searches for a keyword, articles with higher scores have a better chance of ranking near the top; of course, this leaves aside the influence of external links. These are only my personal views on search engines and are not necessarily correct; I hope we can learn together. Article copyright: Guangzhou People Hospital: http://www.gzrlw.net/. You are welcome to reprint this article, but please keep the link when you do. Thank you for your understanding and cooperation!
