The TF-IDF algorithm is a statistical weighting technique used in information retrieval. Simply put, it evaluates how important a word is to a document.
In an SEO context, we can picture it this way: a company has 10 SEO practitioners, each of whom writes an article about SEO, and all of these articles go into one document collection. We can expect the word "SEO" to appear repeatedly in almost every article, since all 10 articles are about SEO. Now suppose I want to find an article about site weight in SEO, so I type "SEO site weight" into a search engine.
I find two articles that both contain both terms: the first contains "site weight" 2 times and "SEO" 10 times, while the other contains "site weight" 10 times and "SEO" 2 times. Now the question is: setting aside the author's standing (the overall weight of the site), the article's quality (page weight), recommendations from the company's experts (high-quality external links) and other factors, whose article should rank higher in the search results?
With this question in mind, let's look at the TF-IDF algorithm and how it applies to SEO.
The core idea of TF-IDF
If a word or phrase appears in one article with a high frequency (TF) but rarely appears in other articles, it is considered to distinguish categories well and to be suitable for classification.
Likewise, if an article contains the word we are querying, we consider the article relevant to that word. Extending this idea, the more often the queried word appears in a document, the more relevant that document is to the query.
The keyword-density technique we used in earlier SEO work is based on this TF principle.
So in the TF-IDF algorithm, we first define TF(t, d) as the number of occurrences of word t in article d.
We can check a page's TF values with a keyword-density checker:
http://tool.chinaz.com/Tools/Density.aspx
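To make TF(t, d) concrete, here is a minimal Python sketch that counts how often a term occurs in a document and derives a keyword density from that count. It assumes whitespace-separated English text and defines density as the share of tokens covered by the keyword; the function names are hypothetical, and the linked tool may segment words and compute density differently (Chinese text, for instance, needs word segmentation first).

```python
def term_frequency(term: str, text: str) -> int:
    """TF(t, d): how many times `term` (one or more words) occurs in `text`.

    Assumption: case-insensitive, whitespace-separated tokens; punctuation
    attached to a word is not stripped, so this is only a rough count.
    """
    tokens = text.lower().split()
    target = term.lower().split()
    n = len(target)
    if n == 0:
        return 0
    # Slide a window the length of the term across the token list.
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


def keyword_density(term: str, text: str) -> float:
    """Assumed definition: share of all tokens taken up by the term."""
    total = len(text.split())
    if total == 0:
        return 0.0
    return term_frequency(term, text) * len(term.split()) / total


doc = "SEO tips: site weight matters, and site weight grows with good SEO"
print(term_frequency("site weight", doc))  # 2
print(term_frequency("SEO", doc))          # 2
print(round(keyword_density("site weight", doc), 3))  # 0.333
```

A multi-word term is matched as a whole phrase, which is why "site weight" counts 2 here rather than counting "site" and "weight" separately.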
But counting occurrences alone is not enough, because queries usually contain two or more words, such as "AA BB" or "XX YY ZZ". For a query like that, which word's occurrence count should decide importance? This leads to IDF, which measures how rare a word is. We define IDF as IDF(t) = log(N / DF(t)), where:
DF(t): the number of articles in which the word t appears. In SEO work, we can approximate DF(t) by searching Google for t and taking the number of search results.
N: the total number of articles. This value has no practical use in our SEO work, because we cannot know how many pages a search engine has indexed. For the search engine, however, N is one of the figures it uses to judge a word's weight.
log: the logarithm is not something SEO work needs to worry about, and its base can generally be chosen freely. In practice, TF is usually dampened by using 1 + log TF instead of the raw count, which curbs the exaggeration in the "site weight SEO" example above, where the second article (10 occurrences of "site weight") would otherwise count as 5 times as important as the first (2 occurrences).
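Both definitions can be checked with a small Python sketch. It assumes base-10 logarithms (any base works, as noted) and uses hypothetical helper names; it shows that a word found in every document gets an IDF of 0, and that the 1 + log TF dampening shrinks the 5x gap between 10 and 2 occurrences to roughly 1.54x.

```python
import math


def idf(n_docs: float, df: float) -> float:
    """IDF(t) = log(N / DF(t)); base 10 is an arbitrary but common choice."""
    return math.log10(n_docs / df)


def damped_tf(count: int) -> float:
    """1 + log TF; a word that never occurs gets 0 (log 0 is undefined)."""
    return 1 + math.log10(count) if count > 0 else 0.0


# A word found in every document carries no weight; a rare one carries more.
print(idf(1000, 1000), idf(1000, 10))  # 0.0 2.0

# Raw counts make 10 occurrences look 5x as important as 2;
# the dampened form shrinks that gap to about 1.54x.
raw_ratio = 10 / 2
damped_ratio = damped_tf(10) / damped_tf(2)
print(raw_ratio, round(damped_ratio, 2))  # 5.0 1.54
```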
TF-IDF in practice for SEO
Read this far and getting restless? Let's cut to a commercial... uh, no, let's cut to an example:
TF-IDF value = TF × IDF = (1 + log TF(t, d)) × log(N / DF(t))
Take the query "site weight SEO" and the article "SEO Learning: What Is Site Weight" as an example (all logarithms are base 10):
TF of "site weight": W = 1 + log 31 (occurrences) = 2.49
IDF of "site weight": log(1 trillion / 23,200,000) = 4.63 (N is an assumed value based on 2008 data)
TF-IDF of "site weight": 2.49 × 4.63 = 11.53
TF of "SEO": W = 1 + log 34 (occurrences) = 2.53
IDF of "SEO": log(1 trillion / 1,220,000,000) = 2.91 (N is an assumed value based on 2008 data)
TF-IDF of "SEO": 2.53 × 2.91 = 7.36
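The arithmetic of this worked example can be reproduced with a short Python sketch. It uses base-10 logarithms, the assumed N of 1 trillion, and the search-result counts quoted as DF(t), and it rounds each factor to two decimals as the example does; the helper name is hypothetical.

```python
import math

N = 1_000_000_000_000  # assumed total number of indexed pages ("1 trillion")

# term -> (occurrences in the article, search-result count used as DF(t))
TERMS = {
    "site weight": (31, 23_200_000),
    "SEO": (34, 1_220_000_000),
}


def tf_idf_steps(count: int, df: int, n_docs: int = N) -> tuple:
    """Return (damped TF, IDF, TF-IDF), rounding each factor to two
    decimals the way the worked example does."""
    tf = round(1 + math.log10(count), 2)
    idf = round(math.log10(n_docs / df), 2)
    return tf, idf, round(tf * idf, 2)


results = {term: tf_idf_steps(c, df) for term, (c, df) in TERMS.items()}
for term, (tf, idf, score) in results.items():
    print(f"{term}: TF={tf}, IDF={idf}, TF-IDF={score}")
# site weight: TF=2.49, IDF=4.63, TF-IDF=11.53
# SEO: TF=2.53, IDF=2.91, TF-IDF=7.36
```

Without the intermediate rounding the products come out at about 11.55 and 7.38, so the last digit of the example depends on when you round.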
We now have a TF-IDF value of 11.53 for "site weight" and 7.36 for "SEO". What are they good for?
The larger the TF-IDF value, the more relevant the article is to the query term;
Only a page where the term "site weight" carries high weight has a chance of ranking well in the results for "site weight SEO";
Anchor-text links should reinforce the term "site weight";
If we build "SEO" anchor text for this page, it will not perform very well;
If no other factors boost or lower the weight, pages whose total term weight is below 18.89 (11.53 + 7.36) will rank lower, and pages above 18.89 will rank higher than the article in this example.
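The opening question, which of the two articles should rank first, can be sketched the same way. This minimal Python example assumes the IDF values from the worked example (4.63 for "site weight", 2.91 for "SEO"), scores a page as the sum of (1 + log TF) × IDF over the query terms, and ignores every other ranking factor, just as the question does; the function name is hypothetical.

```python
import math

# IDF values taken from the worked example (based on assumed N and DF figures).
IDF = {"site weight": 4.63, "SEO": 2.91}


def page_score(counts: dict) -> float:
    """Sum of (1 + log10 TF) * IDF over the query terms.

    Terms missing from the page contribute nothing. Simplification:
    all other ranking factors are ignored.
    """
    score = 0.0
    for term, idf in IDF.items():
        tf = counts.get(term, 0)
        if tf > 0:
            score += (1 + math.log10(tf)) * idf
    return score


first = page_score({"site weight": 2, "SEO": 10})
second = page_score({"site weight": 10, "SEO": 2})
print(round(first, 2), round(second, 2))  # 11.84 13.05
```

Under these assumptions the second article (10 occurrences of "site weight", 2 of "SEO") scores higher, because the rarer term "site weight" carries the larger IDF.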
Summary: TF-IDF in SEO
The above is only one example of how TF-IDF can be applied to SEO. Neither the method as used here nor the assumptions in the example are rigorous or precise. Still, this does not stop us from understanding the principle behind the SEO technique of "keyword density". It also gives us a measurable point of reference when comparing our keyword rankings with competitors'.
For Baidu, Google and other search engines alike, TF-IDF is only a small part of the ranking algorithm. To combat keyword stuffing, the major search engines also place certain limits on TF values. SEOmoz suggests, as a safe frequency, repeating a keyword no more than 15 times per page, rather than simply aiming for a 2%-8% keyword density. Of course, this recommendation is based on non-Chinese search engines.
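A hypothetical helper can flag a page against both rules of thumb mentioned here: the 15-repeat cap attributed to SEOmoz and the older 2%-8% density band. The thresholds are parameters because neither is a documented search-engine limit, and the page figures in the example are invented for illustration.

```python
def keyword_use_report(occurrences: int, total_words: int,
                       max_repeats: int = 15,
                       density_band: tuple = (0.02, 0.08)) -> dict:
    """Compare one keyword's usage on a page with two rules of thumb:
    a cap on repeats and a keyword-density band (both assumed heuristics)."""
    density = occurrences / total_words if total_words else 0.0
    low, high = density_band
    return {
        "density": density,
        "within_repeat_cap": occurrences <= max_repeats,
        "within_density_band": low <= density <= high,
    }


# Hypothetical 1,000-word page that uses a keyword 40 times.
report = keyword_use_report(40, 1000)
print(report)
```

The example shows why the two rules can disagree: on a long page, a keyword can sit inside the 2%-8% band (here 4%) while far exceeding the 15-repeat cap.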
When learning SEO, we do need to understand some technical and theoretical background, since it helps us do our work better. At the same time, we should not get bogged down in purely theoretical or technical questions; after all, in the SEO industry, hands-on practice and experience are also extremely important.
This article is an original by Yangfan of Yang SEO; if you reprint it, please keep the link:
http://www.seoyangs.com/tf-idf-seo.html