Ethereal Butterfly Dance: On computing web page signal-to-noise ratio and keyword correlation

Source: Internet
Author: User


I have studied SEO for quite a while, and I suddenly realized that I have been standing in its doorway without moving forward. So today I started looking into some deeper things.

We generally check whether keywords are related in one of two ways: by looking at Baidu's related searches, or by eye, judging for ourselves whether two words seem related. The second approach is unscientific and inaccurate.

Remember when Ted was working on the keyword "163 mailbox", and a Baidu search for "163 mailbox" brought up the keyword "Qiu Shi da"? At that time many people began studying related keywords. A while ago I ran a similar experiment, and a search for "seo" brought up "Ethereal Butterfly Dance". So how does this happen?

For instance, suppose we have two pages.

Page A describes mobile phones. Its highest-frequency keywords are: mobile phone, Bluetooth, color screen.

Page B covers mobile services. Its highest-frequency keywords are: China Mobile, ringtones, SMS.

If we judge by eye alone, we arrive at the following conclusions:

Page A and page B are unrelated.

Searching for a keyword from page A will not return page B, and searching for a keyword from page B will not return page A.

This is obviously wrong. If we judge the relevance of web pages only by the words on their surface, we miss many related long-tail terms. We need to look at the deeper meaning behind the words.

When we search for the keyword "mobile phone", the data the search engine works with is likely to look like this:

{color screen * weight of "color screen" in the article, Bluetooth * weight of "Bluetooth" in the article, ringtones * weight of "ringtones" in the article, ...}
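
As a minimal sketch of what such a weighted keyword vector might look like, the Python below assumes that a keyword's weight in an article is simply its relative frequency. The article does not say how the weight is defined, so that choice, the function name `keyword_weights`, and the sample tokens are all illustrative assumptions.

```python
from collections import Counter


def keyword_weights(tokens, top_k=3):
    """Map the top_k most frequent tokens to their relative frequency.

    Assumption: weight = occurrences of the word / total tokens in the article.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.most_common(top_k)}


# Hypothetical, already-tokenized article about mobile phones.
page_a_tokens = [
    "mobile phone", "Bluetooth", "mobile phone",
    "color screen", "mobile phone", "Bluetooth",
]
weights_a = keyword_weights(page_a_tokens)
print(weights_a)
```

A real search engine would almost certainly use a more refined weighting, so treat the numbers only as placeholders for "how much this word matters in this article".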

With this approach we can expand the keywords in an article into a set of related words, and then use the vectors of those related words to expand the vocabulary even further.
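
The expansion idea can be sketched as a walk over a table of related words. The table `related`, its scores, the word "headset", and the function `expand_keywords` are hypothetical and exist only to illustrate the idea; the article does not specify how far the expansion should go, so the `depth` parameter is an assumption as well.

```python
def expand_keywords(seed_words, related, depth=2):
    """Collect words reachable from seed_words within `depth` hops."""
    seeds = set(seed_words)
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        next_frontier = set()
        for word in frontier:
            # related[word] maps each related word to a made-up score.
            for neighbor in related.get(word, {}):
                if neighbor not in found:
                    next_frontier.add(neighbor)
        found |= next_frontier
        frontier = next_frontier
    return found - seeds


# Hypothetical related-word table; the scores are invented for illustration.
related = {
    "mobile phone": {"Bluetooth": 0.2, "color screen": 0.1},
    "Bluetooth": {"headset": 0.3},
}
one_hop = expand_keywords(["mobile phone"], related, depth=1)
two_hops = expand_keywords(["mobile phone"], related, depth=2)
print(sorted(one_hop), sorted(two_hops))
```

One hop yields the directly related words; a second hop pulls in words related to those, which is how the vocabulary keeps growing.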

So we need to compute a related-word matrix M.

Given two words a and b,

M(a, b) = {correlation between keywords a and b}

With this, the relevance of two articles becomes r = Σ vi * M(a, b) * vj, where vi is the weight of keyword a in the first article and vj is the weight of keyword b in the second.
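
To make the sum concrete, here is a small sketch that evaluates r for pages A and B from the example above. The keyword weights and the entries of the related-word matrix are made-up numbers, `page_relevance` is a hypothetical helper, and treating identical words as having correlation 1.0 is an assumption, not something the article states.

```python
def page_relevance(weights_a, weights_b, relatedness):
    """r = sum over keyword pairs (a, b) of v_a * M(a, b) * v_b."""
    total = 0.0
    for word_a, v_a in weights_a.items():
        for word_b, v_b in weights_b.items():
            if word_a == word_b:
                m = 1.0  # assumption: a word is fully related to itself
            else:
                # Unordered pair lookup; unknown pairs count as unrelated.
                m = relatedness.get(frozenset((word_a, word_b)), 0.0)
            total += v_a * m * v_b
    return total


# Made-up keyword weights for the two example pages.
weights_a = {"mobile phone": 0.5, "Bluetooth": 0.3, "color screen": 0.2}
weights_b = {"China Mobile": 0.5, "ringtones": 0.3, "SMS": 0.2}

# Made-up entries of the related-word matrix M.
relatedness = {
    frozenset(("mobile phone", "China Mobile")): 0.4,
    frozenset(("mobile phone", "ringtones")): 0.3,
    frozenset(("mobile phone", "SMS")): 0.3,
    frozenset(("color screen", "ringtones")): 0.1,
}

r_with_matrix = page_relevance(weights_a, weights_b, relatedness)
r_surface_only = page_relevance(weights_a, weights_b, {})
print(r_with_matrix, r_surface_only)
```

With an empty matrix the two pages share no words and score zero, which is the "judge by eye" conclusion. Once related words are taken into account, the score becomes positive.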

So how is the correlation computed?

Take mobile phone and Bluetooth as an example. We calculate it as follows.

Given a set of articles {W} containing N articles in total, let N1 be the number of articles containing word A, N2 the number of articles containing word B, and N12 the number of articles containing both A and B. The correlation is then

corrAB = N12 / (N1 + N2 - N12) - (N1 * N2) / N

Note that the result of this formula may well be negative. If both A and B are small, use the simplified form:

corrAB = N12 / (N1 + N2 - N12)
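
The simplified form is the Jaccard index of the two sets of articles: the number of articles containing both words divided by the number containing either. A minimal sketch follows, assuming each article has already been reduced to a set of words; the helper names and the tiny sample collection are hypothetical.

```python
def cooccurrence_counts(articles, word_a, word_b):
    """Return (N1, N2, N12) for a collection of articles given as word sets."""
    n1 = sum(1 for words in articles if word_a in words)
    n2 = sum(1 for words in articles if word_b in words)
    n12 = sum(1 for words in articles if word_a in words and word_b in words)
    return n1, n2, n12


def corr_simplified(n1, n2, n12):
    """corrAB = N12 / (N1 + N2 - N12), or 0.0 when neither word occurs."""
    union = n1 + n2 - n12
    return n12 / union if union else 0.0


# Hypothetical four-article collection.
articles = [
    {"mobile phone", "Bluetooth"},
    {"mobile phone"},
    {"Bluetooth", "SMS"},
    {"SMS"},
]
counts = cooccurrence_counts(articles, "mobile phone", "Bluetooth")
corr = corr_simplified(*counts)
print(counts, corr)
```

The guard for an empty union is a design choice of this sketch, so that words absent from the collection yield a correlation of zero instead of a division error.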

With these word correlations you can work out the relevance of the two articles.

Now let Baidu tell us how related it thinks mobile phone and Bluetooth are.

Search "mobile phone": Baidu finds about 100,000,000 relevant pages.

Search "Bluetooth": Baidu finds about 28,000,000 relevant pages.

Search "mobile phone + Bluetooth": Baidu finds about 22,400,000 relevant pages.

corr{mobile phone, Bluetooth} = 22,400,000 / (100,000,000 + 28,000,000 - 22,400,000) ≈ 0.21, that is, 21%
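
The arithmetic can be checked in a few lines of Python. The function name `corr_from_hits` is hypothetical, and the hit counts are the approximate figures quoted above, not values fetched from Baidu.

```python
def corr_from_hits(hits_a, hits_b, hits_both):
    """Simplified correlation from search result counts."""
    return hits_both / (hits_a + hits_b - hits_both)


# Approximate result counts quoted in the article.
mobile_phone_hits = 100_000_000
bluetooth_hits = 28_000_000
both_hits = 22_400_000

corr_phone_bluetooth = corr_from_hits(mobile_phone_hits, bluetooth_hits, both_hits)
print(f"{corr_phone_bluetooth:.2%}")
```

Because search engines report rounded estimates for result counts, a figure computed this way is only a rough indication.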

In my view, computing a page's signal-to-noise ratio this way is the most accurate way to measure its keywords, and of course it is the most important algorithm. Does that make sense? If not, read it over a few more times. This is something every SEOer needs to know.

Originally published at: Ethereal Butterfly Dance SEO Dream (http://www.piaomiaodiewu.cn/)

Please credit the source when reprinting.
