I'm often asked to explain how to do SEO-related data analysis, sometimes with the expectation that a few hours of crash course will cover it. But this area covers far too much ground to explain in a few words, and I hardly know where to begin summarizing it.
SEO is a very immature industry. The reference material available is limited, and most things have to be figured out from data. That process of exploration often exposes truths that intuition never noticed, which is why data can be cruel to SEO: a single simple number can negate years of a team's hard work.
Take the most common example: what actually makes up SEO traffic? Most people assume it is propped up by popular keywords, so that if the rankings of the top keywords rise, traffic will certainly rise. It isn't so. For most larger sites, the vast majority of SEO traffic comes from extremely long-tail queries that don't even register on Baidu Index. Even if you have the strength to rank for the hot keywords, their contribution to traffic is still very limited.
Some people find this hard to believe because they have never seen such data. But this conclusion can only be reached by analyzing the site's raw access logs; a heavily sampled statistics system like Google Analytics cannot show it. And very few SEOs can use GA skillfully in the first place, so the true composition of SEO traffic is almost never seen.
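To give a sense of what that raw-log analysis looks like, here is a minimal sketch in Python. Everything in it is illustrative rather than an exact script: it assumes an Apache combined-format access log, and that the Baidu referrer still exposes the query in a wd/word parameter, which modern Baidu referrers often no longer do (in that case you would need a different data source for queries).

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Matches the last two quoted fields of a combined-format log line: "referrer" "user-agent"
LAST_TWO_QUOTED = re.compile(r'"([^"]*)" "([^"]*)"\s*$')

def baidu_query(referrer: str):
    """Return the search query if the referrer is a Baidu results page, else None."""
    parsed = urlparse(referrer)
    if "baidu.com" not in parsed.netloc:
        return None
    params = parse_qs(parsed.query)
    for key in ("wd", "word"):  # historical Baidu query parameters
        if params.get(key):
            return params[key][0]
    return None

def query_distribution(log_path: str) -> Counter:
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LAST_TWO_QUOTED.search(line)
            if not m:
                continue
            query = baidu_query(m.group(1))
            if query:
                counts[query] += 1
    return counts

if __name__ == "__main__":
    counts = query_distribution("access.log")  # hypothetical log file name
    total = sum(counts.values())
    if total:
        top100 = sum(c for _, c in counts.most_common(100))
        print(f"Baidu search visits: {total}")
        print(f"share from top 100 queries: {top100 / total:.1%}")
        print(f"share from the long tail:   {1 - top100 / total:.1%}")
```

On a large site, a breakdown like this is usually what reveals how heavily the long tail dominates.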
So when I hear people say "SEO is dead", I often reply that "SEO has never really lived", and that is no exaggeration.
How to start learning data analysis
The ultimate goal of SEO is to generate revenue for the site, so there are two core indicators: conversion and traffic.
Traffic, for example, can be subdivided: Traffic = Indexed Pages * Ranking * Click-Through Rate * Search Volume
Indexing can be subdivided further: Indexed Pages = Pages Crawled * Page Quality
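To make the decomposition concrete, here is a toy calculation. Every number is invented purely for illustration, and the straight multiplication is of course a simplification of how the factors interact.

```python
# A toy illustration of the decomposition above; all numbers are invented.
crawled      = 200_000   # pages Baiduspider fetched for a group of long-tail pages
page_quality = 0.45      # fraction of crawled pages that end up indexed
indexed      = crawled * page_quality            # 90,000 indexed pages

searches_per_indexed_page = 3   # assumed average monthly searches a page can rank for
ctr_at_rank  = 0.08             # assumed average click-through rate at the achieved rank

est_traffic = indexed * searches_per_indexed_page * ctr_at_rank
print(f"indexed pages: {indexed:,.0f}, estimated monthly SEO visits: {est_traffic:,.0f}")
# -> indexed pages: 90,000, estimated monthly SEO visits: 21,600
```

The point is simply that a change in any one factor propagates straight through to traffic.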
For crawling, the bottleneck may be that total crawl time is capped, that the total number of crawl requests is capped, or that the total number of pages is capped. Each situation calls for a different remedy and for tracking different data indicators.
Through this kind of subdivision, the final data indicators (conversion, traffic) can be broken down into dozens or even hundreds of smaller indicators. These indicators are worth tracking precisely because their changes ultimately affect conversion or traffic.
(A popular SEO book spends a lot of space on a worked example of website analysis and operation, yet the site's SEO traffic ends up very low and little effect is visible. There is plenty of "analysis" in it, but most of it is just staring at meaningless data.)
When a major indicator fluctuates (most commonly a change in total traffic), you need to subdivide the data to pin down the specific cause. Two examples below walk through the rough ideas and steps.
The skills involved cover a wide range:
First, you need to understand the basic principles of search engines, so you know which data indicators matter in which circumstances;
then you need entry-level knowledge of various technologies, because different data is acquired in different ways: some has to be scraped, some extracted from logs, some exported from a data warehouse, some pulled from APIs, and so on, and each direction requires its own learning;
finally, data by itself is just numbers and only becomes valuable once analyzed. Ad-hoc analysis is usually done in Excel, while ongoing monitoring calls for building a reporting system of your own that outputs charts (a minimal sketch follows).
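As a rough illustration of that last point, here is a minimal sketch of a monitoring report. It assumes a daily metrics CSV has already been produced by some log-parsing job; the file name and column names are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily metrics file with columns: date, crawled, indexed, seo_visits
df = pd.read_csv("seo_daily_metrics.csv", parse_dates=["date"])
df["page_quality"] = df["indexed"] / df["crawled"]

fig, axes = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
df.plot(x="date", y="seo_visits", ax=axes[0], title="Daily SEO visits")
df.plot(x="date", y="page_quality", ax=axes[1], title="Page quality (indexed / crawled)")
fig.tight_layout()
fig.savefig("seo_report.png")  # e.g. attach to a daily email or internal dashboard
```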
It takes at least half a year to learn, but none of it is difficult, so there is nothing to fear. It is like learning 1+1 and thinking 9*9 looks hard; once you have learned it and look back, it is nothing.
A hypothetical case study
Consider an example that is close to real-world application:
On a game forum, Section A offers game downloads hosted on an external file-sharing service (call it "x netdisk"), so many of its pages carry outbound links to x netdisk; Section B offers BT downloads, whose pages carry on-site torrent download links and no outbound links.
Start the analysis:
Collect the number of pages crawled and the number of pages indexed for each section. After calculation, it turns out that Section A's page quality is significantly lower than Section B's. (Page quality = indexed pages / crawled pages)
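A sketch of how that comparison might be computed, under assumptions that are all invented for illustration: one file lists the URLs Baiduspider fetched (extracted from the raw log), another lists the URLs confirmed as indexed, and Section A and B pages can be told apart by a path prefix.

```python
from collections import defaultdict

def load_urls(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def section_of(url):
    # Hypothetical path prefixes for the two forum sections.
    if "/forum-a/" in url:
        return "A"
    if "/forum-b/" in url:
        return "B"
    return None

crawled = load_urls("crawled.txt")   # URLs Baiduspider fetched
indexed = load_urls("indexed.txt")   # URLs confirmed indexed

stats = defaultdict(lambda: {"crawled": 0, "indexed": 0})
for url in crawled:
    sec = section_of(url)
    if sec:
        stats[sec]["crawled"] += 1
        if url in indexed:
            stats[sec]["indexed"] += 1

for sec, s in sorted(stats.items()):
    quality = s["indexed"] / s["crawled"] if s["crawled"] else 0.0
    print(f"section {sec}: crawled={s['crawled']}, indexed={s['indexed']}, page quality={quality:.2f}")
```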
At this point one might guess: is it the x netdisk outbound links on Section A's post pages that drag their page quality down?
To verify the guess, split Section A's post pages into two groups: group AA has x netdisk outbound links, group BB does not. Calculate the page quality of each group separately. If group AA's page quality is significantly lower than group BB's, we can draw a preliminary conclusion: the x netdisk outbound links lower the quality of the posts that carry them, hurt indexing, and ultimately damage SEO traffic. Such outbound links then need special handling, for example replacing them with on-site URLs that 301-redirect to the x netdisk.
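The grouping itself is straightforward once a per-post table exists. A sketch, again with invented file and column names:

```python
import pandas as pd

# Hypothetical table: one row per Section-A post page, with 0/1 flags for
# has_netdisk_link, crawled, indexed.
posts = pd.read_csv("section_a_posts.csv")

crawled = posts[posts["crawled"] == 1]
# Among crawled pages, the mean of the indexed flag is exactly the page quality.
quality = crawled.groupby("has_netdisk_link")["indexed"].mean()

print("page quality without netdisk links (BB):", round(quality.get(0, float("nan")), 3))
print("page quality with netdisk links    (AA):", round(quality.get(1, float("nan")), 3))
```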
Of course, this conclusion may not be correct. The posts in group AA may share other traits besides the x netdisk links; for instance, they may include more reposted content (posts with game resources get reposted a lot), and it may be the reposting that actually causes the low page quality.
Even more rigorous analysis (such as further splitting reposted posts from original ones) would be more likely to yield an accurate conclusion, but there is no end to that kind of rigor. SEO is not rocket science; when the data for a given step is very hard to obtain, an educated guess based on experience may be the better path.
So you can implement the change first and then monitor whether Section A's page quality moves closer to Section B's, letting the final result confirm whether the earlier guess was correct.
This example is entirely hypothetical; I am not sure whether netdisk outbound links in posts actually affect page quality in practice. But similar situations come up all the time, and the data this kind of analysis needs is easy to collect, which makes it the most common kind of situation in day-to-day SEO data analysis.
An advanced analysis of an actual case
(The numbers differ from the actual values but are of roughly the same magnitude.)
SEO traffic to internal pages dropped noticeably over a certain period, and the cause of the decline needed to be analyzed.
A common and useful approach is to take the top 25% of keywords by traffic (the "hot" words) and compare their traffic change with that of the remaining 75%. If the hot words' traffic dropped much more, then it is mostly the hot words' rankings that fell; if the two groups declined to a similar degree, the effect is across the board.
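A sketch of that comparison, assuming a keyword-level traffic table covering the periods before and after the drop (file and column names are invented for illustration):

```python
import pandas as pd

# Hypothetical table with columns: keyword, visits_before, visits_after
kw = pd.read_csv("keyword_traffic.csv")

kw = kw.sort_values("visits_before", ascending=False).reset_index(drop=True)
cutoff = int(len(kw) * 0.25)          # top 25% of keywords by prior traffic
kw["is_hot"] = kw.index < cutoff

for is_hot, grp in kw.groupby("is_hot"):
    change = grp["visits_after"].sum() / grp["visits_before"].sum() - 1
    label = "hot 25%" if is_hot else "long-tail 75%"
    print(f"{label}: traffic change {change:+.1%}")
```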
First, the top 25% of keywords were computed: their traffic fell 35%, while the traffic of the remaining 75% of non-hot words fell 30%.
A gap of only 5 percentage points is not particularly conclusive, so further analysis is needed.
Keywords can often be segmented further. For example, "SEO article" can be split into two search terms, "seo" and "article", and it is really these terms that the search engine works with. So the next step is to analyze the traffic change for keywords containing the top 25% of popular search terms.
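A sketch of that term-level grouping, reusing the same hypothetical keyword table. jieba is used here only as a stand-in segmenter, since the search engine's own segmentation is not observable, and the 25% cutoff for "popular terms" mirrors the keyword-level split above.

```python
from collections import Counter

import jieba
import pandas as pd

# Hypothetical table with columns: keyword, visits_before, visits_after
kw = pd.read_csv("keyword_traffic.csv")
kw["terms"] = kw["keyword"].map(lambda k: jieba.lcut(str(k)))

# Rank terms by the prior traffic of the keywords they appear in.
term_traffic = Counter()
for terms, visits in zip(kw["terms"], kw["visits_before"]):
    for t in set(terms):
        term_traffic[t] += visits

hot_terms = {t for t, _ in term_traffic.most_common(int(len(term_traffic) * 0.25))}
kw["has_hot_term"] = kw["terms"].map(lambda ts: any(t in hot_terms for t in ts))

for flag, grp in kw.groupby("has_hot_term"):
    change = grp["visits_after"].sum() / grp["visits_before"].sum() - 1
    label = "contains a popular term" if flag else "no popular term"
    print(f"{label}: traffic change {change:+.1%}")
```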
The final data: traffic from keywords containing the top 25% of popular search terms dropped 40%, while keywords containing no popular search terms dropped only 10%.
So the conclusion is clear: it is the traffic of popular search terms that changed. A check of the site's recent project list showed no changes related to this area, so the cause was an adjustment to Baidu's algorithm.
Once the cause of the traffic change is known, corresponding improvement plans naturally follow. Whether those plans get implemented depends on how much the site values SEO, but at least no one has to fall back on excuses like "Baidu just likes to K (ban) our site".
Author: Zero, a front-line SEO who writes code every day. Personal blog: Tech Field, http://tech-field.org/