It has been a long time since my last piece on SEO data analysis, and I recently felt I should write some hands-on content to show how SEO actually gets done. First, let's be clear on a basic point: whether a web page gets included in the index depends on two factors:
Whether it has been crawled by the spider
Whether its page quality passes the bar
The previous article already mentioned the inclusion rate as a metric. Many sites are too lazy to track it: "I can just eyeball the site's data!" In fact, without this metric a lot of work cannot even get started. Identify problems from the data, use data to guide the solution, and use data to verify the results of the work. I recently read the book Simple Data Analysis and liked it; its treatment of data-analysis method is very vivid, and I suggest readers interested in data analysis pick up a copy. Any data analysis is made up of four elements: Goal -> Analysis -> Evaluation -> Decision.
Goal: We want to see how the site's inclusion looks, and whether there is room for improvement on the SEO side.
Analysis: What counts as good or bad inclusion; shouldn't that be measured by some metric? And rather than looking at sitewide inclusion in aggregate, shouldn't we break inclusion down by page type?
Evaluation: So we need the following data:
The site's page hierarchy
The SEO traffic that each level of page brings in
The inclusion status of each level of page
The share of SEO traffic can be segmented out of Google Analytics.
The number of pages can be pulled from the database, or counted by crawling with LocoySpider (火车头) or a small homemade script.
The inclusion rate can be obtained by querying the pages against the search engine with a tool; LocoySpider works here too. (A small sketch of the per-level calculation follows the tool link below.)
One such inclusion-check tool, Ads Zero: http://www.gnbase.com/forum.php?mod=viewthread&tid=11468&highlight=%CA%D5%C2%BC%B2%E9%D1%AF
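As a minimal sketch of that per-level calculation (the file name and input format are my own assumptions, not from the original post), something like this computes the inclusion rate for each page level once you have an indexed/not-indexed flag per URL from whatever checker you use:

```python
import csv
from collections import defaultdict

# Hypothetical input: each row is "url,level,indexed",
# where indexed is 1 or 0 from your inclusion checker.
totals = defaultdict(int)    # pages per level
included = defaultdict(int)  # indexed pages per level

with open("pages_by_level.csv", newline="") as f:
    for url, level, indexed in csv.reader(f):
        totals[level] += 1
        included[level] += int(indexed)

for level in sorted(totals):
    print(f"{level}: {included[level]}/{totals[level]} "
          f"= {included[level] / totals[level]:.1%}")
```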
The problems jump out immediately!
The level-1 and level-2 directory pages bring in a lot of traffic, but their inclusion rate is not good; improving their inclusion is the breakthrough point for lifting traffic!
Product pages are numerous and their inclusion is far from ideal, but the traffic they bring is limited; besides inclusion problems they also have on-page content problems, which this article will set aside for now.
Decision: Our conclusion is to start work immediately on optimizing the inclusion of the directory pages.
Seen from here, the original goal, "increase traffic by improving inclusion,"
has evolved into a new goal: "how to increase the number of directory pages included."
Can we apply the data-analysis method to this SEO problem once more?
The answer is YES!
We will walk through the Goal -> Analysis -> Evaluation -> Decision process again.
Goal: Increase the number of directory pages included in the index.
Analysis: As stated at the beginning of this article, inclusion depends on two factors, so we need to check whether the pages have been crawled by the spider and whether their quality passes the bar.
1. For crawling, we have to analyze the server logs to find out. So we split a series of numbers out of the logs to see whether the pages were actually crawled.
2. Since page quality seems hard to measure directly, we can use a proxy for each template:
(pages crawled and included) / (pages crawled)
to evaluate how much the template's page quality matters. If the pages that get crawled also get included, that at least shows the search engine accepts this page type's content. (The real situation is far more complex; a page may be included and later deleted for quality problems, but some reference point is better than none, right? A small sketch of the calculation follows.)
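A minimal sketch of that ratio, assuming you have one file of crawled URLs (from the logs) and one file of indexed URLs (from your inclusion checker); both file names and the URL layout are hypothetical:

```python
from collections import defaultdict

def load(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

crawled = load("crawled_urls.txt")  # extracted from the access log
indexed = load("indexed_urls.txt")  # from the inclusion checker

def channel_of(url):
    # Assumes URLs shaped like http://example.com/<channel>/...
    parts = url.split("/")
    return parts[3] if len(parts) > 3 else "(root)"

crawled_n = defaultdict(int)  # crawled pages per channel
both_n = defaultdict(int)     # crawled AND included per channel

for url in crawled:
    ch = channel_of(url)
    crawled_n[ch] += 1
    if url in indexed:
        both_n[ch] += 1

for ch in sorted(crawled_n):
    print(f"{ch}: {both_n[ch]}/{crawled_n[ch]} = {both_n[ch] / crawled_n[ch]:.1%}")
```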
Evaluation: (sensitive information is masked with rounded numbers; all data is real)
First, look at the crawler log; a shell script can break it down. (A Python equivalent sketch follows the numbers below.)
Directory pages were crawled about 13,000 times in total.
About 5,500 distinct directory pages were crawled.
Nearly 100% of the directories under channel A were crawled at least once, and channel B's directories also did well, with 70% crawled at least once.
Under the remaining channels, fewer than 30% of the directories were crawled.
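For reference, a rough log-parsing sketch that produces total and unique crawl counts like the ones above. It assumes a combined access log format, Baiduspider as the crawler of interest, and a hypothetical /channel/directory/ URL shape; adjust the regex to your own URL scheme:

```python
import re
from collections import Counter

SPIDER = "Baiduspider"  # the crawler we care about
# Hypothetical directory-page URL shape: /<channel>/<directory>/
DIR_RE = re.compile(r'"GET (/[^/ ]+/[^/ ]+/) HTTP')

hits = Counter()
with open("access.log") as f:
    for line in f:
        if SPIDER not in line:
            continue
        m = DIR_RE.search(line)
        if m:
            hits[m.group(1)] += 1

print("total directory crawls:", sum(hits.values()))
print("distinct directories crawled:", len(hits))
for path, n in hits.most_common(10):  # heaviest-crawled directories
    print(n, path)
```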
Don't be too surprised by this result; many sites face the same ugly problem. As long as you keep subdividing the data, and subdividing again, you will always turn up some clues.
As for log analysis, don't put blind faith in any log-analysis software; that is for lazy people. A homemade script plus Excel is king: you can slice out and display any data you want, and you can even do without Excel.
Next, we counted the inclusion rate of the directory pages under channels A and B, the two most frequently crawled.
Channels A and B look very reassuring, which says page quality is not a problem, but the inclusion of the remaining channels is all the more worrying.
Decision: From the data evaluation above, we reached the following conclusions.
Page quality is not a factor blocking inclusion.
Channels A and B are crawled unusually heavily. Investigation explained why: the directory-page slots on the homepage are filled almost entirely with channel A's directory pages, and the homepage carries the highest weight on the whole site; channel B has stronger external links than the other channels, so it also carries very high weight.
Apart from channels A and B, the crawl situation of the other channels is not optimistic: too few crawl entrances, pages buried too deep, and inclusion suffers as a result.
Clearly, channel A is now too dominant within the site, and we must make some "rebalancing" moves to cut back the internal weight flowing to channel A and transfer it to the other channels. At the same time, the spider needs more entrances from which to crawl the channel pages.
Now that the problem is clear, we split the work into two parts: 1. provide more entrances; 2. spread internal-link resources across all channels rather than concentrating them on a few.
Providing entrances:
1. Build the directory-page URLs into a sitemap, submit it to the search engines, and give them a relatively high crawl priority. (A minimal sitemap sketch follows this list.)
2. Improve the breadcrumb navigation; more detailed breadcrumbs provide more entrances.
3. Recommend directory pages from other product pages.
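As a minimal sketch of step 1 (the input file name is hypothetical, and <priority> is only a hint to search engines, not a guarantee of crawl weight):

```python
from xml.sax.saxutils import escape

# Hypothetical input: one directory-page URL per line.
with open("directory_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

with open("sitemap.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        out.write("  <url>\n")
        out.write(f"    <loc>{escape(url)}</loc>\n")
        out.write("    <priority>0.8</priority>\n")  # relatively high hint
        out.write("  </url>\n")
    out.write("</urlset>\n")
```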
Dividing the resources: (Some background: any page can become a crawler entrance; Baidu's spider crawls to a limited depth, and the shallower a page sits relative to the entrance, the higher its probability of being crawled. A small depth-audit sketch follows this list.)
1. The homepage originally pointed to channel A's directory pages plus product pages; nofollow all of those links, so that a spider entering from the homepage crawls the channel pages, and reaches the directory pages through the channel pages. (This one is actually not that important.)
2. The channel pages originally pointed to their own product pages; nofollow all of those links (making sure a spider entering from a channel page crawls as many directory pages as possible).
3. Nofollow all the links from directory pages back to the homepage.
4. Remove some irrelevant links from the pages. (In some situations this is very effective.)
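To see why entrance depth matters, here is a toy breadth-first depth audit; the link graph is invented for illustration, and in practice you would build it from your own site crawl:

```python
from collections import deque

# Invented internal-link graph: page -> pages it links to.
links = {
    "/": ["/channelA/", "/channelB/", "/channelC/"],
    "/channelA/": ["/channelA/dir1/", "/channelA/dir2/"],
    "/channelB/": ["/channelB/dir1/"],
    "/channelC/": [],  # no onward links: its directories stay unreachable
}

# Breadth-first search from the homepage gives each page's click depth.
depth = {"/": 0}
queue = deque(["/"])
while queue:
    page = queue.popleft()
    for nxt in links.get(page, []):
        if nxt not in depth:
            depth[nxt] = depth[page] + 1
            queue.append(nxt)

for page, d in sorted(depth.items(), key=lambda kv: kv[1]):
    print(d, page)
# Pages missing from `depth`, or deeper than the spider's crawl limit,
# are the ones that need extra entrances.
```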
Now it's time to get to work.
Results
So what was the effect of all this? Let's look at the data one month after the changes.
The inclusion rate of the directory pages increased by 100%!
The inclusion rate of the product pages also improved to some degree, thanks to the directory pages now showcasing the products properly.
The SEO performance of the directory pages:
Their share of SEO traffic rose by 15%
The number of keywords bringing visits grew by 10% (from the newly included pages)
SEO traffic increased by more than 50% (including some seasonal factors)
Notes:
1. Besides inclusion, ranking is also an issue; you can keep an eye on it at the same time.
2. For the special case of channel A, its links could even be blocked entirely, though the technical implementation would be slightly more troublesome.
3. Baidu's support for nofollow is said to be spotty; anyone who knows people inside Baidu could help ask.
4. If you have questions, leave a comment ~
Author: Night Interest, http://www.imyexi.com/?p=575. Reprints welcome; please keep the source link.