SEO Practice (1): Data Preparation Before SEO

Source: Internet
Author: User
Tags: log, MongoDB, PHP, include, regular expression


Looking back, I noticed that Semwatch has not been updated for a long time. Its traffic is modest, but as a non-profit group blog, it is enough if it gives people who really need them a few practical, useful articles. As one of its editors, I feel it is worth keeping that spirit alive and contributing what little I can.



When we start an SEO job, the first thing to do is make sure that everything we do is supported by data, not by intuition. SEO has two main data sources: the website's server logs and third-party traffic analysis tools.



Web server logs



The built-in "combined" log format of common servers such as Apache and Nginx already meets most SEO analysis requirements. It looks something like this:



111.111.111.111 - - [20/Feb/2012:18:09:25 +0800] "GET / HTTP/1.1" 200 3121 "http://semwatch.org/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"



The combined log format already records the essential information: the visitor's IP address, access time, requested page, HTTP status code, referrer, and user agent.
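As a minimal illustration (my own sketch, not part of the original article), the following Python snippet parses one combined-format line and picks out Googlebot requests; the field layout follows the sample line above, and the regular expression is only a rough approximation of the format.

import re

# Rough regex for the combined log format shown above:
# IP, identity, user, [time], "request", status, bytes, "referrer", "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('111.111.111.111 - - [20/Feb/2012:18:09:25 +0800] '
          '"GET / HTTP/1.1" 200 3121 "http://semwatch.org/" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)"')

entry = parse_line(sample)
if entry and 'Googlebot' in entry['agent']:
    print(entry['request'], entry['status'])   # GET / HTTP/1.1 200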



To make sure the server logs also satisfy the analysis needs of other departments, at least confirm that the items listed above are being recorded. But do not log everything that could be logged; record only what is actually needed, otherwise the log files become very large and analysis becomes inefficient. What exactly gets recorded usually needs to be discussed with the operations team.



As for analyzing the logs, I do not think there is much fixed preparation to do. Because the data source is raw, the dimensions you can extract from it are almost unlimited, so the processing and analysis should be shaped by your actual needs.



For less demanding log analysis requirements, you can try the Light-Year log analysis system. Although I am personally not fond of GUI utilities in general, it offers some good ideas about which data dimensions to look at.



I have heard that a large travel site uses MongoDB together with map/reduce for log analysis, and I have also used MongoDB to reimplement some of the important functions of the Light-Year log analysis system mentioned above. So MongoDB is an option worth considering.
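As a rough sketch of that idea (my own illustration, assuming a local MongoDB instance and made-up database, collection, and field names), parsed log entries could be loaded into MongoDB and crawler hits aggregated per URL; an aggregation pipeline is used here as a simpler stand-in for map/reduce.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
logs = client["seo"]["access_logs"]                # hypothetical database/collection names

# Each document is one parsed log line, e.g. the output of parse_line() above.
logs.insert_one({
    "ip": "111.111.111.111",
    "request": "GET / HTTP/1.1",
    "status": 200,
    "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
})

# Count Googlebot requests per requested URL.
pipeline = [
    {"$match": {"agent": {"$regex": "Googlebot"}}},
    {"$group": {"_id": "$request", "hits": {"$sum": 1}}},
    {"$sort": {"hits": -1}},
]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["hits"])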



Third-party traffic analysis tools



Installing Google Analytics



Among free traffic analysis tools, Google Analytics (hereafter GA) is definitely the leader. Note, however, that if a site's monthly pageviews exceed 5 million, only Google AdWords users can continue to use GA for free traffic recording and analysis. Let's take GA as the example here.



After you add a site to GA, it prompts you to insert a JavaScript snippet into every page you want to track. Adding this code can be trivial or tedious, depending on how your site's template layer is structured.



Take the popular open-source blog platform WordPress first. It is template-driven: the home page, list pages, and article pages each have their own template, but those templates only cover part of the page. The page header, including the site logo and so on, is loaded from a separate template file via WordPress's get_header() function (which is essentially a wrapper around PHP's include). In short, as long as you add the code to header.php, every page that includes it is updated, and the GA snippet is in place within minutes.



But the situation is not always that ideal, especially for sites built on an in-house framework, which sometimes lack such a convenient include mechanism. That may be because the site was not built to a good standard, or because its requirements genuinely rule out a WordPress-style include. In that case, at least add a global JavaScript block to the head of every page, so that site-wide JavaScript such as the GA snippet can be managed in one place.



When adding the GA code you may not be able to fix a badly structured set of templates, and at worst you will have to add the snippet to dozens of template files one by one (and, of course, spend time making sure no page is missed). But fixing such fundamental issues once saves a lot of trouble later, for example the next time another set of tracking code has to be added.
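To help with the "make sure no page is missed" step, here is a small sketch (my own addition, not from the article) that fetches one sample URL per page template and checks whether a given GA tracking ID appears in the HTML; the URLs and the tracking ID are placeholders.

import urllib.request

TRACKING_ID = "UA-XXXXXXX-1"           # placeholder GA tracking ID
PAGES = [                              # one sample URL per page template (placeholders)
    "http://example.com/",
    "http://example.com/category/seo/",
    "http://example.com/2012/02/post/",
]

for url in PAGES:
    try:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
    except OSError as exc:
        print(url, "fetch failed:", exc)
        continue
    print(url, "OK" if TRACKING_ID in html else "MISSING GA code")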



Perhaps the most troublesome part is convincing the programmers to restructure the templates for what looks like a minor request; I will skip that topic here.



Some basic Google Analytics settings



For SEO, one of the most basic settings is to classify the pages on the site that have SEO value. Distinguishing page types lets you track their current traffic levels and trends, decide where to focus your SEO effort, and better evaluate the effect of every SEO change made to the site.



To take the simplest example: if you have 1,000 backlinks to allocate, should they point to the site's category pages or to its product pages? That mainly depends on which type of page has the higher conversion rate and the greater room for SEO traffic growth.



Every site is different. For example, on a book e-commerce site the list pages will not get much traffic: few people search for something like "computer books", while far more search for a specific title such as the Steve Jobs biography, because those users already know exactly what they want. For a clothing retailer it is the opposite: more people search for "shirts" than for "2012 spring new white shirt", because users come to the site just to browse for clothes; they have the intent, but their specific needs are still vague.



Those two are clear-cut examples, but in many more cases intuition alone cannot give an accurate answer, which is why we need traffic data to establish the facts.



Traffic analysis is admittedly not very valuable for a blog, where good articles are everything, but I will still use Semwatch as the example to briefly introduce the method. Suppose we need to separate the traffic of Semwatch's category pages from that of its article pages, whose URLs look like /category/seo/ and /2012/02/post/ respectively.



Go to any GA report page, find Advanced Segments, and click "New Custom Segment" on the right. Then configure the segment to include only pages whose URL matches a regular expression (the screenshot of these settings from the original post is not reproduced here).
Normally, you can tell page types apart by matching their URLs against the appropriate regular expressions. Note that if the site's URL scheme was poorly planned from the start, you can end up in the very awkward situation where page types cannot be distinguished by URL at all, so it is important to make sure that each type of page has its own recognizable URL pattern.



In this example, the regular expression matching Semwatch's category pages is ^/category/.*?/$ and the one matching article pages is ^/2[0-9]{3}/[0-9]{2}/.*?/$.
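Before putting these patterns into GA, it is worth sanity-checking them against a few real URL paths; this small sketch (my own addition) uses the two expressions exactly as given above.

import re

# The two patterns from the example above.
CATEGORY_RE = re.compile(r"^/category/.*?/$")
ARTICLE_RE = re.compile(r"^/2[0-9]{3}/[0-9]{2}/.*?/$")

def classify(path):
    """Label a URL path as 'category', 'article', or 'other'."""
    if CATEGORY_RE.match(path):
        return "category"
    if ARTICLE_RE.match(path):
        return "article"
    return "other"

for path in ["/category/seo/", "/2012/02/post/", "/about/"]:
    print(path, "->", classify(path))
# /category/seo/ -> category
# /2012/02/post/ -> article
# /about/ -> other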



Try to write the regular expressions as strictly as possible; that avoids a lot of unnecessary confusion. Also note that the old version of GA used regular expressions by default, while in the new GA you must explicitly select "Match regular expression".



There is no room here to explain regular expressions themselves; if you do not understand them, you can ask a programmer for help. But my personal advice is to master them as well as you can. They are a fairly basic technical requirement, and an SEO should not be stumped by them. Regular expressions are ugly to read (at least I can never decipher my own afterwards), but they are easy to learn.



In short, with the steps above we have made a simple distinction between page types. Back to the earlier example: if 1,000 backlinks were to be allocated to Semwatch, which pages should get them? The data shows that the category pages get almost no traffic, while the article pages naturally get much more. In most cases this suggests that the article pages have more room for traffic growth, so pointing the backlinks at article pages is the most sensible choice. (We should not jump to conclusions, though: it is possible that the category pages have a serious SEO problem, which is not uncommon. So we still need to combine common sense with analysis from other angles to reach an overall judgment.)



Space is limited, so I will stop here. More experience with Google Analytics has already been shared on Semwatch; you can use the site's search function to find it.



A final summary



In practice there are many more problems than one article can cover. The two sources above are just the main ones; the actual SEO process may also call for site-level data from Google Webmaster Tools, traffic estimates from Aizhan, SEMrush, Google Ad Planner, and Hitwise, keyword data from the Google Keyword Tool and Baidu Sinan, link data from MajesticSEO and Ahrefs, and so on.



I have recently been reading about methodology along the lines of "fact-based, rigorously structured, hypothesis-driven", and by analogy I would sum up SEO as: "based on data, with rigorous logic, aimed at results, with technology as the means." This article only lays the data groundwork; by itself it has no value, because merely looking at data gives you nothing but lifeless numbers.



How to use data as an aid, make SEO changes where they are needed most, achieve a real breakthrough in traffic, and create value for the site: that is the part we actually care about, and I will break it down gradually in later posts.



P.S. I usually write quite casually; my style rambles and my points are not always clearly organized. If you find that acceptable, you are welcome to visit my personal blog as well: http://tech-field.org/. This series, of course, will be serialized only on Semwatch; it should not end up stealing traffic from it.



