In the past, webmasters and operators of large web sites often found data analysis a headache. One reason is that the old site-statistics tools were not very intelligent: we could only filter useful data out of a mass of raw data by decomposing it ourselves. Today's statistics tools are very good; they segment the data along commonly used dimensions and hand it to webmasters and operators ready-made. But Chen Nian believes that precisely because these intelligent tools format our analysis and fix it into a pattern, we stop expanding our thinking and cannot dig out the deeper potential value of a site.
Conventional data analysis is nothing more than collating data, modularizing it, drawing inductions from it, summarizing conclusions, and so on. Here I want to describe a more comprehensive approach; of course, more comprehensive also means spending more time and energy. The method is four steps: Organize, Scatter, Gather, Combine. Four seemingly simple words, but to really use this method to dig deep into the data also takes rigorous analytical thinking.
< Organize > As with conventional analysis, the first step is still to organize the data; scattered data will never tell you any valuable clues. Organizing covers two aspects: you must collate the raw data, and you must also collate the questions you need to analyze.
Data collation: collating data is a basic skill every webmaster must master. The most basic data are IP/PV, traffic sources, keywords, entry pages, bounce rate, return-visit rate, and so on; if you run an e-commerce site, you should also compute conversion rates on top of the traffic statistics, according to the content of the site. Once these basic data are in order, segment them, because only segmented data lends itself to deep digging. Behind the vast mass of numbers there are countless ways to subdivide; each segmented data module is very small, but much more targeted.
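As a minimal sketch of this kind of collation, assume a hypothetical page-view log with columns visitor_id, page, referrer and purchased (these names are invented for illustration, not taken from any particular statistics tool). The basic metrics and one segmentation might be computed roughly like this in Python:

    import pandas as pd

    # Hypothetical raw page-view log; column names are illustrative only.
    log = pd.DataFrame({
        "visitor_id": [1, 1, 2, 3, 3, 3, 4],
        "page":       ["/", "/buy", "/", "/", "/blog", "/buy", "/"],
        "referrer":   ["google", "google", "direct", "google", None, None, "bing"],
        "purchased":  [0, 1, 0, 0, 0, 1, 0],
    })

    # One row per visitor: view count, first traffic source, whether they bought.
    per_visitor = log.groupby("visitor_id").agg(
        views=("page", "size"),
        source=("referrer", "first"),   # first non-null referrer
        bought=("purchased", "max"),
    )

    pv = len(log)                                   # page views
    uv = len(per_visitor)                           # unique visitors
    bounce_rate = (per_visitor["views"] == 1).mean()
    conversion = per_visitor["bought"].mean()
    by_source = per_visitor.groupby("source")["bought"].mean()  # segmented

    print(f"PV={pv}, UV={uv}, bounce={bounce_rate:.0%}, conversion={conversion:.0%}")
    print(by_source)

The last line is the point of segmentation: the same conversion rate, broken down per traffic source, is far more targeted than the single site-wide number.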
Problem collation: if you have already identified the questions before you analyze the data, so much the better. But if you only have a vague expectation from the analysis, say a 20% increase in visits or a 5% increase in sales, you will not get any useful method out of analysis for its own sake. The expectation must be translated into clear questions: how much traffic can I realistically add, and from which channels? Where are the short boards on the site that hurt sales, and at which step does the conversion drop? Once you have pinned down roughly three clear questions, you can start the data analysis.
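As a back-of-the-envelope sketch of this translation step, with made-up numbers that are not from the article: suppose the vague expectation is the 5% sales increase above. It decomposes into concrete, answerable questions about traffic and conversion:

    # Hypothetical current figures; substitute your own site's statistics.
    monthly_visits = 100_000
    conversion_rate = 0.02          # 2% of visits end in a sale
    target_sales_lift = 0.05        # the vague "5% more sales" expectation

    current_sales = monthly_visits * conversion_rate        # 2,000 sales
    target_sales = current_sales * (1 + target_sales_lift)  # 2,100 sales

    # Question 1: by traffic alone, how many extra visits at the same conversion?
    extra_visits = target_sales / conversion_rate - monthly_visits   # 5,000
    # Question 2: by conversion alone, what rate is needed at the same traffic?
    needed_rate = target_sales / monthly_visits                      # 2.10%

    print(f"Either +{extra_visits:,.0f} visits, or a conversion rate of {needed_rate:.2%}")

Each branch is now a clear question you can take into the analysis, instead of a bare 5% wish.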
< Scatter > Scatter here means divergence: both the questions and the data need to diverge. Diverging on the questions is a bit like brainstorming; you both subdivide the questions and let them spread. Three questions can give off ten or even dozens of questions, subtle or not, but if you can filter out 3-5 small, useful questions among them, your divergence will not have been futile. Data divergence is even scarier, because like a black hole it never ends. A complete set of figures is like a tree; the segmented data are the thin ends of its roots, and through these data you can even analyze the nutrients of each patch of soil. But data mining must have a limit; you cannot dig down indefinitely. That is what Gather is for.
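To make the tree image concrete, here is a small sketch of a drill-down that segments one metric along successive dimensions but stops at a fixed depth, honoring the limit described above (the dimensions and figures are invented for illustration):

    import pandas as pd

    # Invented visit data; each row is one visit with a few segmentable dimensions.
    visits = pd.DataFrame({
        "source":  ["search", "search", "direct", "search", "ad", "ad"],
        "device":  ["mobile", "desktop", "mobile", "mobile", "desktop", "mobile"],
        "bounced": [1, 0, 1, 0, 0, 1],
    })

    def drill_down(df, dims, metric, depth=0, max_depth=2):
        """Recursively segment `metric` along `dims`, stopping at max_depth."""
        if depth >= max_depth or not dims:
            return
        dim, rest = dims[0], dims[1:]
        for value, group in df.groupby(dim):
            rate = group[metric].mean()
            print("  " * depth + f"{dim}={value}: bounce {rate:.0%} (n={len(group)})")
            drill_down(group, rest, metric, depth + 1, max_depth)

    drill_down(visits, ["source", "device"], "bounced")

The max_depth parameter is the point: each extra dimension multiplies the branches, so the stopping rule has to be chosen before you start digging.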
< Gather > When mining data, a novice may run out of ideas by the third level down, while a veteran may be unable to stop at the seventh or eighth. Whichever you are, stop the moment you find your thinking getting messy; that is when the Gather step will open your eyes. The simple word Gather means integrating the subdivided questions with the data you scattered out and finding the relationships between them. But do not rush to conclusions, because a premature conclusion will limit your thinking. Lay out all the data that could possibly bear on each question; as long as there is even a slight connection, keep it ready for the last step.
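One hedged way to look for those relationships before concluding is simply to lay the segmented metrics side by side and inspect their pairwise correlations (the daily figures below are invented; a correlation only flags "a little connection", it is not yet a conclusion):

    import pandas as pd

    # Invented daily figures gathered back from the earlier breakdowns.
    daily = pd.DataFrame({
        "search_visits": [900, 1100, 1000, 1300, 1250, 1400],
        "mobile_bounce": [0.55, 0.52, 0.58, 0.49, 0.50, 0.47],
        "sales":         [18, 22, 19, 27, 25, 29],
    })

    # Pairwise correlations: a first map of which data move together.
    print(daily.corr().round(2))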
< Combine > When the first three steps are done, the most fulfilling one is Combine: turning the data behind each question into conclusions. Start small, summing up small conclusions, then combine the small conclusions and their data into the final conclusion. Then everything falls into place: you can find a solution, or write a high-quality data-analysis report.
This is just a little thought of mine on analyzing data by the four steps of Organize, Scatter, Gather, Combine. As for how to mine data really deeply, each person may have their own way; use these methods flexibly and integrate them. They are offered here only as one way of thinking, for reference.