Data analysis should not be too lofty

Source: Internet
Author: User

Data analysis is the foundation of products and operations, and data mining has become core to every major company today. Ma said that big data should be used to listen to the user's heartbeat. Data has become one of the hot topics of this era.

There are countless examples of the big data era. But in many cases such big data products are far removed from us, and our own data volumes are not that large. Perhaps what matters most for product and operations people right now is to analyze and work with the small data around them, to avoid being overwhelmed by data, and to look for ways to raise retention and conversion rates.

Cao Zheng recently wrote an article on data analysis. He took part in creating Unified Statistics and the cnzz webmaster statistics service, led the construction of Baidu's commercial analysis support platform, and is chief architect at Xiamen 4399 Games.

He has three simple moves for data analysis that are easy to learn and effective as soon as you use them. They cover most everyday needs in common data scenarios. His mantra is "contrast, subdivision, traceability." Give them a try.

========= Small data mining dividing line =========

Text / @caoz

First of all, by the traditional definition I am really not a data analysis expert. Of the various association algorithms I know only the simplest (which, admittedly, is still enough on many occasions); of the various mining techniques I know basically nothing; and of the various impressive data analysis tools, apart from a few of the simplest free statistics platforms, I can use basically none. So experts and masters of every kind, please feel free to look down on this or ignore it. Here I will say some things the experts do not say.

Let me start with two stories that circulate on Weibo about data analysis, which I often use as cases.

In the first story, an investor is interested in a company in a certain industry and wants a background check. A is the technical type: he spends a week analyzing all kinds of online data, hunts for industry material everywhere, stays up late every night, and finally writes a report. B is the networking type: he has drinks with one of the company's executives, has a meal with its core staff, and gets all the insider data.

In the second story, an e-commerce company notices that a rival's Taobao shop suddenly lost 30% of its weekly revenue, then recovered naturally a week later, with no other anomalies in between. The boss asks his analysts to explain it. The analysts toil for days, build all kinds of mathematical models, and finally come up with an explanation that barely holds together. The boss reads it and is not convinced, but there is no more reasonable explanation. One day he meets the rival's boss and brings it up in conversation: "How come your revenue suddenly dropped for a while?" "Oh, don't mention it. My wife died, I went home for the funeral, and nobody was minding the company." The boss suddenly understands.

Of these two stories, the first is read on Weibo, one-sidedly, as saying that hard analytical work is less useful than having contacts. The second is read in a similar way: news from your network beats hard analysis. What I want to say is that this interpretation is absolutely wrong!

Take the first story. The internet has no shortage of such "networking types", especially in media circles: so-called "IT celebrities" or "famous critics and analysts" who are on brotherly terms with various internet bigwigs and always have the latest news. But they never study products or analyze users. They know the numbers but not what lies behind them, and they cannot tell what is important from what is secondary. I sometimes criticize friends like this: do not think that hearing news about a few internet bigwigs every day makes you a veteran insider. Precisely because they have these things to show off, they overlook the information and data that are really valuable.

This is why people from internet media, who have seen every kind of player in the market, almost never succeed in the waves of internet entrepreneurship. They are self-assured and seem to know everything, but they lack a basic grounding in data analysis, so they appear to understand everything and in fact understand nothing. Please keep in mind that unless you are second-generation rich or the child of officials, born with a golden key in hand (which is outside the scope of this discussion), there are no great achievements without hard experience.

I subscribe to the Weibo accounts of some well-known analysts. The data they disclose is often very valuable (which is why I subscribe), but their interpretation is usually appalling. This is what comes of looking only at surface phenomena. Browse their interpretations at random and you will find that their sense of data and their understanding of data are laughably poor, and that they lack even basic data verification and research skills. So what if they get hold of a company's core data? They cannot analyze it, and they cannot actually see anything in it.

The same goes for the second story. Without continuous, effective data tracking, how would you arrive at the conclusion that revenue fell 30% in the first place? Only when the conclusion from the data and the information from your network corroborate each other do you get a complete and true result. Otherwise, from casual chat alone, how would you know how much the other side's management affects its performance? Hard analysis may, for a moment, be less effective than a piece of news, but the understanding and accumulation of data it gives you is something contacts will never provide.

So, once again, basic data tracking and the daily accumulation of data must never be neglected. News from contacts can be an important reference for interpreting data, but it must not take over and replace basic data analysis.

Now about the sense of data. What is it? When someone quotes a figure, you ask yourself whether it agrees with common sense and with your own day-to-day observation of data, and if it does not, what the reasons might be. Take the "click" figures quoted for 12306: if you have a sense of data, you will first question whether that definition of "click" is reasonable. Or take the claim someone once made that a domestic image sharing website gets hundreds of millions of visits a day. At first glance the definition of "visits" is ambiguous (the official explanation afterwards was that the figure counted image loads, which differs from visits by a factor of dozens).

A sense of data needs constant training and some basic reference points. For example, you should know roughly how many internet users China has, how many are online each day, what broad types of service they use, and what share of internet users each one reaches. You also need to make good use of the tools available. I used to work at a giant company and benefited from its enormous data resources; I could see much of the internet's core data. Only after leaving did I find that there is far more publicly available data on the internet than I had thought, and that it is very effective when used well. Look at some interesting data every day, and after a period of accumulation it would be hard not to develop a sense of data.

As a company or team leader, how do you cultivate your staff's sense of data? I have one suggestion: run small prediction quizzes. For example, have the team guess the daily active users, page views (PV), or revenue figures after a new product or a product revision goes live, then see whose guess is closest. One version has a penalty: whoever is furthest off buys tea or ice cream for whoever is closest. The other has no penalty: the most accurate guessers accumulate points, and the company hands out prizes as encouragement. Over time everyone's sense of data is cultivated day by day, and it helps the team atmosphere too.
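One way such a quiz could be scored is sketched below. It assumes guesses are ranked by relative error against the actual figure; the function name `rank_guesses`, the player names, and the numbers are all made up for illustration.

```python
def rank_guesses(guesses, actual):
    """Return (name, relative_error) pairs, most accurate guess first."""
    if actual == 0:
        raise ValueError("actual must be non-zero to compute a relative error")
    errors = {name: abs(guess - actual) / abs(actual)
              for name, guess in guesses.items()}
    return sorted(errors.items(), key=lambda item: item[1])


# Hypothetical guesses for daily active users after a product revision.
guesses = {"alice": 12000, "bob": 20000, "carol": 15500}
ranking = rank_guesses(guesses, actual=15000)

closest = ranking[0][0]     # wins the points (or gets treated)
furthest = ranking[-1][0]   # buys the tea in the penalty version
print(closest, furthest)
```

Relative error is used here so that the same scoring works for metrics of very different sizes, such as revenue and page views.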

Once you have a sense of data, the next thing is the method of analysis. My advice: do not show off technique and do not chase technical complexity. The simplest data often carries the most valuable information, yet many people skip this step and go straight to a pile of mining algorithms. The value of data lies in correct interpretation, not in the complexity of the algorithm; do not put the cart before the horse.

The KPI systems of large companies are often skewed. Engineers, for example, are assessed on "technical complexity" and "technical leadership", with the direct result that nobody wants to do the simple things and the most basic work is left undone. Analysis engineers at large companies, in order to be promoted to senior engineer, make simple problems complicated: something the four basic arithmetic operations would settle has to be done with some exotic algorithm. The result is not only wasted resources and time. Because the engineers neglect the business, and the product staff in turn do not understand the algorithms, the two sides read the results very differently, which produces all sorts of misjudgments.

Now for the key point: interpreting the data. Correct interpretation is the most crucial step in all data analysis. Get this step wrong and all the earlier effort is wasted. Yet many people naively believe that "the data will speak" and think that processing the data is enough. So I see many well-known analysts holding correct data and still reading it wrongly. Worse still are cases that are clearly deliberate: a very, very famous multinational corporation with an excellent reputation has, for the needs of its market PR, given different interpretations of the same set of data on different occasions. That is simply a moral issue.

Data interpretation must not be bent to please anyone. It should follow the nature of the data and the logic of science, and it needs imagination (followed by verification). Sometimes it also relies on information obtained through contacts (there are many typical examples). I may not be able to explain exactly how to do it, but a few negative examples may make it easier to understand.

1. Wrong causal attribution, or ignoring a key factor. A and B are highly correlated, and some people conclude that A affects B, or that B affects A. Sometimes, however, the real reason is that C affects both A and B, and C has been ignored.

2. Ignoring the silent majority. Online polls and surveys are especially prone to this bias: participants often share some common demand, while those who do not take part are often the mainstream users.

3. Wrong or ambiguous data definitions. When information is passed ambiguously between technical, marketing and product staff, the data that gets processed differs from the data that was needed, and the result is naturally incorrect.

4. Forced comparison. Different companies and different fields may define the same data differently. Comparisons within one company or field usually pose no problem, since everyone there is used to the same definition. But some commentators, not understanding this, force data with different definitions together to draw a conclusion, and the result is badly distorted. Well-known overseas financial institutions have made this mistake repeatedly when analyzing China's browser game and client game markets.

5. Ignoring the premise. Some conclusions hold only under a certain premise or in a particular scenario, but readers, intentionally or not, ignore the premise and stretch the conclusion, which leads to serious misreading.

6. Ignoring interaction effects. This often comes up when changing a business model or improving a product. The simplest example: if you lower the price of items in your game, does revenue go up or down? If you ignore interaction effects and rely purely on projecting from the data, the answer is of course that it goes down. But in reality? Anyone who has done operations knows.

7. Lack of common sense. If you are unaware of important anniversaries, holidays, or online shopping festivals, you obviously cannot make sense of the data. This applies even more to industry reports: it is hard to imagine what kind of report someone who does not know the industry can produce.

8. Ignoring sample bias. Data research is usually based on samples, and the sampling process is rarely perfectly fair and evenly spread. Sample bias should be kept within a reasonable range; where it cannot be controlled, the conclusions need to be labeled accordingly. That is the rigorous way to read data. Turning a blind eye to sample bias, or even deliberately seeking out bias for propaganda purposes, is not.
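The first mistake in the list, a hidden factor C driving both A and B, can be illustrated with a small simulation. This is only a sketch on synthetic data: the variables, the noise levels, and the use of a partial correlation (correlating what is left of A and B after removing a linear fit on C) are assumptions made for illustration, not a method from the article.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What remains of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: C drives both A and B; A and B never influence each other.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [value + rng.gauss(0, 0.3) for value in c]
b = [value + rng.gauss(0, 0.3) for value in c]

raw = pearson(a, b)                                    # looks like a strong link
partial = pearson(residuals(a, c), residuals(b, c))    # link once C is removed
print(f"raw correlation: {raw:.2f}, after controlling for C: {partial:.2f}")
```

The raw correlation between A and B is high, yet it largely disappears once C is accounted for, which is the pattern to look for before claiming that A affects B.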

A little more on data processing. Although it is technical work, some decidedly non-technical things have to be done. Often, when I see a figure that does not match my expectations, my first reaction is to understand the data source and the processing logic. The data we normally face contains a lot of interference and noise, along with records whose identification is prone to ambiguity or even misjudgment, and all of these need to be dealt with. Engineers tend to care about the algorithm and about efficiency, and are unwilling or indifferent when it comes to these things, so the conclusions drawn from the data end up badly distorted. The larger the company, the more common this is; at the giant I worked for there were many such examples. The remedy is actually very simple: look at the source data, correctly identify and label the noise and interference in it, and make a second judgment on the records that are easy to misjudge. It is all grunt work with no technical content, but it has to be done.
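The labeling step described above might look roughly like the following. The record fields (`user_id`, `user_agent`) and the rules are hypothetical; real rules have to come from reading your own source data.

```python
from collections import Counter


def label_record(record):
    """Attach a coarse quality label to one raw record (illustrative rules only)."""
    agent = record.get("user_agent", "").lower()
    if "bot" in agent or "spider" in agent:
        return "noise"        # crawler traffic, not a real user
    if not record.get("user_id"):
        return "ambiguous"    # cannot be attributed; needs a second judgment
    return "ok"


# Hypothetical raw records.
records = [
    {"user_id": "u1", "user_agent": "Mozilla/5.0"},
    {"user_id": "", "user_agent": "Mozilla/5.0"},
    {"user_id": "u2", "user_agent": "ExampleSpider/1.0"},
    {"user_id": "u3", "user_agent": "Mozilla/5.0"},
]

labels = [label_record(record) for record in records]
summary = Counter(labels)
clean = [record for record, label in zip(records, labels) if label == "ok"]
print(dict(summary), len(clean))
```

Keeping the labels rather than silently dropping rows makes it possible to revisit the ambiguous records later and to report how much of the raw data was excluded.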

Finally, many people want to know how I look at data, or ask me: they see a lot of data every day and do not know how to read it. I actually have three very simple moves, easy to learn and effective as soon as you use them, that cover most everyday needs in common data scenarios. Put simply, it is a three-word mantra, "contrast, subdivision, traceability", and that is all.

Contrast: a figure on its own is meaningless. You tell me your game's churn rate is 80%. What does that mean? I do not know; ask me and I still do not know. You only know by comparison.

First, horizontal comparison. Line up 50 games to compare against: if the others average 90% churn and you are at 80%, your game is not bad; if the others average 65% and you are at 80%, there is a problem.

Second, vertical comparison, against your own timeline. If version 1.0 two months ago had 90% churn and you are now at 80%, that is progress. If you were at 50% two months ago and are now at 80%, you need to take a hard look at yourself.
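The two comparisons can be put side by side in a few lines. This is a minimal sketch: the function name `compare` and the peer figures are invented, and churn is expressed in whole percentage points, so a negative difference means less churn.

```python
from statistics import mean


def compare(current, peers, own_history):
    """Difference of the current value from the peer average and from the last own value."""
    return {
        "vs_peers": current - mean(peers),          # horizontal comparison
        "vs_previous": current - own_history[-1],   # vertical comparison
    }


# Churn rates in percent; peer figures are made up for illustration.
good_case = compare(80, peers=[90, 92, 88], own_history=[90])
bad_case = compare(80, peers=[65, 60, 70], own_history=[50])
print(good_case, bad_case)
```

The same 80% reads as an improvement in the first case and as a warning in the second, which is why the bare number says nothing.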

Therefore I particularly stress that in ordinary enterprise data monitoring interfaces and big-screen data displays, the comparison should be the most prominent feature. For example, every figure whose year-on-year decline exceeds a certain proportion is shown in red, so the situation is clear at a glance.

Subdivision: when the data looks abnormal you will of course want to know why, and for that you need to subdivide.

When subdividing, first choose the dimension, then the granularity. What is a dimension? Split by time and you have the time dimension; split by region and you have the regional dimension; split by referral source and you have the source dimension. Say today's site traffic rose 5% and you do not know why. Subdivide and take a look: most pages have not gone up, but one campaign page in one channel rose 300%, and the picture is clear. This is the simplest example of subdivision; in fact it applies in many fields.

What is granularity? In the time dimension, do you split by day or by hour? That is a difference in granularity. In the source dimension, do you split by referring site or by referring URL? That too is a difference in granularity. In this way you pin the abnormal value down to a specific dimension and granularity and find the cause.
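The traffic example can be sketched as a per-page breakdown between two snapshots. The page names and counts below are invented so that the total rises 5% while one campaign page rises 300%; the function name `breakdown` is likewise only illustrative.

```python
def breakdown(before, after):
    """Per-key change between two {key: count} snapshots, largest absolute change first."""
    rows = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key, 0), after.get(key, 0)
        # Percentage change; None when there is no baseline to compare against.
        pct = (new - old) * 100 / old if old else None
        rows.append((key, new - old, pct))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical page views, split along the page dimension.
before = {"home": 5000, "news": 3000, "forum": 1840, "promo/channel-x": 160}
after = {"home": 5015, "news": 3005, "forum": 1840, "promo/channel-x": 640}

total_change = (sum(after.values()) - sum(before.values())) * 100 / sum(before.values())
rows = breakdown(before, after)
print(f"total: {total_change:+.1f}%")
for key, delta, pct in rows:
    print(key, delta, pct)
```

Changing the keys from pages to hours, regions, or referring URLs gives the same breakdown along another dimension or at a finer granularity.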

Traceability: sometimes I have compared and subdivided and locked onto a specific dimension and granularity, yet still have no conclusion. What then? Trace back to the source. Use the locked dimension and granularity as search criteria, query the source logs and source records, and from them analyze and reflect on what the users actually did. There are often surprising findings; we discovered a number of product defects this way. And as you keep analyzing data like this, your understanding of user behavior gradually deepens.
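As a rough illustration of tracing back, the locked dimension and granularity become a filter over the raw log. The log structure, field names, and values below are assumptions; in practice the query would run against your own log store.

```python
from collections import Counter


def trace(log, **criteria):
    """Return the raw log records whose fields match every given criterion."""
    return [record for record in log
            if all(record.get(field) == value for field, value in criteria.items())]


# Hypothetical raw access log.
log = [
    {"hour": 14, "page": "promo", "referrer": "forum-post", "user": "u1"},
    {"hour": 14, "page": "promo", "referrer": "forum-post", "user": "u2"},
    {"hour": 14, "page": "promo", "referrer": "search", "user": "u3"},
    {"hour": 14, "page": "home", "referrer": "direct", "user": "u4"},
    {"hour": 9, "page": "promo", "referrer": "search", "user": "u5"},
]

# Locked onto the promo page (dimension) at hour 14 (granularity).
hits = trace(log, page="promo", hour=14)
top_referrers = Counter(record["referrer"] for record in hits).most_common()
print(len(hits), top_referrers)
```

Reading the matching raw records one by one, rather than only the aggregate, is where the unexpected findings tend to come from.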

In fact this topic could be extended in many directions, for example how to tell whether a young person has potential for data analysis, or how to train data analysis and product analysis talent. But I will stop here. I have said a lot today. My own level is limited and I get by on these few moves; I am also getting old and slow and am not far from being put out to pasture, so please make do with what I have written.

The accompanying picture shows the "dig tiger" from Transformers. Do you have a "dig tiger" of your own? Please leave a message.
