Even Smart People Make Plenty of Mistakes: Common Data Analysis Errors

Source: Internet
Author: User

If you don't understand big data, you won't appreciate how large its core value is. Of course, understanding big data is not enough; you also need sound data analysis methods to make it produce value. Even smart data analysts often make mistakes along the way. Here the editor shares some of the most common ones so that you can avoid them as far as possible in future work.



Mistaking correlation for causation (correlation vs. causation)

The classic example: ice cream sales rise and fall together with the number of swimmers who drown. That does not mean that selling more ice cream causes more drownings. It only means the two are correlated, for example because both go up on hot days. This example is an obvious one, and you may wonder how anyone could make such a mistake, but in everyday life, study, and work people make it all the time.
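The ice cream example can be made concrete with a small simulation. The sketch below is only an illustration with made-up numbers: it assumes temperature drives both ice cream sales and drownings, and that neither affects the other. The coefficients, noise levels, and the helper names `pearson` and `residuals` are assumptions for this sketch, not anything from real data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of y after removing a least-squares linear fit on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)

# Hypothetical daily data: temperature is the common cause of both series.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

# Raw correlation between the two effects of temperature.
raw_r = pearson(ice_cream, drownings)

# Correlation once the linear effect of temperature is removed from both.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation: {raw_r:.2f}")
print(f"after controlling for temperature: {partial_r:.2f}")
```

Under these assumptions the raw correlation is strong, while the correlation that remains after controlling for temperature is close to zero. In real data the confounder is rarely this easy to name or measure.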

For example:

The data showed that the Lakers' winning percentage was 71.5% when Kobe took 10-19 shots, dropped sharply to 60.8% when he took 20-29 shots, and was only 41.7% when he took 30 or more.

According to this data, Kobe should shoot less if the Lakers want to win. Not necessarily. Perhaps Kobe takes fewer shots because his teammates are playing well and do not need him to do much. Perhaps the team is far ahead and much of the game is garbage time. And perhaps he takes many shots because the game is a tough one or his teammates are playing badly, so he has to step up. Of course, these are only possibilities. What actually happened cannot be determined from this data alone, and no conclusion can be drawn from it.


-- Disclaimer: I am not a Kobe fan, just a casual critic.

Survivorship bias

The samples you see in the analysis are only those that "survived" some process, and this leads to incorrect conclusions.

For example: Bill Gates, Steve Jobs, and Zuckerberg never finished college, so everyone should drop out and start a business. The biggest problem with this conclusion is that the people who dropped out and did not succeed are rarely seen. Besides, these founders dropped out because they were already exceptional; they did not become exceptional by dropping out. See, correlation vs. causation keeps haunting us.

Another example: Uber finds that new users who ride with a coupon give an average rating of only 3 stars, whereas riders on their second trip, who have no coupon, give an average of 4.5 stars. Does this show that users rate higher when they get no coupon, and that although users love coupons, deep down they feel that cheap things cannot be good? Obviously not. The survivorship bias here is that those who gave one or two stars on the first ride probably never took a second one. Even more obviously, I made this example up.
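The selection effect in the ride-rating example can be sketched in a few lines. Everything here is hypothetical: the uniform first-ride ratings and the `return_prob` table (the chance that a rider comes back, by rating) are assumptions chosen only to show the mechanism.

```python
import random

rng = random.Random(1)

# Hypothetical first-ride ratings: each of 1..5 stars is equally likely.
first_ratings = [rng.randint(1, 5) for _ in range(20000)]

# Assumption: the lower the rating, the less likely the rider returns.
return_prob = {1: 0.05, 2: 0.15, 3: 0.50, 4: 0.85, 5: 0.95}

# First-ride ratings of only those riders who came back for a second ride.
returning_ratings = [r for r in first_ratings if rng.random() < return_prob[r]]

mean_all = sum(first_ratings) / len(first_ratings)
mean_returning = sum(returning_ratings) / len(returning_ratings)

print(f"average rating, all first-time riders: {mean_all:.2f}")
print(f"average rating, riders who returned:   {mean_returning:.2f}")
```

Nobody in this simulation changes their opinion, and no coupon is involved at all. The average among returning riders is higher purely because unhappy riders dropped out of the sample.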

The sample is fundamentally different from the population

Take Zhihu, for example. Browsing it gives the illusion that everyone earns a million a year, graduated from a 985/211 university or better, and is rich and good-looking, and that incomes in China rival those of Bay Area programmers. On the one hand this is survivorship bias: on Zhihu, the voices of big-V accounts are simply the ones most easily seen (see, survivorship bias keeps haunting us too). On the other hand, do not underestimate the difference between Zhihu users and Chinese internet users, or the difference between Chinese internet users and the population of China as a whole. That is the difference between the sample and the population.

Similar examples are the incomes reported on the work board of the Shuimu forum and on the Pedestrian Street forum, and the "poverty line" discussed on overseas Chinese websites.


Chasing statistical significance

Statistics 101 tells us that to judge whether two groups of numbers differ, the most basic step is to check whether the difference between them is statistically significant.

Suppose LinkedIn is about to redesign again (why do I say "again"?) and has two candidate versions, A and B. A gray-release test finds that version A's daily active users are 20% higher than the existing version's, but the difference is not statistically significant. Version B's daily active users are only 3% higher than the existing version's, but the difference is statistically significant. So the PM takes out Statistics 101, turns to page two, and says: come on, let's launch version B, the statistically significant one. The long-suffering data scientist (DS) says: wait a minute! You should not always pick whichever result is statistically significant; let's take a closer look at the data for version A (detailed analysis of over 10,000 words omitted).

Obviously, I made this example up too.
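One way a 20% lift can fail to reach significance while a 3% lift passes is sample size. The sketch below uses a pooled two-proportion z-test on invented numbers: it assumes version A was tested on 100 users per group and version B on 100,000 per group, and it treats "active" as a yes/no outcome per user. The function names and all counts are assumptions for illustration; the original example does not say how the test was run.

```python
import math

def two_proportion_test(active_old, n_old, active_new, n_new):
    """Pooled two-proportion z-test: returns (rate difference, two-sided p-value)."""
    p_old, p_new = active_old / n_old, active_new / n_new
    pooled = (active_old + active_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Two-sided tail probability of the standard normal distribution.
    return p_new - p_old, math.erfc(abs(z) / math.sqrt(2))

def diff_interval(active_old, n_old, active_new, n_new, z_crit=1.96):
    """Approximate 95% confidence interval for the difference in rates."""
    p_old, p_new = active_old / n_old, active_new / n_new
    se = math.sqrt(p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new)
    diff = p_new - p_old
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical version A: 50% -> 60% active (20% relative lift), 100 users per group.
diff_a, p_value_a = two_proportion_test(50, 100, 60, 100)
ci_a = diff_interval(50, 100, 60, 100)

# Hypothetical version B: 50% -> 51.5% active (3% relative lift), 100,000 users per group.
diff_b, p_value_b = two_proportion_test(50_000, 100_000, 51_500, 100_000)
ci_b = diff_interval(50_000, 100_000, 51_500, 100_000)

print(f"A: diff={diff_a:.3f}, p={p_value_a:.3f}, 95% CI=({ci_a[0]:.3f}, {ci_a[1]:.3f})")
print(f"B: diff={diff_b:.3f}, p={p_value_b:.2e}, 95% CI=({ci_b[0]:.3f}, {ci_b[1]:.3f})")
```

With these made-up counts, A is not significant at the 5% level and B is. But A's confidence interval is wide and reaches far above B's, so "not significant" here mostly means "not enough data yet", which is a reason to keep testing A rather than to discard it.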


No data visualization, or, even worse, wrong or misleading data visualizations

In a trend chart, to make growth look dramatic, the Y-axis is set so that it does not start from 0. The gap then looks large and the growth looks steep, but if the Y-axis starts from 0 the gap all but disappears.
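The size of the distortion is simple arithmetic. The sketch below computes how much taller the larger value is drawn relative to the smaller one for a given axis start; `apparent_ratio` and the two monthly figures are hypothetical names and numbers used only for illustration.

```python
def apparent_ratio(smaller, larger, axis_min=0.0):
    """Ratio of the drawn heights of two values when the Y-axis starts at axis_min."""
    if axis_min >= smaller:
        raise ValueError("axis_min must be below the smaller value")
    return (larger - axis_min) / (smaller - axis_min)

# Hypothetical metric: 100 last month, 103 this month (3% growth).
last_month, this_month = 100.0, 103.0

# Y-axis starting at 0: drawn heights match the real ratio.
honest = apparent_ratio(last_month, this_month, axis_min=0.0)

# Y-axis starting at 99: the same two points look very different.
truncated = apparent_ratio(last_month, this_month, axis_min=99.0)

print(f"axis from 0:  this month is drawn {honest:.2f}x as tall")
print(f"axis from 99: this month is drawn {truncated:.2f}x as tall")
```

The underlying growth is the same in both cases; only the axis start changes how large it appears.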


(Next I ought to work in a Twitter example, haha, because data analysis shows that articles featuring a company like Twitter are more interesting to read.)

The results and recommendations of the analysis are not feasible

Twitter analyzes text data to discover ...

Well, I can't make one up. Which just goes to show: an analysis whose results are "theoretically correct" but not feasible is of no use at all ...


Not doing data analysis at all

Don't laugh. According to a PM at the company formerly known as Xiaonei, later Renren, and now called who knows what, this really happens.

