Question:
In real work I often find that no matter what argument you want to make, even two contradictory ones, you can justify it against a large enough pile of data and use the data to support your point of view. So sometimes I think: isn't data just a tool for deception? The only difference is whether you use it to fool yourself or to fool others.
Answer:
Let's look at a simple example.
Last week we looked at a category breakdown of Android apps in China and found that 22% of the most-downloaded apps were audio/video and image apps. So here is the question: does this mean that audio/video and image is the most promising app category?
Yes: so many apps of the same kind have been downloaded so many times, which shows that user demand is large;
No: most users have already formed fixed usage habits, and the market is close to saturation.
Which way you interpret it, how you understand it, how you "deceive" with it: all of this depends on your understanding of the industry.
You may need a lot of corroborating evidence, such as capital-market trends, the DAU of comparable apps, and so on.
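To make the example concrete: the 22% figure is just a category share computed over a list of top-downloaded apps. A minimal sketch with entirely hypothetical app names and categories:

```python
from collections import Counter

# Hypothetical sample standing in for a real "most downloaded" list;
# each entry is (app, category). Real data would come from an app store.
top_apps = [
    ("player_a", "audio_video"), ("reader_b", "reading"),
    ("player_c", "audio_video"), ("game_d", "games"),
    ("chat_e", "social"),        ("player_f", "audio_video"),
    ("game_g", "games"),         ("news_h", "news"),
    ("cam_i", "audio_video"),    ("map_j", "navigation"),
]

# Count apps per category, then convert to a share of the whole list.
counts = Counter(category for _, category in top_apps)
share = {cat: n / len(top_apps) for cat, n in counts.items()}
print(f"audio/video share: {share['audio_video']:.0%}")  # 40% on this sample
```

Note that the number itself is purely mechanical; deciding whether a high share means "strong demand" or "saturated market" is exactly the interpretive step the rest of this answer is about.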
In fact, even large-scale data mining starts from presuppositions.
This is determined by the workflow that produces a report.
After taking on a data analysis project, the first thing to do is to find the right data sources and fields.
Then comes the process of data acquisition, cleaning, and mining/analysis.
What finally lands in our hands is a pile of tables. How to interpret them varies from person to person.
It is not hard to see that in both halves of producing a report, people are the main influencing factor. That is why you need your own judgment and way of thinking.
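The acquire → clean → analyze workflow above can be sketched in a few lines. All field names and values here are hypothetical, and real pipelines would use a proper data library; the point is only that every stage before interpretation is mechanical:

```python
# Raw records as they might arrive from a data source (hypothetical fields).
raw_rows = [
    {"date": "2024-01-01", "dau": "1000"},
    {"date": "2024-01-02", "dau": ""},        # dirty record: missing value
    {"date": "2024-01-03", "dau": "1200"},
]

def clean(rows):
    """Drop records with missing values and cast fields to the right types."""
    return [
        {"date": r["date"], "dau": int(r["dau"])}
        for r in rows
        if r["dau"].strip()
    ]

def analyze(rows):
    """The 'pile of tables' stage: a simple aggregate the analyst must interpret."""
    daus = [r["dau"] for r in rows]
    return {"mean_dau": sum(daus) / len(daus), "n_days": len(daus)}

report = analyze(clean(raw_rows))
print(report)  # {'mean_dau': 1100.0, 'n_days': 2}
```

Everything up to `report` is reproducible by anyone; what the number means is where the person comes in.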
But is working backward, "from the conclusion to the data," in order to "justify itself," really a lie?
The facts
When it comes to data, there are two very typical usage scenarios:
1. The client hands over the data directly; the tables are ready-made, and your job is to find the problems hiding in them;
2. The client feels that something is wrong but cannot say where it hurts, and asks you to show them with data.
These are two different scenarios.
Scenario one is anomaly analysis: nothing more than comparing against various means and aligning curves. Once the results come out, you still need to read them.
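The "compare against the mean" step can be sketched as a simple trailing-mean check. The window size and threshold here are arbitrary illustrative choices, not a standard:

```python
def flag_anomalies(series, window=7, threshold=0.15):
    """Return indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` (relative)."""
    flags = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if abs(series[i] - baseline) / baseline > threshold:
            flags.append(i)
    return flags

# Hypothetical daily DAU in thousands; the last day drops about 20%.
dau = [100, 102, 98, 101, 99, 100, 103, 80]
print(flag_anomalies(dau))  # -> [7]
```

The flag itself is mechanical; the check tells you *that* day 7 is unusual, never *why*.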
Say my DAU dropped 20% today. You have seen that number, and so has the client; the client comes to you precisely to find out why. So you need to interpret it... Interpretation means first forming hypotheses, then tracing through the data to validate them, gradually homing in on the cause.
Scenario two starts by forming initial hypotheses, then doing data mining to validate or falsify them; then hypothesizing again, digging again, validating or falsifying again...
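In both scenarios, the hypothesize, dig, then validate-or-falsify loop often amounts to breaking the metric down along a candidate dimension and checking each segment. A sketch with hypothetical numbers, testing the hypothesis "the DAU drop is platform-specific":

```python
# Hypothetical DAU by platform on two consecutive days.
yesterday = {"android": 600, "ios": 400}
today     = {"android": 420, "ios": 380}

def breakdown(before, after):
    """Per-segment relative change, sorted so the biggest drop comes first."""
    changes = {
        seg: (after[seg] - before[seg]) / before[seg]
        for seg in before
    }
    return sorted(changes.items(), key=lambda kv: kv[1])

for segment, change in breakdown(yesterday, today):
    print(f"{segment}: {change:+.0%}")
# Android fell ~30% while iOS fell only ~5%: the hypothesis is supported,
# so the next round of digging focuses on Android (a release? a crash?).
```

If the drop had been uniform across segments, the hypothesis would be falsified and you would pick a different dimension to break down next, which is exactly the loop described above.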
This, in fact, is why analysts seem so important in the so-called "big data era."
If data really could speak "human language" that everyone understood, why would we need analysts at all?
We need analysts to "make the data talk."
And the process of "translating" data into "human language" is shot through with "subjective initiative."
As long as the data itself is not fabricated, however it is interpreted, any error is a methodological problem, not fraud.
So my personal view is that there is no need to agonize over whether the data leads to the conclusion or the conclusion drives the data.
Data can do nothing more than validate or falsify, as long as the data itself is true.