I often hear entrepreneurs say that their company produces or records a lot of data every day. They haven't yet figured out how to use it, but they are saving all of it anyway. They often add that this data will greatly improve their products or services, as if data were the company's savior. I won't argue here about whether that is right or wrong. Instead, I want to explain two common misconceptions about big data:
First, data is not information
People often use "data" and "information" as synonyms. In fact, data refers to raw data points (numbers, text, images, video, and so on), while information is tied directly to content: it has to be informative. More data does not necessarily mean more information, and even when it does, the information does not grow in proportion to the data. Let's look at two simple examples:
Backups. Many people now back up their hard drives regularly. This needs little explanation: each backup creates a new copy of the data, but the information does not increase.
Content on multiple social networking sites. Many of us are active on several social networking sites. The more sites we use, the more data we collect, and the more information too, but not proportionately. One reason is that we repost our friends' tweets (or posts on other sites) to one another, so the same content shows up again and again. Another is that many posts say essentially the same thing even though their wording differs.
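The two examples above correspond to two kinds of redundancy: exact copies (backups) and near copies (reposts and lightly edited posts). Below is a minimal Python sketch of how one might measure both, assuming the data fits in memory. Exact copies are collapsed by hashing their content with SHA-256, and near copies are flagged with Jaccard similarity over two-word "shingles". The function names, the shingle size `k=2`, the `threshold=0.5` cutoff, and the sample posts are illustrative assumptions, not a prescribed method.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash: identical bytes always produce the same digest."""
    return hashlib.sha256(content).hexdigest()


def count_unique(items: list[bytes]) -> int:
    """Number of distinct contents, ignoring exact copies such as repeated backups."""
    return len({fingerprint(item) for item in items})


def shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Set of k-word sequences ("shingles") in a lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|: 0.0 for disjoint sets, 1.0 for identical ones."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(posts: list[str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Index pairs of posts whose shingle sets overlap by at least `threshold`.

    Illustrative defaults (assumptions): k=2 word shingles, 0.5 similarity cutoff.
    """
    sets = [shingles(p) for p in posts]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    disk = b"quarterly sales figures"
    backups = [disk, disk, disk]  # three backups of an unchanged drive
    print(len(backups), "copies,", count_unique(backups), "distinct content")

    posts = [  # hypothetical sample posts
        "Big sale today on all shoes at the downtown store",
        "Big sale today on all shoes at the downtown store!!",  # a repost with extra punctuation
        "The weather is lovely this afternoon",
    ]
    print(near_duplicates(posts))
```

Word-shingle overlap catches reposts and small edits. Posts that say the same thing in entirely different words share few shingles and would slip through; recognizing those needs a more sophisticated notion of similarity.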
Second, information is not wisdom (insight)
Suppose we have stripped out all the duplicated parts of the data and merged the records that say the same thing, so that what remains is information. Is it useful to us? Not necessarily. For information to become insight, it must meet at least three criteria:
Decipherability. This may be a problem peculiar to the big data era: more and more companies produce large amounts of data every day without having figured out how to use it, so for the time being they store it in unstructured form. Unstructured data cannot necessarily be deciphered. For example, suppose you record three time intervals for a customer on your site, 3 seconds, 2 seconds, and 17 seconds, but forget to note what each interval actually measures. These numbers are information (they are not duplicates), but they cannot be deciphered, so they can never become insight.
Relevance. We have discussed the importance of relevance before, so I won't go into detail here: irrelevant information is nothing more than noise.
Novelty. This is similar to the social networking example I gave earlier, except that novelty often cannot be judged from the data and information themselves. For example, suppose an e-commerce company analyzes one set of data and concludes that customers are willing to pay 10 yuan for same-day delivery, and then reaches the same conclusion from another, completely independent set of data. In that case, the second result is not novel. Unfortunately, most of the time we can judge novelty only after analyzing a large amount of data and information.
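Of the three criteria, decipherability is one a team can address while collecting data: record what each value means at the moment it is captured. Below is a minimal Python sketch, assuming a simple in-memory log of dictionaries. The functions `record_interval` and `seconds_for` and the three labels are hypothetical, because the interval example above never says what the 3, 2, and 17 seconds measured.

```python
import json

# What the text describes: three numbers whose meaning was never written down.
# They are information (not duplicates), but nobody can interpret them later.
undecipherable = [3, 2, 17]


def record_interval(log: list[dict], customer_id: str, measures: str, seconds: float) -> None:
    """Store a timing together with what it measures, at the moment it is captured."""
    log.append({"customer_id": customer_id, "measures": measures, "seconds": seconds})


def seconds_for(log: list[dict], measures: str) -> list[float]:
    """Answer a concrete question; possible only because each value is labelled."""
    return [entry["seconds"] for entry in log if entry["measures"] == measures]


if __name__ == "__main__":
    log: list[dict] = []
    # Hypothetical labels: the original example never says what the intervals were.
    record_interval(log, "c-001", "time_to_first_click", 3)
    record_interval(log, "c-001", "time_on_product_page", 2)
    record_interval(log, "c-001", "time_in_checkout", 17)

    print(json.dumps(log[2]))
    print(seconds_for(log, "time_in_checkout"))
```

Once each value carries its label, a question such as "how long do customers spend at checkout?" has an answer; the bare list `[3, 2, 17]` cannot answer it.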
All of this is to say that we have far less useful data than we think; "big data" itself is a gimmick. Today an average startup can generate more than 1 GB of data a day, and a somewhat larger company generates terabytes every day. But before spending money on big data analysis, we need to recognize that data is not the same as information, let alone insight.