For many market researchers, "medium" data is the analysis goal that truly delivers a return on investment. So-called "big data" analysis, by contrast, shows diminishing ROI.
Skepticism about the concept of "big data" has never gone away, and many people consider it an overhyped marketing bubble. Indeed, most companies do not have petabyte-scale data volumes like Google or Facebook. So does big data mean anything for them? Data analysis expert Tom Anderson recently proposed the concept of "medium data": by his division, datasets below 100,000 records count as "small" data, datasets above 10 million records count as "big" data, and everything in between is "medium" data. Tom Anderson believes the return on investment in data analysis is highest in this "medium" range. Below is the IT Manager Network's compilation of Tom Anderson's blog post:
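As a rough illustration of these thresholds (the 100,000 and 10 million record cut-offs come from Anderson's division; the function name and the exact handling of the boundaries are my own assumptions), a minimal sketch in Python:

```python
def classify_dataset(record_count: int) -> str:
    """Rough size class per Tom Anderson's division: below 100,000 records
    is "small", above 10 million is "big", anything in between is "medium".
    Boundary handling here is an illustrative assumption, not from the article."""
    if record_count < 100_000:
        return "small"
    if record_count > 10_000_000:
        return "big"
    return "medium"

# Example: a satisfaction survey vs. a web clickstream log
print(classify_dataset(50_000))       # small
print(classify_dataset(2_000_000))    # medium
print(classify_dataset(500_000_000))  # big
```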
After taking part in the American Marketing Association's first big data seminar this week, I became even more convinced of something I have been telling marketers at many Fortune 1000 companies for years. Namely:
Few companies are able to analyze data at so-called "big" data scale, and they don't really need to. In fact, most companies should be thinking about how to start with "medium" data.
Big data, big data, big data: people are talking about it everywhere, yet I find very few researchers actually work with "big" data. I think we should narrow the concept of "big data" and introduce a new, more meaningful term, "medium" data, to describe most of what is happening in the current big data boom.
To understand what "medium" data is, and from there what big data is, we first need to know what "small" data is.
"Small Data"
The diagram above simply divides data into "big" and "small" according to the number of records, or the sample size.
Small data ranges from a single interview in a qualitative study to the results of several thousand questionnaires. At this scale, qualitative and quantitative analysis can be combined without technical difficulty, and none of it qualifies as "big data" as currently defined. Today the definition of big data varies with an enterprise's data-processing capability; the usual definition refers to a volume of data that is difficult to analyze with existing general-purpose software.
Note that this definition comes from the perspective of IT or software vendors: it describes enterprises that cannot extract value with their existing capabilities and need substantial hardware and software upgrades before any valuable analysis can happen.
Medium Data
So what is "medium" data? In the big data era, datasets we consider small can quickly grow into big ones. For example, 30,000 to 50,000 user satisfaction survey records can be analyzed with software like IBM SPSS. But the same analysis may slow down once text data, such as open-ended user comments, is added to those records: the dataset now takes much longer to analyze and may even crash the analysis software.
If we then run text mining on that same text data, the newly added fields inflate the data volume dramatically. This is often labeled big data and assumed to require far more powerful software. I think a more accurate description is "medium" data, which is really just the threshold of true big data (this coincides with the IT Manager Network's earlier idea of "plan for big data, start small"). And there are, in fact, plenty of straightforward ways to handle data at this scale.
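To make the blow-up concrete (a minimal sketch; the choice of scikit-learn and the toy comments are my own assumptions, not from the article), even a handful of free-text comments turned into a document-term matrix already produce far more columns than the few structured survey fields:

```python
# Structured survey answers vs. the same respondents' free-text comments
# vectorized for text mining. Library and toy data are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer

survey_fields = ["satisfaction", "nps", "repeat_purchase"]  # 3 structured columns

comments = [
    "Delivery was late and the support line never picked up",
    "Great product, but the mobile app keeps crashing on login",
    "Pricing is fair; I wish the warranty covered accidental damage",
]

dtm = CountVectorizer().fit_transform(comments)  # sparse document-term matrix
print(f"{len(survey_fields)} structured columns vs. {dtm.shape[1]} text features")
# Three short comments already yield dozens of features; tens of thousands of
# comments easily yield hundreds of thousands of columns.
```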
Big Data
Now that we have carved a "medium" slice out of big data, we can redefine "big" data itself.
To understand the difference between "big" and "medium" data, we need to consider several dimensions. Gartner analyst Doug Laney famously described big data along three dimensions: volume, variety, and velocity, the so-called 3V model.
To distinguish "medium" from "big" data, however, we only need to consider two factors: cost and value.
Cost (measured in time or in money) and expected value together determine the return on investment (ROI), and the same calculation can be applied to the feasibility study of a big data project.
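As a back-of-the-envelope illustration (the simple ROI ratio and the example figures below are my own assumptions; the article only names the two factors):

```python
# A minimal sketch of the cost-versus-value comparison described above.
# ROI = (expected_value - cost) / cost; all figures are illustrative.
def roi(expected_value: float, cost: float) -> float:
    return (expected_value - cost) / cost

medium = roi(expected_value=150_000, cost=50_000)   # modest tooling, analyst time
big = roi(expected_value=200_000, cost=400_000)     # cluster, new hires, longer timeline
print(f"medium-data project ROI: {medium:+.0%}")    # +200%
print(f"big-data project ROI:    {big:+.0%}")       # -50%
```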
We know that some data is inherently more valuable than other data (100 customer complaints may be worth more to your operations analysis than 1,000 Weibo mentions of your product). One thing is certain, though: data that is never analyzed has no value at all.
What separates "medium" data from "big" (or "truly big") data is that, beyond this dividing point, the expected payoff of the analysis no longer justifies the cost, including the risk of finding nothing at all. For most enterprises, analyzing anything larger than "medium" data is either unrealistic or delivers too little value to be worth it.
"Medium" data, by contrast, sits in the sweet spot for analysis: it allows a valuable analysis on a relatively controllable budget.
For many market researchers, "medium" data is an analytical goal that truly delivers value and has sufficient ROI. The real "big" data analysis, then, will present a diminishing ROI.
On a recent business trip to Germany, I was fortunate to meet a scientist working on the Large Hadron Collider project at CERN. Ordinary businesses will never need hardware and software at that scale to do data analysis: the collider's roughly 150 million sensors deliver data on about 40 million collisions per second. Yet even the CERN scientists do not analyze data at that magnitude; they filter out 99.999% of the particle collision data before analyzing it!
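To put that filtering in perspective (a simple arithmetic sketch using the figures quoted above; the rounding is mine):

```python
# Back-of-the-envelope arithmetic for the filtering described above.
collisions_per_second = 40_000_000
fraction_discarded = 0.99999

kept_per_second = collisions_per_second * (1 - fraction_discarded)
print(f"retained for analysis: ~{kept_per_second:.0f} events per second")  # ~400
```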
For ordinary enterprises, analyzing consumers is far simpler. Data or text mining does not require exabyte- or petabyte-level processing power, or massively parallel software running on thousands of servers; there is good off-the-shelf software that handles a typical enterprise's "medium" data needs. When the media talk about big data they invariably cite Amazon, Google, or Facebook, yet even those stories, which often sound more like IT marketing hype, never mention how large a sample those companies actually use in their analyses.
As the scientists at CERN have found, correctly analyzing the important data that is relevant to the question at hand matters more than blindly processing all of it.
So the reader might ask: "If 'medium' data is more attractive than 'big' data, why is it still better than sticking with 'small' data?"
The key is that as data volume grows, we can not only be more confident in the results of the analysis but may also uncover phenomena that traditional "small" data cannot reveal. For market analysis, that may mean spotting a new niche market or an emerging competitor trend; for drug research, it may mean identifying small population segments at high risk of certain cancers, and thereby saving lives.
The "medium" data should be defined more clearly and require more best practices. Unfortunately, there are often business CEOs or CIOs who ask it to "collect all the data and analyze the data comprehensively". In this process, they are actually making real "big" data, which is often more than needed. This creates the ROI problem I've been asking. The pursuit of real "big" data often does not give you any advantage. Experienced "small" data or "medium" data analysts know that for "large" data analysis is often not satisfactory results. Compared with the cost of investment, from the perspective of ROI is not worthwhile.
So rather than "big" data analysis, "medium" data should be the goal we really aim for.