Gurjeet Singh, co-founder and CEO of Ayasdi, wrote for Gigaom about the current state of big data technology. Singh argues that analysis that starts from a query is a dead end, and points out that big data has so far completed only the first step of a long march.
The following is the translation:
Many people would be shocked to learn that researchers analyze and extract insights from only about 1% of the data that is collected. Yet that 1% of analyzed data drives the innovation and insight we now call "big data." Of the quintillion (10^18) bytes of data collected every day, 99% is never fully exploited.
We all know that the promise of big data is enormous, but for many reasons the effective use of data remains a bottleneck today. Drug research and development now depends as much on data as on chemistry; new energy exploration depends as much on data as on geology; the same is true of tracking terrorists and preventing fraud.
The problems listed above, and other global issues, all suffer from this bottleneck in the use of data. It has spawned huge investment in big data, made the data scientist the hottest job around, and pushed the valuations of private data-analysis companies into the billions of dollars. But imagine what would be possible if we could raise the share of data analyzed from 1% to 100%.
Insights into the current state of data analysis
If you were handed a dataset as large as the human genome, or the human brain-mapping effort President Obama recently championed, where would you start? To break through on the world's most complex problems, we need to fundamentally change the way we extract knowledge from data. Here is what we need to recognize first:
Starting from a query is a dead end: the query itself is not the problem. In fact, once you know what question to ask, queries are critical. But that is also the catch: queries were meant to find the needle in a huge pile of data, yet on their own they cannot.
Data comes at a cost: storing most data is no longer expensive, and with tools like Hadoop or Redshift even querying large amounts of data has become very cost-effective. Of course, that is only the hardware side of the picture. (A minimal query sketch follows below.)
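To make the query-based approach the article keeps referring to concrete, here is a minimal sketch using PySpark, a tool from the Hadoop ecosystem. The table path, column names, and filter are hypothetical; the point is simply that a query like this presupposes you already know which question to ask.

```python
# Minimal sketch of query-style analysis over a large dataset with PySpark.
# The dataset path, columns, and filter below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-example").getOrCreate()

# Read a (hypothetical) large event table stored as Parquet on HDFS.
events = spark.read.parquet("hdfs:///data/events")

# A classic query: it only answers the question you already thought to ask.
daily_counts = (
    events.filter(events.event_type == "purchase")
          .groupBy("event_date")
          .count()
          .orderBy("event_date")
)
daily_counts.show()

spark.stop()
```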
Insight is money: the only reason we are willing to pay for data is that the insights inside it can unlock value. Unfortunately, we are losing most of the value in the data we collect. Collecting data can be expensive, but the cost of inefficient analysis is clearly higher. With no tools available to extract insights directly from data, we rely on very smart people to form hypotheses and then use our tools to validate (or refute) those guesses. Because it depends on guesswork, this approach is inherently flawed.
You already have enough data: there is a persistent belief that "if we just had enough data, we would surely get what we want." Too much time and energy is wasted on collecting new data when far more could be done with the data already in hand. For example, Ayasdi recently published in Scientific Reports new insights drawn from a 12-year-old breast cancer dataset that had already been analyzed for more than a decade.
Big Data is just a start, not an end.
Query-based analysis can accomplish a great deal in some respects, but it clearly falls short of the expectations placed on big data.
We keep hearing about promised breakthroughs in cancer research, energy exploration, drug discovery, and financial fraud detection. If the hype of a "big data bubble" leads people to give up on data analysis when it fails for one reason or another, how would that be any different from a crime?
So we need to hold data analysis to higher expectations, and recognize that next-generation solutions must deliver the following:
Empower domain experts: the supply of data scientists has fallen far behind what enterprises need. The answer may be to stop building tools for data scientists and instead build tools for business users (biologists, geologists, security analysts, and so on). They understand the context of the problem better than anyone else, even if they cannot keep up with the latest technology or mathematics.
Accelerate exploration: we need to reach critical insights faster. So far, big data technology has not delivered insights as quickly as promised. If we keep going down this path, we may never get to key insights fast enough, because we will never be able to ask every question of all the data.
Combine humans and machines: to gain insights faster, we need to invest more in machine intelligence. Machines should take on more of the work of finding connections and relationships between data points, giving business users a better starting point from which to explore insights. It is entirely feasible to attack these problems algorithmically, surfacing salient features of large datasets that people could never find on their own. In one recent study, for example, algorithms mining web search engine logs uncovered previously unreported drug side effects. (A toy illustration of this kind of pattern finding appears below.)
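As a rough illustration of the machine-assisted pattern finding described above (not Ayasdi's method, and not the cited study's actual pipeline), the sketch below counts how often queries mentioning a drug also mention a symptom phrase; an unusually high rate might hint at an unreported side effect. The log file, drug names, and symptom phrase are all invented for the example.

```python
# Hypothetical sketch: flag drug/symptom co-occurrence in search query logs.
# Not the cited study's method; names, paths, and terms are invented.
from collections import Counter

DRUGS = {"drug_a", "drug_b"}       # hypothetical drug names
SYMPTOM = "high blood sugar"       # hypothetical symptom phrase

def cooccurrence_rates(log_path):
    """Return, per drug, the fraction of queries that also mention the symptom."""
    with_symptom = Counter()
    totals = Counter()
    with open(log_path, encoding="utf-8") as logs:
        for line in logs:
            query = line.strip().lower()
            for drug in DRUGS:
                if drug in query:
                    totals[drug] += 1
                    if SYMPTOM in query:
                        with_symptom[drug] += 1
    return {d: with_symptom[d] / totals[d] for d in totals if totals[d]}

if __name__ == "__main__":
    rates = cooccurrence_rates("queries.log")   # hypothetical log file
    for drug, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{drug}: {rate:.1%} of queries also mention '{SYMPTOM}'")
```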
Analyze data in all its forms: researchers of course need to analyze both structured and unstructured data, and we also need to recognize how diverse unstructured data is: documents in every language, audio, video, and faces for recognition.
The evolution of big data is only in its infancy. Obviously, if we keep analyzing just 1% of the data, we can tap only 1% of its value. Imagine how we could push the world forward if we could analyze the other 99%: accelerating economic growth, curing cancer and other currently incurable diseases, reducing terrorist attacks, and taking on many other challenges.