Hadoop limitations and data variety frustrate data scientists

As companies increasingly focus on building big data analytics capabilities, data scientists are under growing pressure.

In a survey of more than 100 data scientists released this week by Paradigm4, creator of the open source computational database management system SciDB, 71% of respondents said their work had become more difficult as both the variety of data sources and the volume of data grew.

Notably, only 48% of respondents said they had used Hadoop or Spark in their work, and 76% said Hadoop was too slow, took too much programming effort, or had other serious limitations.

"The growing variety of data sources is forcing data scientists to find workarounds; otherwise, the conflict between data volume and budget becomes irreconcilable," said Paradigm4 CEO Marilyn Matz. "The current focus on data volume masks the real challenge of analytics. Only by addressing the challenge of using diverse types of data can we unlock the enormous potential of analytics."

Hadoop's shortcomings are not the only obstacle. About half of respondents (49%) said they found it difficult to fit their data into relational database tables. And 59% said their organizations had already begun analyzing business data with complex analytics, meaning mathematical and statistical methods such as covariance analysis, clustering, machine learning, principal component analysis, and graph operations, rather than "basic analytics" such as business intelligence reports.

Another 15% of respondents plan to start using complex analytics within the next year, and 16% within the next two years. Only 4% said their companies had no plans to use complex analytics.

Paradigm4 takes this to mean that big data's "low-hanging fruit" has already begun to turn into real profits, and that data scientists will need to dig deeper to extract the most value.

"The shift from simple to complex analytics on big data points to an emerging need for analytics that scale beyond single-server memory limits and that properly handle sparse data, missing values, and mixed sampling frequencies," Paradigm4 wrote in the report. "These complex analytical methods can also give data scientists unsupervised, assumption-free approaches, ultimately letting the data speak for itself."

Sometimes it's not enough to rely on Hadoop alone.

Paradigm4 also argues that Hadoop has been unrealistically hyped as a universal, disruptive big data solution. The report notes that Hadoop is simply not a viable solution for certain complex analytics use cases. According to Paradigm4, basic analytics are "embarrassingly parallel" (also known as "data parallel"), whereas complex analytics are not.

An embarrassingly parallel problem can be split into many independent sub-problems that run in parallel with little or no dependency between tasks, so no single task needs access to all of the data. This is how Hadoop MapReduce processes data. Problems that are not embarrassingly parallel, which include many complex analytics tasks, require using and sharing all of the data at once and exchanging intermediate results throughout the computation.
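To make the distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from Paradigm4's report: the function names are hypothetical, the "workers" are simulated sequentially in a single process, and one-dimensional k-means stands in for the kind of iterative complex analytics the report describes. Counting items splits cleanly into independent chunks plus one final merge, whereas each k-means iteration needs results from every chunk before the next iteration can begin.

```python
from collections import Counter
from functools import reduce


def count_chunk(chunk):
    """Map step (hypothetical helper): count items in one chunk only."""
    return Counter(chunk)


def embarrassingly_parallel_count(chunks):
    # Each count_chunk call is independent, so these could run on separate
    # workers, as MapReduce map tasks do; here they run sequentially.
    partials = [count_chunk(c) for c in chunks]
    # A single reduce step merges the partial results at the very end.
    return reduce(lambda a, b: a + b, partials, Counter())


def kmeans_1d(chunks, centroids, iterations):
    """1-D k-means over partitioned data (hypothetical helper).

    Unlike the count above, every iteration needs a global exchange.
    """
    centroids = list(centroids)
    k = len(centroids)
    for _ in range(iterations):
        sums = [0.0] * k
        counts = [0] * k
        for chunk in chunks:
            # Every partition must first receive the current global centroids...
            for x in chunk:
                j = min(range(k), key=lambda i: abs(x - centroids[i]))
                sums[j] += x
                counts[j] += 1
        # ...and the new centroids depend on all partitions, so partial results
        # must be combined before the next pass over the data can start.
        centroids = [sums[i] / counts[i] if counts[i] else centroids[i]
                     for i in range(k)]
    return centroids


print(embarrassingly_parallel_count([["a", "b"], ["a"]]))
print(kmeans_1d([[1.0, 2.0], [10.0, 11.0]], [0.0, 5.0], 5))
```

In a MapReduce setting, the counting job fits one map phase followed by one reduce phase, while the k-means loop would need a separate job, and a fresh pass over the data, for every iteration. That per-iteration synchronization is the kind of communication the report says complex analytics require.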

Twenty-two percent of the data scientists surveyed said Hadoop and Spark were not suitable for their analytics. Paradigm4 also found that 35% had tried Hadoop or Spark but ultimately abandoned plans to deploy them in a production environment.

For the report, research firm Innovation Enterprise surveyed 111 U.S. data scientists between March 27 and April 23, 2014. Paradigm4 summarizes the relevant findings in the chart below.

Original link: http://www.cio.com/article/2449814/big-data/data-scientists-frustrated-by-data-variety-find-hadoop-limiting.html
