A graph that tells you whether you need SQL or Hadoop

Source: Internet
Author: User

Many friends have asked me whether it is the right time to introduce Hadoop into their own projects: when should you use SQL, when should you use Hadoop, and what are the tradeoffs between them? Aaron Cordova answers these questions with a single picture that shows how to choose the right data storage and processing tool for different data scenarios. Aaron Cordova is a big data analytics and architecture expert in the United States, and the CTO and co-founder of Koverse.

The original text follows. On Twitter, @merv forwarded a blog post, "Counting Triangles."
The post is about how to count the triangles in a graph, and it compares the results of using Vertica with those of using Hadoop MapReduce.

On 1.3 GB of data, Vertica was 22 to 40 times faster than Hadoop, and it needed only three lines of SQL.
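To make the "few lines of SQL" point concrete, here is a minimal sketch of counting triangles with a three-way self-join. It is not the query from the Vertica post: the `edges` table, its `src` and `dst` columns, and the `count_triangles` helper are all assumptions made for illustration, and SQLite stands in for Vertica only so that the example runs anywhere.

```python
import sqlite3


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Hypothetical helper for illustration; the table and column names
    are assumptions, not taken from the original benchmark.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")

    # Store each undirected edge exactly once with src < dst, dropping
    # self-loops and duplicates, so every triangle is counted only once.
    normalized = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    conn.executemany("INSERT INTO edges VALUES (?, ?)", sorted(normalized))

    # A triangle a < b < c appears as the rows (a, b), (b, c) and (a, c).
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM edges e1
        JOIN edges e2 ON e1.dst = e2.src
        JOIN edges e3 ON e3.src = e1.src AND e3.dst = e2.dst
        """
    ).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    # One triangle (1, 2, 3) plus a dangling edge (3, 4).
    print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

The join itself is the whole algorithm, which is why the SQL version is so short. A MapReduce version has to express the same two joins as explicit map and reduce phases.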

The numbers show that on 1.3 GB of data Vertica is both simpler and faster. But that result is not very interesting.
The comparison of the effort needed to write the job is a different matter: yes, SQL really is easier in this case. We all know that SQL is much simpler than MapReduce. But for distributed computing, MapReduce is much simpler than SQL, and MapReduce can do things that SQL can't, such as image processing.
Using 1.3 GB of data as a benchmark for Vertica or Hadoop is like saying, "We're going to hold a 50-metre race between a Boeing 737 and a DC-10."

In a race like that, neither plane even needs to take off.

The comparison in that blog post fails for the same reason. Neither technology was designed for data sets this small.
It is certainly better for a scalable system to remain fast on small data as well, but that is not what this article is about. Whether the performance gap stays this wide on large-scale data is far less obvious, and it is well worth demonstrating.
To help you choose a technology that fits your actual situation, I have drawn this flowchart:

Original link: http://aaroncordova.com/blog2/roncordova.com/2012/01/do-i-need-sql-or-hadoop-flowchart.html.
