Many of my friends have asked when it is the right time to introduce Hadoop into their projects, when to use SQL instead, and what the tradeoffs are between the two. Aaron Cordova answers these questions with a single picture: a flowchart describing how to choose the right data storage and processing tools for different data scenarios. Aaron Cordova is a big data analytics and architecture expert in the United States, and the CTO and co-founder of Koverse.
On Twitter, @merv forwarded a blog post titled "Counting Triangles."
It is a post about how to count the triangles in a graph, comparing the results of doing so with Vertica and with Hadoop MapReduce.
On 1.3 GB of data, Vertica is 22-40x faster than Hadoop, and the query takes only three lines of SQL.
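To make concrete what the benchmark is actually measuring, here is a minimal sketch of triangle counting in Python, assuming the graph is given as an undirected edge list. This is an illustration of the task only, not the code from either system in the benchmark.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles in an undirected graph given as a list of edges."""
    # Build an adjacency-set index for fast neighbor lookups.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for u, v in edges:
        # Every common neighbor of u and v closes a triangle with edge (u, v).
        count += len(adj[u] & adj[v])
    # Each triangle was counted once per edge, i.e. three times in total.
    return count // 3

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(count_triangles(edges))  # a square plus one diagonal contains 2 triangles
```

The SQL version in the blog expresses the same idea as a self-join on the edge table; the MapReduce version spreads the join across mappers and reducers.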
In other words, on 1.3 GB of data Vertica is both simpler and faster. But that result is not very interesting.
For the right task, the results would be quite different. Yes, SQL is really easy in this case; we all know that SQL is much simpler than MapReduce. But in distributed computing, MapReduce is much simpler than SQL, and MapReduce can do things SQL cannot, such as image processing.
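The simplicity claim is about the programming model: the framework asks the user only for a map function and a reduce function, and handles distribution itself. Here is a single-process sketch of that model in Python, a hypothetical illustration rather than Hadoop's actual API.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run the MapReduce model locally: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # The grouping above stands in for the shuffle phase;
    # the reduce phase combines each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

# Example: counting words across "documents".
docs = ["sql is simple", "mapreduce is simple too"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)
```

In a real cluster the same two user-supplied functions run unchanged while the framework partitions the input, shuffles the intermediate pairs, and retries failed workers; that is the sense in which MapReduce keeps distributed computing simple.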
Using 1.3 GB of data as a benchmark for Vertica versus Hadoop is like saying, "Let's hold a 50-metre race between a Boeing 737 and a DC-10."
A race like that is over before either plane even takes off.
The comparison in the blog above is the same: neither technology was designed to handle a data set of this size.
A scalable system that is also fast on small-scale data is certainly a good thing, but that is not what this discussion is about. Whether the performance gap would remain this dramatic on large-scale data is far less obvious, and that is the question actually worth proving.
To help you choose the right technology for your own situation, I have drawn this flowchart:
Original link: http://aaroncordova.com/blog2/2012/01/do-i-need-sql-or-hadoop-flowchart.html
A flowchart that tells you whether you need SQL or Hadoop