Hadoop is powerful, but before you adopt Hadoop or any big data technology, it is important to identify your goals and decide whether you have chosen the right tool, because Hadoop cannot do everything. This article lists several scenarios for which Hadoop is not suitable.
As Hadoop has grown in popularity, many people have come to worship it blindly and assume it can solve every problem. Hadoop is a great framework for distributed computation over large data sets, but it is not a cure-all. For example, Hadoop is a poor fit in the following scenarios:
1. Low-latency data access
Hadoop is not suited to data access that requires real-time queries and low latency. Databases reduce latency and respond quickly by indexing records, and Hadoop simply cannot substitute for that. If you really do need a real-time store in the Hadoop ecosystem, you can try HBase for real-time reads and writes.
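The difference between an indexed lookup and a batch-style scan can be illustrated with a minimal sketch. The record layout and the function names below are hypothetical and are not taken from Hadoop or from any database; the point is only that a scan touches every record, while an index goes straight to the one requested.

```python
# Hypothetical records of the form (user_id, name); purely illustrative.
records = [(i, f"user-{i}") for i in range(100_000)]


def scan_lookup(rows, key):
    """Batch-style lookup: examine rows one by one until the key is found."""
    examined = 0
    for row_key, value in rows:
        examined += 1
        if row_key == key:
            return value, examined
    return None, examined


# Build the index once; afterwards each lookup is a single hash probe.
index = {row_key: value for row_key, value in records}


def indexed_lookup(idx, key):
    """Database-style lookup: jump directly to the record through the index."""
    return idx.get(key)


value_scan, rows_examined = scan_lookup(records, 99_999)
value_index = indexed_lookup(index, 99_999)
print(value_scan, rows_examined, value_index)
```

Both calls return the same record, but the scan has to walk all 100,000 rows to reach the last one, which is the kind of cost a low-latency workload cannot afford.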
2. Structured data
Hadoop is not well suited to structured data; it is a better fit for semi-structured and unstructured data. Unlike an RDBMS, Hadoop generally uses distributed storage, so queries and processing incur latency.
3. When the amount of data is not large
How much data is Hadoop typically meant for? The answer is terabytes or petabytes. When your data amounts to only a few dozen gigabytes, using Hadoop brings no benefit. Adopt Hadoop selectively, according to what the business actually needs, and do not blindly follow the trend.
4. A large number of small files
Small files are files that are much smaller than the HDFS block size (64 MB by default in older releases, 128 MB in Hadoop 2 and later). If you store a large number of small files in HDFS, each file occupies at least one block, and the NameNode consumes memory to hold the metadata for every file and block. If the number of small files grows large enough, the memory required will exceed what current hardware can provide.
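A rough back-of-the-envelope estimate shows why the number of files matters more than their total size. The sketch below assumes about 150 bytes of NameNode memory per namespace object (one object per file plus one per block). That figure is a commonly quoted rule of thumb, not a number from this article, and the 128 MB block size and the function names are likewise assumptions made for illustration.

```python
import math

# Assumed rule of thumb: ~150 bytes of NameNode heap per file or block object.
BYTES_PER_OBJECT = 150
MB = 1024 * 1024


def blocks_needed(file_size, block_size=128 * MB):
    """Number of HDFS blocks a file occupies (at least one)."""
    return max(1, math.ceil(file_size / block_size))


def namenode_bytes(num_files, file_size, block_size=128 * MB):
    """Estimated NameNode memory: one object per file plus one per block."""
    objects_per_file = 1 + blocks_needed(file_size, block_size)
    return num_files * objects_per_file * BYTES_PER_OBJECT


# Ten million 1 MB files, each stored in its own block.
small = namenode_bytes(10_000_000, 1 * MB)

# The same data packed into 78,125 files of 128 MB each.
packed = namenode_bytes(78_125, 128 * MB)

print(f"small files: {small / 1e9:.1f} GB, packed: {packed / 1e6:.1f} MB")
```

Under these assumptions the same volume of data costs roughly 3 GB of NameNode memory as small files and roughly 23 MB when packed into block-sized files, which is why consolidating small files is usually worth the effort.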
5. Too many writes and file updates
HDFS follows a write-once, read-many model. When files need to be updated frequently, Hadoop has no good way to support it.
6. MapReduce may not be the best choice
MapReduce is a simple parallel programming model and a powerful tool for parallel computation over big data, but many computational tasks, workloads, and algorithms are inherently ill-suited to the MapReduce framework.
If you need to share data between MapReduce jobs, you can do it in the following ways:
- Iteration: run multiple MapReduce jobs, using the output of one job as the input to the next.
- Shared state: share state information, but not through memory, because each MapReduce task runs in its own JVM.
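The iteration approach can be sketched with a tiny in-memory imitation of MapReduce. This is not the Hadoop API: run_job and the mapper and reducer functions are hypothetical names used only to show how the output of one job becomes the input of the next, with no memory shared between the two jobs.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Toy stand-in for one MapReduce job: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort keys so the output order is deterministic.
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count how often each word occurs.
def count_map(line):
    for word in line.split():
        yield word, 1


def count_reduce(word, counts):
    return word, sum(counts)


# Job 2: group words by their count, reading job 1's output as its input.
def freq_map(pair):
    word, count = pair
    yield count, word


def freq_reduce(count, words):
    return count, sorted(words)


lines = ["hadoop is powerful", "hadoop is not everything"]
word_counts = run_job(lines, count_map, count_reduce)
by_frequency = run_job(word_counts, freq_map, freq_reduce)
print(by_frequency)
```

The second job sees only what the first job emitted, just as a chained Hadoop job would read the previous job's output files rather than any in-memory state.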
Original link: Hadoop isn't Silver Bullet
Hadoop is powerful, but not omnipotent (CSDN)