Hadoop is powerful, but not omnipotent (CSDN)

Source: Internet
Author: User

Hadoop is powerful, but before you adopt Hadoop or any big data tool, it is important to identify your goals and check that you have chosen the right tool, because Hadoop cannot do everything. This article lists several scenarios for which Hadoop is not suitable.

As Hadoop adoption has grown, many people have come to worship it blindly and assume it can solve every problem. Hadoop is a great framework for distributed computation over large data sets, but it is not a cure-all. The following are several scenarios where Hadoop is not a good fit:

1. Low-latency data access

Hadoop is not suited to data access that requires real-time queries and low latency. Databases reduce latency and respond quickly by indexing records, and Hadoop is simply not a substitute for that. If you really do want to replace a real-time database, you can try HBase, which supports real-time reads and writes.
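As a rough illustration of why indexing matters for latency, the sketch below contrasts a full scan with a lookup through an in-memory index. The record layout and function names are hypothetical, and a Python dict merely stands in for a database index; this is not how any particular database or Hadoop itself is implemented.

```python
# Hypothetical records: (user_id, name) pairs standing in for table rows.
records = [(i, f"user-{i}") for i in range(100_000)]


def scan_lookup(rows, user_id):
    """Batch-style access: read rows one by one until the key is found."""
    rows_read = 0
    for key, name in rows:
        rows_read += 1
        if key == user_id:
            return name, rows_read
    return None, rows_read


# Build the index once; afterwards each lookup touches a single entry.
index = {key: name for key, name in records}


def indexed_lookup(idx, user_id):
    """Index-style access: jump straight to the record."""
    return idx.get(user_id)


name_scan, rows_read = scan_lookup(records, 99_999)
name_idx = indexed_lookup(index, 99_999)
print(name_scan, rows_read, name_idx)
```

Both calls return the same record, but the scan has to read every row to reach the last key, while the indexed lookup does not depend on how many rows exist.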

2. Structured data

Hadoop is not well suited to structured data; it is a better fit for semi-structured and unstructured data. Unlike an RDBMS, Hadoop generally relies on distributed storage, so queries incur extra latency during processing.

3. When the amount of data is not large

How much data does Hadoop typically suit? The answer is terabytes or petabytes. When your data is only tens of gigabytes, there is no benefit to using Hadoop. Adopt Hadoop selectively, according to the needs of your business, rather than blindly following the trend. Hadoop is powerful, but before you use Hadoop or big data tools, you need to be clear about your goals and confirm that you have chosen the right tool.

4. A large number of small files

Small files are files much smaller than the HDFS block size (64 MB by default in Hadoop 1.x, 128 MB in later versions). If you store a large number of small files in HDFS, each file occupies at least one block, and the NameNode must spend memory holding the metadata for every one of those blocks. If the number of small files grows large enough, the memory required exceeds what current hardware can provide.
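A back-of-the-envelope estimate makes the problem concrete. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object. That figure is an assumption, not something stated in this article, and real usage varies with Hadoop version and configuration.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not an exact figure
MB = 1024 ** 2
BLOCK_SIZE = 64 * MB    # block size used in the text above


def blocks_needed(total_bytes, block_size):
    """Number of blocks required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate heap use: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Case A: 10 million files of 1 MB each, so one block per file.
small_files_heap = namenode_heap_bytes(10_000_000)

# Case B: the same total data packed into files of exactly one block each.
total_bytes = 10_000_000 * MB
large_file_count = blocks_needed(total_bytes, BLOCK_SIZE)
large_files_heap = namenode_heap_bytes(large_file_count)

print(small_files_heap, large_file_count, large_files_heap)
```

Under these assumptions the same data costs about 3 GB of NameNode heap as small files and about 47 MB as block-sized files, a 64-fold difference that comes entirely from the number of objects tracked.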

5. Too many writes and file updates

HDFS follows a write-once, read-many access model. When files need to be written or updated very frequently, Hadoop has no good way to support it.

6. MapReduce may not be the best choice

MapReduce is a simple parallel programming model and a powerful tool for parallel computation on big data, but many computational tasks, workloads, and algorithms are inherently a poor fit for the MapReduce framework.

If you need to share data between MapReduce steps, you can do the following:

    • Iteration: Run multiple MapReduce jobs, using the output of the previous job as the input to the next one.
    • Share state information: Do not try to share information in memory, because each MapReduce task runs in its own JVM.
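The iteration pattern can be sketched in plain Python, without Hadoop. The `run_mapreduce` helper below is a hypothetical in-process stand-in for submitting a job, and the two jobs are invented for illustration. It only shows that each job's output becomes the next job's input, with nothing shared in memory between jobs.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count how often each word occurs.
def wc_map(line):
    for word in line.split():
        yield word, 1


def wc_reduce(word, counts):
    return (word, sum(counts))


# Job 2: group words by their count; it consumes job 1's output records.
def freq_map(pair):
    word, count = pair
    yield count, word


def freq_reduce(count, words):
    return (count, sorted(words))


lines = ["hadoop is powerful", "hadoop is not everything"]
job1_output = run_mapreduce(lines, wc_map, wc_reduce)
job2_output = run_mapreduce(job1_output, freq_map, freq_reduce)
print(job2_output)
# [(1, ['everything', 'not', 'powerful']), (2, ['hadoop', 'is'])]
```

In a real cluster the intermediate result would typically be written to HDFS between jobs, which is one reason iterative algorithms tend to be expensive on MapReduce.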

Original link: Hadoop isn't Silver Bullet
