Improving data access: HBase, Sqoop, and Flume
Source: Internet
Author: User
Release time: 2012.04.16 14:38
The Hadoop core is a batch system: data is loaded into HDFS, processed, and then retrieved. For many workloads this is limiting, because interactive, random access to data is often necessary. HBase fills this gap by running on top of HDFS as a column-oriented database, modeled on Google's BigTable. The goal of the project is to quickly locate and access individual rows among billions of rows of data. HBase can use MapReduce to process the massive amounts of data it stores. At the same time, both Hive and Pig can be used in combination with HBase; they provide high-level language support that makes it easy to run statistical processing over HBase data.
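The fast row lookup described above comes from the BigTable design: rows are kept sorted by row key, so a point read is a search over sorted keys rather than a batch scan. A minimal conceptual sketch (this is an illustration of the idea, not the real HBase API; the class and method names are hypothetical):

```python
from bisect import bisect_left

class MiniColumnStore:
    """Toy BigTable-style store: rows sorted by key, columns per row."""

    def __init__(self):
        self.keys = []   # row keys kept in sorted order
        self.rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        i = bisect_left(self.keys, row_key)
        if i == len(self.keys) or self.keys[i] != row_key:
            self.keys.insert(i, row_key)
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Point lookup: binary search over sorted keys, O(log n),
        # instead of scanning the whole table.
        i = bisect_left(self.keys, row_key)
        if i < len(self.keys) and self.keys[i] == row_key:
            return self.rows[row_key]
        return None

    def scan(self, start, stop):
        # Range scan: a contiguous slice of the sorted key space.
        lo = bisect_left(self.keys, start)
        hi = bisect_left(self.keys, stop)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]
```

The sorted-key layout is what makes both point reads and range scans cheap; a plain HDFS file offers neither without a full pass over the data.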
But to support random reads and writes of data, HBase accepts some trade-offs. For example, Hive queries over HBase run roughly 4-5 times slower than Hive over raw HDFS files, and HBase stores data at the petabyte level, compared with HDFS's capacity limit of around 30 PB. HBase is not well suited to ad hoc analysis; it is better suited to integrating big data as part of a large application, including logs, counters, and time-series data.
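For the time-series use case mentioned above, a common row-key design in BigTable-style stores is to prefix the key with the series identifier and append a reversed, zero-padded timestamp, so the newest sample for a series sorts first within its prefix. A hedged sketch (the series name and the timestamp bound are assumptions for illustration, not an HBase convention):

```python
# Assumed upper bound on epoch milliseconds, used to reverse the sort order.
MAX_TS = 10**13

def row_key(series_id: str, ts_millis: int) -> str:
    """Build a row key that sorts newest-first within a series prefix."""
    reversed_ts = MAX_TS - ts_millis
    # Zero-pad so lexicographic order matches numeric order.
    return f"{series_id}#{reversed_ts:013d}"

# Keys for three samples of a hypothetical "cpu.load" series:
keys = sorted(row_key("cpu.load", t) for t in (1000, 3000, 2000))
```

Because the store sorts keys lexicographically, a short range scan starting at the series prefix returns the most recent samples first, without scanning older data.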