Hive is a data warehouse infrastructure built on Hadoop. It provides:
• A convenient set of tools for extracting, transforming, and loading data (ETL).
• A mechanism for imposing structure on the data.
• The ability to query and analyze massive datasets stored in Hadoop.
The basic design of Hive is to use HDFS for data storage and the MapReduce framework for data processing. Essentially, Hive is a compiler that translates a user's operations (queries or ETL) into MapReduce jobs, then uses the MapReduce framework to execute those jobs against the massive data stored on HDFS.
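As a small sketch of this compilation step, Hive's EXPLAIN statement prints the plan (including the map/reduce stages) that a query is translated into. The table and columns below are hypothetical:

```sql
-- Hypothetical table: page_views(user_id STRING, url STRING, ts BIGINT)
-- EXPLAIN shows the plan Hive generates; a GROUP BY like this one
-- typically compiles into a single map/reduce stage.
EXPLAIN
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url;
```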
Hive is designed as a batch-processing system. Because it processes data through the MapReduce framework, it carries the overhead of MapReduce job submission and scheduling; even for small datasets, latency is measured in minutes. Its biggest advantage, however, is that latency grows only linearly with the size of the dataset.
Hive defines a simple SQL-like query language, HiveQL, which makes it easy for users familiar with SQL to run queries. At the same time, HiveQL allows programmers familiar with the MapReduce framework to plug custom mapper and reducer scripts into their queries, extending Hive's built-in functionality to perform more complex analysis.
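Custom scripts are plugged in through HiveQL's TRANSFORM clause. In this sketch, the script name and the input table are hypothetical:

```sql
-- Hypothetical: parse_url.py reads tab-separated rows from stdin and
-- emits (host, path) pairs; table raw_logs(line STRING) is assumed.
ADD FILE parse_url.py;

SELECT TRANSFORM (line)
  USING 'python parse_url.py'
  AS (host STRING, path STRING)
FROM raw_logs;
```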
Hive Features
High-performance query and analysis of massive data
Because Hive queries are executed through the MapReduce framework, and MapReduce itself is designed for high-performance processing of massive data, Hive can handle massive datasets efficiently.
At the same time, Hive applies many optimizations when translating HiveQL into MapReduce jobs, ensuring that the generated jobs are efficient. In practice, Hive can efficiently process terabytes or even petabytes of data.
SQL-like query language
HiveQL is very similar to SQL, so users who are familiar with SQL can easily use Hive for complex queries without special training.
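For instance, a typical aggregation reads just like standard SQL. The table and columns here are hypothetical:

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE, year INT)
SELECT region, SUM(amount) AS total
FROM sales
WHERE year = 2013
GROUP BY region
ORDER BY total DESC
LIMIT 10;
```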
Flexible extensibility of HiveQL
Beyond the capabilities HiveQL provides out of the box, users can define their own data types, write custom mapper and reducer scripts in any language, and define custom functions (both ordinary and aggregate functions). This gives HiveQL great extensibility, which users can exploit to implement very complex queries.
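A custom function, once compiled against Hive's UDF interface, is registered and called from HiveQL. The jar, class, and table names below are hypothetical:

```sql
-- Hypothetical: my_udfs.jar contains com.example.Lower3, a custom
-- function built against Hive's UDF interface.
ADD JAR my_udfs.jar;
CREATE TEMPORARY FUNCTION lower3 AS 'com.example.Lower3';

-- Once registered, the UDF is used like any built-in function.
SELECT lower3(name) FROM users;
```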
High scalability and fault tolerance
Hive itself has no execution engine; user queries are executed through the MapReduce framework. Because the MapReduce framework is highly scalable (computational power grows linearly with the number of machines in the Hadoop cluster) and highly fault tolerant, Hive inherits these characteristics.
Fully compatible with other Hadoop products
Rather than storing user data itself, Hive accesses it through an interface layer. This lets Hive support a variety of data sources and data formats. For example, it can process multiple file formats on HDFS (TextFile, SequenceFile, etc.) and can also process data stored in HBase. Users can even implement their own drivers to add new data sources and formats. One attractive application model is to use HBase for real-time data access while using Hive to analyze the data stored in HBase.
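Both ideas show up directly in table DDL: the storage format is declared per table, and an HBase-backed table is declared through Hive's HBase storage handler. Table names and the column mapping below are hypothetical:

```sql
-- SequenceFile is one of the built-in HDFS file formats Hive supports.
CREATE TABLE events (id BIGINT, payload STRING)
STORED AS SEQUENCEFILE;

-- Hypothetical HBase-backed table; 'cf:val' maps a Hive column to an
-- HBase column family:qualifier, so Hive queries read HBase directly.
CREATE EXTERNAL TABLE hbase_events (key STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "events");
```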