What is Hive

Source: Internet
Author: User

What is Hive

Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware.

Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive QL also allows traditional map/reduce programmers to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
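To make the idea of a pluggable custom mapper more concrete, here is a minimal sketch in Python. It assumes the common streaming convention in which Hive passes each row to an external script as one tab-separated line on standard input and reads tab-separated rows back from standard output. The column layout (user_id, url), the function names and the domain-extraction logic are hypothetical and chosen only for illustration; they are not taken from the original text.

```python
import sys
from typing import Iterable, Iterator


def map_rows(lines: Iterable[str]) -> Iterator[str]:
    """Turn tab-separated (user_id, url) rows into (user_id, domain) rows.

    The two-column input layout is an assumption made for this sketch.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip rows that do not have exactly two columns
        user_id, url = fields
        # Crude domain extraction: drop an optional scheme, then the path.
        domain = url.split("//", 1)[-1].split("/", 1)[0].lower()
        yield f"{user_id}\t{domain}"


def run(stream_in=sys.stdin, stream_out=sys.stdout) -> None:
    """Read rows from stream_in and write the mapped rows to stream_out."""
    for row in map_rows(stream_in):
        stream_out.write(row + "\n")
```

Saved as a script that calls run(), a mapper like this could be referenced from a Hive QL query through the TRANSFORM ... USING clause. The script name and the table it would read from are left to the reader, since the original text does not specify any.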

What Hive isn't

Hadoop is a batch processing system, and Hadoop jobs tend to have high latency and incur substantial overheads in job submission and scheduling. As a result, latency for Hive queries is generally very high (minutes) even when the data sets involved are very small (say a few hundred megabytes). Hive therefore cannot be compared with systems such as Oracle, where analyses are conducted on a significantly smaller amount of data but proceed much more iteratively, with response times between iterations of less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.
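The effect of fixed job overhead on small queries can be illustrated with a toy model. The sketch below assumes that query latency is a fixed submission and scheduling overhead plus a scan time proportional to data size. The default figures (60 seconds of overhead, 100 MB/s of aggregate throughput) are purely hypothetical placeholders, not measurements of Hive or Hadoop.

```python
def estimated_latency_s(data_mb: float,
                        overhead_s: float = 60.0,
                        throughput_mb_s: float = 100.0) -> float:
    """Toy model: fixed job overhead plus time to scan the data.

    Both default values are illustrative assumptions, not benchmarks.
    """
    return overhead_s + data_mb / throughput_mb_s


def overhead_share(data_mb: float,
                   overhead_s: float = 60.0,
                   throughput_mb_s: float = 100.0) -> float:
    """Fraction of the estimated latency spent on fixed overhead."""
    total = estimated_latency_s(data_mb, overhead_s, throughput_mb_s)
    return overhead_s / total


if __name__ == "__main__":
    for size_mb in (200, 1_000_000):  # a few hundred MB vs. about 1 TB
        print(size_mb, estimated_latency_s(size_mb), round(overhead_share(size_mb), 4))
```

Under these assumed numbers, the fixed overhead accounts for nearly all of the latency of a 200 MB query and for well under one percent of a 1 TB query, which is why the overhead matters most for small, interactive workloads.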

Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of immutable data (like web logs).

In the following sections we provide a tutorial on the capabilities of the system. We start by describing the concepts of data types, tables and partitions (which are very similar to what you would find in a traditional relational DBMS) and then illustrate the capabilities of the QL language with the help of some examples.
