Understanding what Hive really is

Source: Internet
Author: User

Reading guide:
Our understanding of a thing usually falls into one of several stages:
1. No contact at all: we do not know what the thing is, so no questions arise.
2. Daily contact without understanding: we use it every day but do not know what it is.
3. Some understanding, but not a complete one.
So, for Hive:
1. How much do we know about it?
2. What exactly is it?
3. How is Hive related to Hadoop?
Extension:
What is the relationship between HBase and Hive?



Hive was originally created and developed at Facebook, driven by the need to manage, and run machine learning on, the massive volume of social network data that Facebook generates. So what exactly is Hive? Let's see how the wiki on Hive's website introduces it (https://cwiki.apache.org/confluence/display/Hive/Home):

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop™, it provides:
(1) Tools to enable easy data extract/transform/load (ETL)
(2) A mechanism to impose structure on a variety of data formats
(3) Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
(4) Query execution via MapReduce



The English above means roughly the following:
The Apache Hive data warehouse software makes it possible to query and manage large datasets held in distributed storage. It is built on Apache Hadoop and provides the following:
(1) A set of tools for extracting, transforming, and loading (ETL) data.
(2) A mechanism for storing, querying, and analyzing large-scale data stored in HDFS (or HBase).
(3) Query execution through MapReduce. Not every query needs MapReduce to complete; SELECT * FROM xxx, for example, does not.
(4) In Hive 0.11, whether a query such as SELECT a, b FROM xxx runs through MapReduce can be configured.
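To make point (3) more concrete, here is a minimal Python sketch of why an aggregating query needs a map phase and a reduce phase while a plain SELECT * does not. It is only a model of the idea, not Hive's actual execution engine; the employees table, its rows, and every function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows for an "employees" table: (name, dept).
ROWS = [
    ("ann", "sales"),
    ("bob", "eng"),
    ("cy", "sales"),
    ("dee", "eng"),
    ("eve", "ops"),
]


def map_phase(rows):
    """Emit one (dept, 1) pair per input row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, as a reducer would for COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    # Roughly: SELECT dept, COUNT(*) FROM employees GROUP BY dept
    return reduce_phase(shuffle(map_phase(rows)))


def select_all(rows):
    # Roughly: SELECT * FROM employees
    # No grouping or aggregation is needed, so the rows are simply read back.
    return list(rows)


if __name__ == "__main__":
    print(count_by_dept(ROWS))
    print(select_all(ROWS)[:2])
```

The grouping query has to bring all rows with the same key together before it can count them, which is the work the shuffle and reduce steps do. The plain read has nothing to combine, so it can skip those steps.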


The meaning above is clear. To refine it once more:
1. Hive is a data warehouse.
2. Hive is based on Hadoop.
Summed up in one sentence: Hive is a data warehouse based on Hadoop.

So what does "based on" mean here? Read on.


Hive is a data warehouse architecture built on the Hadoop file system; it analyzes and manages the data stored in HDFS.
(That is, we do not want to analyze and manage the data stored in HDFS by hand, so we build a tool for the job, and that tool can be Hive.)
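One way to picture "a tool over files in HDFS" is the wiki's phrase about imposing structure on data formats: the files stay as raw text, and the table schema is applied when the data is read. The Python sketch below models that idea under stated assumptions. The sample lines, the schema, and the function name are hypothetical; the only Hive-specific detail used is that Ctrl-A (\x01) is the default field delimiter for Hive text tables.

```python
# Hypothetical raw file content, as it might sit in HDFS: plain text lines
# whose fields are separated by Ctrl-A (\x01).
RAW_LINES = [
    "1\x01alice\x0130",
    "2\x01bob\x0125",
]

# Hypothetical table schema: column names and the Python type to cast each to.
SCHEMA = [("id", int), ("name", str), ("age", int)]


def read_with_schema(lines, schema, delimiter="\x01"):
    """Apply a schema to raw text lines at read time; the files are not changed."""
    for line in lines:
        fields = line.split(delimiter)
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


rows = list(read_with_schema(RAW_LINES, SCHEMA))

# Roughly: SELECT name FROM people WHERE age > 26
names_over_26 = [r["name"] for r in rows if r["age"] > 26]
```

Once the schema is applied, the rows can be filtered and projected like rows in a table, even though the stored data is still just delimited text.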


At this point we understand what Hive is. For a deeper understanding, read on.





So how do we analyze and manage that data?
Hive defines a SQL-like query language, known as HQL, that lets users who are familiar with SQL query the data directly. The language also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot. In addition, Hive lets users write their own functions to use in queries. There are three kinds of user-defined function in Hive: user-defined functions (UDF), user-defined aggregation functions (UDAF), and user-defined table-generating functions (UDTF).
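The three kinds of function differ mainly in how many rows go in and how many come out. The Python sketch below models only those shapes. Real Hive functions are written against Hive's own Java interfaces, which are not shown here, and all names and sample data below are hypothetical.

```python
from typing import Iterable, Iterator, List


def udf_upper(value: str) -> str:
    """UDF shape: one input value in, one output value out."""
    return value.upper()


def udaf_max_length(values: Iterable[str]) -> int:
    """UDAF shape: many input rows in, one aggregated value out."""
    return max((len(v) for v in values), default=0)


def udtf_split(value: str, sep: str = ",") -> Iterator[str]:
    """UDTF shape: one input row in, zero or more output rows out."""
    for part in value.split(sep):
        if part:
            yield part


# Hypothetical column of comma-separated tags, one string per row.
TAGS: List[str] = ["a,b", "c", ""]

upper = [udf_upper(t) for t in TAGS]                  # same number of rows as the input
longest = udaf_max_length(TAGS)                       # a single value for all rows
exploded = [p for t in TAGS for p in udtf_split(t)]   # row count may differ from the input
```

Here the UDF keeps three rows as three rows, the UDAF collapses them into one number, and the UDTF turns them into one row per tag.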

Today, Hive is already a successful Apache project, and many organizations use it as a general-purpose, scalable data-processing platform.
Of course, Hive differs greatly from a traditional relational database. Hive compiles an incoming query into a plan of MapReduce jobs, and starting MapReduce is a high-latency operation: submitting and running the jobs takes a long time. This means Hive is suited only to applications that can tolerate high latency (for low-latency applications, consider HBase). Also, because its design goals are different, Hive currently does not support transactions, and table data cannot be modified: rows cannot be updated, deleted, or inserted individually; data can only be appended or re-imported. Columns cannot be indexed as in a relational database (Hive does support indexes, but they do not improve Hive query speed). To improve Hive query speed, learn how to apply Hive partitions and buckets.
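Why do partitions and buckets help? A partition keeps the rows for each value of a chosen column in their own directory, and a bucket hashes another column into a fixed number of files, so a query that filters on those columns can skip most of the data. The Python sketch below models that layout in memory. The sample rows, the bucket count, and the function names are hypothetical, and CRC32 is only a stand-in for Hive's own hash function.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash (CRC32), chosen so results are stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


# Hypothetical log rows: (date, user, action).
ROWS = [
    ("2013-01-01", "ann", "view"),
    ("2013-01-01", "bob", "click"),
    ("2013-01-02", "ann", "click"),
    ("2013-01-02", "cy", "view"),
]


def build_layout(rows):
    """Store rows as layout[partition value][bucket number] -> list of rows."""
    layout = defaultdict(lambda: defaultdict(list))
    for date, user, action in rows:
        layout[date][bucket_for(user)].append((date, user, action))
    return layout


def query(layout, date, user):
    """Read only the one partition and one bucket that can hold matching rows."""
    candidates = layout.get(date, {}).get(bucket_for(user), [])
    hits = [row for row in candidates if row[1] == user]
    return hits, len(candidates)  # matching rows, and how many rows were scanned


layout = build_layout(ROWS)
hits, scanned = query(layout, "2013-01-02", "ann")
```

The query for one date and one user looks at a single bucket inside a single partition instead of scanning all four rows. On a real table the same pruning is what saves time.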
