Big Data-Hive

Last Update:2016-01-19 Source: Internet

Author: User

Hive is a data warehouse built on top of Hadoop. It computes with MapReduce (MR) and stores data in HDFS.
Because computation runs on MapReduce, Hive is typically used for offline data processing.
Hive defines a SQL-like query language, HQL. It resembles SQL but is not exactly the same.
Hive can be thought of as an HQL-to-MR language translator. It is simple and easy to get started with.
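
As a rough illustration of the "HQL-to-MR translator" idea, the Python sketch below mimics the map, shuffle, and reduce phases that a query such as SELECT word, COUNT(*) FROM docs GROUP BY word could be compiled into. The table name docs, the column word, and all function names are hypothetical; this is a toy model under those assumptions, not Hive's actual execution plan.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit (group key, 1) for each input row, like the map side of GROUP BY."""
    for row in rows:
        yield row["word"], 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as the shuffle does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which plays the role of COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_group_by_count(rows):
    """Chain the three phases the way a single MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    docs = [{"word": w} for w in ["hive", "hdfs", "hive", "mr", "hive"]]
    print(run_group_by_count(docs))
```
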
With Hive, do you still need to write your own MR programs?
HQL has limited expressive power: iterative algorithms cannot be expressed in HQL, and some complex operations are not easy to express.
Hive is less efficient: the MapReduce jobs that Hive generates automatically are usually not smart enough, HQL is difficult to tune, and control is only coarse-grained.
Hive consists of the following modules:
User interface: CLI, JDBC/ODBC, and WebUI.
Metadata storage (Metastore): stored by default in Hive's embedded Derby database; MySQL is typically used in production.
Driver: interpreter, compiler, optimizer, and executor.
Hadoop: computation with MapReduce and storage with HDFS.
Hive Deployment Architecture-Lab environment

Hive Deployment Architecture-production environment

Data Model
Partition and Buckets
Partition: To avoid unnecessary brute-force data scans, tables can be partitioned. To avoid generating too many small files, partition only on discrete fields, that is, fields with a limited set of values.
Buckets: Fields with a large number of distinct values can be divided into buckets. Partitions and buckets can be combined.
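
The following Python sketch models how partitioning narrows a scan and how bucketing spreads a field with many distinct values. It assumes a table partitioned by a date column dt and bucketed by user_id into 4 buckets; those names, the bucket count, and the use of crc32 as the hash are all assumptions made for illustration (Hive uses its own hash function and stores partitions as HDFS directories).

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(user_id, num_buckets=NUM_BUCKETS):
    """Pick a bucket from a stable hash of the bucketing column.

    crc32 stands in for Hive's own hash function (an assumption).
    """
    return zlib.crc32(str(user_id).encode("utf-8")) % num_buckets


def load(rows):
    """Place each row under its partition (dt) and its bucket (user_id)."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row["dt"]][bucket_of(row["user_id"])].append(row)
    return table


def scan(table, dt):
    """Read only the requested partition; other partitions are never touched."""
    return [row for bucket in table.get(dt, {}).values() for row in bucket]


if __name__ == "__main__":
    rows = [
        {"dt": "2016-01-18", "user_id": 1001},
        {"dt": "2016-01-19", "user_id": 1001},
        {"dt": "2016-01-19", "user_id": 2002},
    ]
    table = load(rows)
    print(sorted(table))              # partitions that exist
    print(scan(table, "2016-01-19"))  # rows read for one partition
```

Filtering on the partition column touches one partition instead of the whole table, while the bucket number depends only on the bucketing column, so the same user_id always lands in the same bucket.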
SELECT statement: HAVING and IN/EXISTS subqueries are not supported in older Hive releases; IN/EXISTS subqueries can be converted to LEFT SEMI JOIN operations.
JOIN: only equi-joins are supported; non-equi joins are not supported.
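
To make the IN/EXISTS rewrite concrete: a query of the form SELECT a.* FROM a WHERE a.key IN (SELECT b.key FROM b) can be expressed as SELECT a.* FROM a LEFT SEMI JOIN b ON (a.key = b.key). The Python sketch below models the semantics of that join on lists of dictionaries; the function name and the sample column cust are hypothetical, and NULL handling is left out.

```python
def left_semi_join(left_rows, right_rows, key):
    """Return each left row that has at least one match on `key` in right_rows.

    Only left-side columns are returned, and a left row appears once even if
    several right rows match, which is why the result agrees with IN/EXISTS.
    """
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]


if __name__ == "__main__":
    orders = [
        {"cust": 1, "amt": 10},
        {"cust": 2, "amt": 5},
        {"cust": 1, "amt": 7},
    ]
    vip = [{"cust": 1}, {"cust": 1}, {"cust": 3}]
    print(left_semi_join(orders, vip, "cust"))
```
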
Order by and sort by
Order by: starts a single reduce task, so the data is globally ordered. It may be very slow. In strict mode, it must be used together with LIMIT.
Sort by: can use multiple reduce tasks. The data inside each reduce task is ordered, but the overall result is not globally ordered. It is usually used with DISTRIBUTE BY.
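
The difference between the two clauses can be modeled in a few lines of Python. In this sketch a "reducer" is just a list; the round-robin assignment of rows to reducers in sort_by is an assumption made for illustration, since without DISTRIBUTE BY the actual assignment is up to Hive.

```python
def order_by(rows, key=None):
    """ORDER BY: every row goes to one reducer, so the output is globally sorted."""
    return sorted(rows, key=key)


def sort_by(rows, num_reducers, key=None):
    """SORT BY: rows are spread over several reducers and each reducer sorts
    only its own share (round-robin assignment is assumed here)."""
    reducers = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        reducers[i % num_reducers].append(row)
    return [sorted(part, key=key) for part in reducers]


if __name__ == "__main__":
    values = [5, 3, 9, 1, 7, 2]
    print(order_by(values))    # [1, 2, 3, 5, 7, 9]
    print(sort_by(values, 2))  # [[5, 7, 9], [1, 2, 3]]
```

Each inner list is sorted, but concatenating the reducer outputs gives 5, 7, 9, 1, 2, 3, which is not a global order.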
Distribute by and cluster by
Distribute by: equivalent to the Partitioner in MapReduce; by default it is hash-based. It works very well together with SORT BY.
Cluster by: shorthand for DISTRIBUTE BY used together with SORT BY (ascending) when both are followed by the same fields.
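
A small Python model of these clauses, under stated assumptions: reducers are plain lists, crc32 stands in for Hive's hash function, and the column names user and ts are hypothetical. It shows that DISTRIBUTE BY sends all rows with the same value to the same reducer, and that CLUSTER BY on a column behaves like DISTRIBUTE BY plus SORT BY on that same column.

```python
import zlib


def distribute_by(rows, column, num_reducers):
    """DISTRIBUTE BY: route each row to a reducer chosen by a hash of `column`,
    so rows that share a value always reach the same reducer.
    crc32 is a stand-in for Hive's hash (an assumption)."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        h = zlib.crc32(str(row[column]).encode("utf-8"))
        reducers[h % num_reducers].append(row)
    return reducers


def distribute_and_sort(rows, dist_col, sort_col, num_reducers):
    """DISTRIBUTE BY dist_col SORT BY sort_col: sort inside each reducer."""
    return [
        sorted(part, key=lambda r: r[sort_col])
        for part in distribute_by(rows, dist_col, num_reducers)
    ]


def cluster_by(rows, column, num_reducers):
    """CLUSTER BY column: the same column is used to distribute and to sort."""
    return distribute_and_sort(rows, column, column, num_reducers)


if __name__ == "__main__":
    rows = [{"user": u, "ts": t} for u, t in
            [("a", 3), ("b", 1), ("a", 1), ("c", 2), ("b", 5)]]
    for i, part in enumerate(distribute_and_sort(rows, "user", "ts", 2)):
        print(i, part)
```
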
User-defined functions (UDF): one way to extend HQL's capabilities.
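
Hive UDFs themselves are normally written in Java, which the source does not show. To stay in Python, the sketch below uses a different extension mechanism, Hive's TRANSFORM clause, which streams rows to an external script as tab-separated lines and reads tab-separated lines back. The script name normalize_key.py, the table name, and the (key, value) column layout are hypothetical; a query along the lines of SELECT TRANSFORM(key, value) USING 'python normalize_key.py' AS (key, value) FROM some_table would drive it.

```python
def transform_line(line):
    """Upper-case the first field (the key) of a tab-separated row and keep the rest."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].strip().upper()
    return "\t".join(fields)


def transform_lines(lines):
    """Apply transform_line to every input line, yielding one output row per input row."""
    for line in lines:
        yield transform_line(line)


# When run as a streaming script, the functions would be wired to the
# standard streams, for example:
#
#     import sys
#     for out in transform_lines(sys.stdin):
#         print(out)
```
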
Does HQL support indexes? The HQL execution process is primarily a parallel, brute-force scan. Currently, Hive supports only single-table indexes, but it provides an index creation interface and calling method that users can implement as needed.
Does HQL support UPDATE operations? Not supported. Hive sits on top of HDFS, and HDFS supports only append operations, not random writes.
How is skewed data handled?
Specify the skewed column: CREATE TABLE list_bucket_single (key STRING, value STRING) SKEWED BY (key) ON (1,5,6);
Assign more resources to the skewed task (TODO).
Break the skewed task into multiple tasks and merge the results (TODO).
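
The source leaves "break the skewed task into multiple tasks and merge the results" as a TODO. One common way to realize that idea is a two-stage aggregation with a salted key, sketched below in Python for a simple per-key count. This is an illustration of the general technique, not Hive's implementation; the function name is hypothetical, and the salt is assigned round-robin here to keep the example deterministic (a random salt is the more usual choice).

```python
from collections import Counter
from itertools import count


def salted_count(keys, num_salts):
    """Count occurrences per key in two stages.

    Stage 1 spreads every key over `num_salts` sub-keys, so no single task
    has to process all rows of a hot key. Stage 2 drops the salt and merges
    the partial counts into one count per original key.
    """
    salt = count()
    # Stage 1: partial counts per (key, salt) pair; each pair could be its own task.
    partial = Counter((key, next(salt) % num_salts) for key in keys)
    # Stage 2: merge the partial counts back per original key.
    merged = Counter()
    for (key, _salt), n in partial.items():
        merged[key] += n
    return partial, merged


if __name__ == "__main__":
    keys = [1] * 6 + [5, 6]  # key 1 is the skewed value
    partial, merged = salted_count(keys, 3)
    print(dict(partial))
    print(dict(merged))
```

With three salts, the six rows of key 1 are split into three partial tasks of two rows each before being merged back into a single count of six.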
Hive on HBase
Using HQL to process data in HBase is more convenient than accessing the data directly through the HBase API, but performance is lower: it is equivalent to turning online processing into batch processing.
Known problems: the integration is not mature enough, and data cannot be fetched by timestamp; by default the latest data is always fetched.
Systems similar to Hive
Stinger: The next generation of Hive is called "Stinger", and its underlying computing engine replaces MapReduce with Tez. Tez has a number of advantages over MapReduce: it provides a variety of operators (such as map and shuffle) for users; it combines multiple jobs into one job to reduce disk read/write I/O; and it makes full use of memory resources.
Shark: Hive on Spark (http://spark.incubator.apache.org/). Spark is an in-memory computing framework that is more efficient than MapReduce (some tests show a speedup of up to 100x). Shark is fully compatible with Hive, and its underlying computing engine is Spark.
Impala: The underlying computing engine no longer uses MR; instead, it uses a distributed query engine similar to that of a commercial parallel relational database.
Performance Comparison

This article is an English version of an article originally published in Chinese on aliyun.com.
