Hive: Enabling the Fetch Task So Simple Queries Run Without a MapReduce Job


Reposted from: http://www.iteblog.com/archives/831

When you query a column of a table, Hive by default launches a MapReduce job to do the work, as shown below:

    hive> SELECT id, money FROM m LIMIT 10;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
    Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
    Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    ...,167 Stage-1 map = 0%,  reduce = 0%
    ...,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    ...,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    MapReduce Total cumulative CPU time: 1 seconds 260 msec
    Ended Job = job_1384246387966_0229
    MapReduce Jobs Launched:
    Job 0: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 8388865 HDFS Write: ... SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 260 msec
    OK
    1       122
    1       185
    1       231
    1       292
    1       ...
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 16.802 seconds, Fetched: 10 row(s)

Launching a MapReduce job carries considerable overhead: in the run above, returning just 10 rows took 16.802 seconds. Starting with Hive 0.10.0, simple statements of the form SELECT <col> FROM <table> LIMIT n that need no aggregation no longer require a MapReduce job; Hive can read the data directly with a fetch task. The fetch task can be enabled in several ways:

Method One:

    hive> set hive.fetch.task.conversion=more;
    hive> SELECT id, money FROM m LIMIT 10;
    OK
    1       122
    1       185
    1       231
    1       292
    1       ...
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 0.138 seconds, Fetched: 10 row(s)

Running set hive.fetch.task.conversion=more; turns on the fetch task, so the simple column query above completes without launching a MapReduce job, in 0.138 seconds instead of 16.802 seconds.
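To make the qualifying rules concrete, here is a toy Python model of the decision. It encodes only the rules quoted in Hive's own description of hive.fetch.task.conversion: the query must be single-sourced with no subquery, aggregation, DISTINCT, lateral view or join; "minimal" allows only SELECT *, filters on partition columns and LIMIT; "more" also allows column lists, arbitrary filters, TABLESAMPLE and virtual columns. The QueryShape class and can_use_fetch_task function are hypothetical names; this is a sketch, not Hive's implementation, which inspects the compiled query plan and has changed across versions.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical summary of a query; real Hive inspects the query plan instead."""
    select_star: bool = False              # SELECT * rather than a column list
    filter_on_non_partition: bool = False  # WHERE references a non-partition column
    uses_tablesample: bool = False
    uses_virtual_columns: bool = False
    has_subquery: bool = False
    has_aggregation: bool = False
    has_distinct: bool = False
    has_lateral_view: bool = False
    has_join: bool = False


def can_use_fetch_task(q: QueryShape, conversion: str) -> bool:
    """Toy model of the "minimal" / "more" rules from the property description."""
    if conversion not in ("minimal", "more"):
        raise ValueError(f"unknown conversion level: {conversion!r}")
    # Disqualifiers that apply at both levels.
    if (q.has_subquery or q.has_aggregation or q.has_distinct
            or q.has_lateral_view or q.has_join):
        return False
    if conversion == "more":
        # SELECT, FILTER, LIMIT only, plus TABLESAMPLE and virtual columns.
        return True
    # minimal: SELECT STAR, FILTER on partition columns, LIMIT only.
    return (q.select_star and not q.filter_on_non_partition
            and not q.uses_tablesample and not q.uses_virtual_columns)


# SELECT id, money FROM m LIMIT 10: a column list with no filter.
column_query = QueryShape()
print(can_use_fetch_task(column_query, "minimal"))  # False
print(can_use_fetch_task(column_query, "more"))     # True
```

Under this model the article's query needs "more" because it selects named columns rather than SELECT *, which matches the MapReduce job seen before the setting was changed.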

Method Two:

    bin/hive --hiveconf hive.fetch.task.conversion=more
Method Three:

Both methods above enable the fetch task only temporarily, for the current session. To enable it permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml:

    <property>
      <name>hive.fetch.task.conversion</name>
      <value>more</value>
      <description>
        Some select queries can be converted to single FETCH task minimizing latency.
        Currently the query should be single sourced not having any subquery and should not have
        any aggregations or distincts (which incurs RS), lateral views and joins.
        1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
        2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
      </description>
    </property>

With this setting in place, the fetch task stays enabled for every session.
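To confirm which value a given hive-site.xml actually sets, a short script using Python's standard xml.etree.ElementTree module can look the property up. The function name read_hive_property and the sample document are illustrative assumptions. Note that this only reports what the file says: a --hiveconf option or a session-level set command overrides the file's value at runtime.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hive_property(root: ET.Element, name: str) -> Optional[str]:
    """Return the <value> of the named <property> under <configuration>, or None."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            # Strip surrounding whitespace so " more " is reported as "more".
            return (prop.findtext("value") or "").strip()
    return None


SAMPLE = """<configuration>
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
  </property>
</configuration>"""

root = ET.fromstring(SAMPLE)
print(read_hive_property(root, "hive.fetch.task.conversion"))  # more
```

For a real installation, pass ET.parse(path).getroot() instead, where path points at your own hive-site.xml.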
