Transferred from: http://www.iteblog.com/archives/831
If you query a column of a table, Hive by default launches a MapReduce job to do the work, as shown below:
hive> SELECT id, money FROM m LIMIT 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
[...],167 Stage-1 map = 0%, reduce = 0%
[...],327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
[...],377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 8388865 HDFS Write: [...] SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1       122
1       185
1       231
1       292
1       [...]
1       329
1       355
1       356
1       362
1       364
Time taken: 16.802 seconds, Fetched: 10 row(s)
As we all know, launching a MapReduce job carries overhead. To address this, starting with Hive 0.10.0, simple statements of the form SELECT <col> FROM <table> LIMIT n that need no aggregation no longer require a MapReduce job: the rows are read directly by a fetch task. The feature can be enabled in several ways:
Method One:
hive> SET hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m LIMIT 10;
OK
1       122
1       185
1       231
1       292
1       [...]
1       329
1       355
1       356
1       362
1       364
Time taken: 0.138 seconds, Fetched: 10 row(s)
The statement SET hive.fetch.task.conversion=more; above turns on the fetch task, so the simple column query no longer launches a MapReduce job, and the time taken drops from 16.802 seconds to 0.138 seconds.
Method Two:
bin/hive --hiveconf hive.fetch.task.conversion=more
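When the query is started from a script rather than typed into an interactive session, the same property can be passed on the command line. The following is a minimal Python sketch, assuming the hive executable is on the PATH and using the CLI's -e option to run a single statement. The helper names build_hive_command and run_query are hypothetical, and the check accepts only the two values named in the property description.

```python
import shlex
import subprocess


def build_hive_command(query, conversion="more", hive_bin="hive"):
    """Build the argument list for one Hive CLI call (hypothetical helper)."""
    # Only the two values described in this article are accepted here.
    if conversion not in ("minimal", "more"):
        raise ValueError(f"unsupported hive.fetch.task.conversion value: {conversion!r}")
    return [
        hive_bin,
        "--hiveconf", f"hive.fetch.task.conversion={conversion}",
        "-e", query,  # run one statement and exit
    ]


def run_query(query, **kwargs):
    """Run the query and return its standard output.

    Requires a working Hive installation; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_hive_command(query, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Hive.
    print(shlex.join(build_hive_command("SELECT id, money FROM m LIMIT 10;")))
```

Building the argument list separately from running it keeps the setting visible and lets the command be inspected without a cluster.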
Method Three:
Both methods above enable the fetch task, but only temporarily. To keep the feature enabled permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml:
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
With this setting in place, the fetch task stays enabled permanently.
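The difference between the two values in the property description can be summarized as a small decision rule. The Python sketch below is only a simplified model of that description, not Hive's actual implementation. QueryShape and eligible_for_fetch_task are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical summary of a query's features; not a Hive data structure."""
    select_star: bool = False  # SELECT * rather than named columns
    filters_only_on_partition_columns: bool = True  # also True with no WHERE clause
    uses_tablesample_or_virtual_columns: bool = False
    has_subquery: bool = False
    has_aggregation_or_distinct: bool = False
    has_lateral_view: bool = False
    has_join: bool = False


def eligible_for_fetch_task(mode: str, q: QueryShape) -> bool:
    """Return True if the modeled query could be served by a single fetch task."""
    if mode not in ("minimal", "more"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # Excluded under both values, per the property description.
    if (q.has_subquery or q.has_aggregation_or_distinct
            or q.has_lateral_view or q.has_join):
        return False
    if mode == "more":
        # SELECT, FILTER, LIMIT (+TABLESAMPLE, virtual columns) are all allowed.
        return True
    # minimal: SELECT STAR, FILTER on partition columns, LIMIT only.
    return (q.select_star
            and q.filters_only_on_partition_columns
            and not q.uses_tablesample_or_virtual_columns)


# The query from the article, SELECT id, money FROM m LIMIT 10, names its columns.
column_query = QueryShape(select_star=False)
print(eligible_for_fetch_task("minimal", column_query))
print(eligible_for_fetch_task("more", column_query))
```

Under this model the column query from the article qualifies only with more, which is consistent with the two runs shown earlier: a MapReduce job before the setting was changed, and a direct fetch after SET hive.fetch.task.conversion=more.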