Bucket tables in Hive

Source: Internet
Author: User
Tags hadoop fs

When the data volume is large, a job finishes faster only if its work is spread across multiple map and reduce tasks.
However, if the input is a single file that fits in one input split, only one map task is started.
In that case a bucket table is a good choice: by specifying a clustering column, Hive hashes each row on that column and distributes the rows across multiple smaller files.

create table sunwg_test11 (id int, name string)
clustered by (id) sorted by (name) into 32 buckets
row format delimited
fields terminated by '\t';
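
To make the hashing step concrete, here is a minimal Python sketch of how a row could be assigned to one of the 32 buckets. It assumes the classic Hive behavior for an int clustering column, where the hash of the value is the value itself and the bucket index is that hash, masked to a non-negative number, modulo the bucket count. Newer Hive versions may use a different hash function, so treat this as an illustration rather than the exact implementation. The function name bucket_for and the sample ids are hypothetical.

```python
NUM_BUCKETS = 32  # matches "into 32 buckets" in the table definition


def bucket_for(id_value: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Return the bucket index for an integer clustering key.

    Assumption: the hash of an int is the value itself, masked to a
    non-negative 31-bit number before taking the modulus.
    """
    return (id_value & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for sample_id in (4, 36, 100):  # hypothetical ids, not taken from test09
        print(sample_id, "->", bucket_for(sample_id))
```

Under this rule, ids that differ by a multiple of 32 land in the same bucket file, which is why the choice of clustering column decides how evenly the data is spread.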

Before running the insert, do not forget to set:
set hive.enforce.bucketing = true;
This forces the insert to use multiple reducers, one per bucket, when writing the output.

hive> insert overwrite table sunwg_test11 select * from test09;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 32
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201103070826_0018, Tracking URL = http://hadoop00:50030/jobdetails.jsp?jobid=job_201103070826_0018
Kill Command = /home/hjl/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hadoop00:9001 -kill job_201103070826_0018
11:34:23,055 Stage-1 map = 0%, reduce = 0%
11:34:27,084 Stage-1 map = 6%, reduce = 0%
11:34:29,100 Stage-1 map = 13%, reduce = 0%
11:34:32,124 Stage-1 map = 19%, reduce = 0%
11:34:34,142 Stage-1 map = 22%, reduce = 0%
11:34:35,151 Stage-1 map = 25%, reduce = 0%
11:34:37,167 Stage-1 map = 28%, reduce = 0%
11:34:39,182 Stage-1 map = 31%, reduce = 0%
11:34:41,199 Stage-1 map = 34%, reduce = 1%
11:34:42,211 Stage-1 map = 38%, reduce = 1%
11:34:44,233 Stage-1 map = 41%, reduce = 1%
11:34:46,250 Stage-1 map = 44%, reduce = 1%
11:34:48,270 Stage-1 map = 47%, reduce = 1%
11:34:49,280 Stage-1 map = 50%, reduce = 1%
11:34:51,300 Stage-1 map = 53%, reduce = 1%
11:34:53,316 Stage-1 map = 56%, reduce = 1%
11:34:55,330 Stage-1 map = 59%, reduce = 1%
11:34:56,340 Stage-1 map = 63%, reduce = 1%
11:34:58,357 Stage-1 map = 66%, reduce = 1%
11:35:00,378 Stage-1 map = 69%, reduce = 1%
11:35:02,393 Stage-1 map = 72%, reduce = 1%
11:35:04,409 Stage-1 map = 75%, reduce = 1%
11:35:05,419 Stage-1 map = 78%, reduce = 1%
11:35:07,435 Stage-1 map = 81%, reduce = 1%
11:35:09,451 Stage-1 map = 84%, reduce = 2%
11:35:12,475 Stage-1 map = 88%, reduce = 2%
11:35:14,496 Stage-1 map = 91%, reduce = 2%
11:35:16,513 Stage-1 map = 94%, reduce = 2%
11:35:18,528 Stage-1 map = 97%, reduce = 2%
11:35:20,552 Stage-1 map = 100%, reduce = 2%
11:35:25,589 Stage-1 map = 100%, reduce = 6%
11:35:33,645 Stage-1 map = 100%, reduce = 9%
11:35:34,654 Stage-1 map = 100%, reduce = 13%
11:35:39,693 Stage-1 map = 100%, reduce = 16%
11:35:41,710 Stage-1 map = 100%, reduce = 19%
11:35:45,740 Stage-1 map = 100%, reduce = 22%
11:35:47,757 Stage-1 map = 100%, reduce = 25%
11:35:52,793 Stage-1 map = 100%, reduce = 28%
11:35:54,808 Stage-1 map = 100%, reduce = 31%
11:35:59,844 Stage-1 map = 100%, reduce = 34%
11:36:01,861 Stage-1 map = 100%, reduce = 38%
11:36:05,891 Stage-1 map = 100%, reduce = 41%
11:36:07,911 Stage-1 map = 100%, reduce = 44%
11:36:12,947 Stage-1 map = 100%, reduce = 47%
11:36:13,958 Stage-1 map = 100%, reduce = 50%
11:36:19,002 Stage-1 map = 100%, reduce = 53%
11:36:21,017 Stage-1 map = 100%, reduce = 56%
11:36:26,053 Stage-1 map = 100%, reduce = 59%
11:36:28,068 Stage-1 map = 100%, reduce = 63%
11:36:33,106 Stage-1 map = 100%, reduce = 66%
11:36:35,122 Stage-1 map = 100%, reduce = 69%
11:36:39,152 Stage-1 map = 100%, reduce = 72%
11:36:41,169 Stage-1 map = 100%, reduce = 75%
11:36:46,208 Stage-1 map = 100%, reduce = 78%
11:36:48,227 Stage-1 map = 100%, reduce = 81%
11:36:53,262 Stage-1 map = 100%, reduce = 84%
11:36:54,271 Stage-1 map = 100%, reduce = 88%
11:36:59,309 Stage-1 map = 100%, reduce = 91%
11:37:01,328 Stage-1 map = 100%, reduce = 94%
11:37:06,365 Stage-1 map = 100%, reduce = 97%
11:37:08,382 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201103070826_0018
Loading data to table sunwg_test11
5 Rows loaded to sunwg_test11
OK
Time taken: 175.036 seconds

The table directory for sunwg_test11 now contains 32 files instead of a single one.
[hadoop@hadoop00 ~]$ hadoop fs -ls /hjl/sunwg_test11
Found 32 items
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000000_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000001_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000002_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000003_0
-rw-r--r--   3 hjl hadoop 8 /hjl/sunwg_test11/attempt_201103070826_0018_r_000004_0
-rw-r--r--   3 hjl hadoop 9 /hjl/sunwg_test11/attempt_201103070826_0018_r_000005_0
-rw-r--r--   3 hjl hadoop 8 /hjl/sunwg_test11/attempt_201103070826_0018_r_000006_0
-rw-r--r--   3 hjl hadoop 9 /hjl/sunwg_test11/attempt_201103070826_0018_r_000007_0
-rw-r--r--   3 hjl hadoop 9 /hjl/sunwg_test11/attempt_201103070826_0018_r_000008_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000009_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000010_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000011_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000012_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000013_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000014_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000015_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000016_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000017_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000018_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000019_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000020_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000021_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000022_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000023_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000024_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000025_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000026_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000027_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000028_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000029_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000030_0
-rw-r--r--   3 hjl hadoop 0 /hjl/sunwg_test11/attempt_201103070826_0018_r_000031_0
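
Most of the files above are zero bytes because only 5 rows were loaded, and 5 rows can occupy at most 5 of the 32 buckets. The following Python sketch illustrates this with hypothetical ids (the article does not show the actual contents of test09) and assumes a simple hash-modulo rule for an int key. The helper name rows_per_bucket is made up for this example.

```python
from collections import Counter

NUM_BUCKETS = 32  # matches "into 32 buckets" in the table definition


def rows_per_bucket(ids, num_buckets=NUM_BUCKETS):
    """Count rows per bucket; buckets that receive no rows are reported as 0.

    Assumption: bucket index = (id masked to non-negative) mod num_buckets.
    """
    counts = Counter((i & 0x7FFFFFFF) % num_buckets for i in ids)
    return [counts.get(b, 0) for b in range(num_buckets)]


if __name__ == "__main__":
    hypothetical_ids = [4, 5, 6, 7, 8]  # illustrative only
    counts = rows_per_bucket(hypothetical_ids)
    empty = sum(1 for c in counts if c == 0)
    print(f"non-empty buckets: {NUM_BUCKETS - empty}, empty buckets: {empty}")
```

With a realistic data volume and a well-distributed clustering column, the rows would spread across all 32 files instead of leaving most of them empty.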

Once the data has been split into multiple files, later jobs on the table can run multiple map tasks.
When you run operations against the table, you will see that the system starts 32 map tasks.

 

This article is from http://www.oratea.net/?p=492
