Author: zhankunlin
Date: 2011-4-1
Key words: Hadoop, TeraSort
<1> TeraSort Introduction
Sorting 1 TB of data is commonly used to measure the data-processing capability of a distributed data-processing framework. TeraSort is a sort job that ships with Hadoop; in 2008 Hadoop took first place in the 1 TB sort benchmark with a time of 209 seconds.
<2> Related Materials
Hadoop MapReduce scalability test: http://cloud.csdn.net/a/20100901/278934.html
Implementing Hadoop map/reduce TeraSort with MPI: http://emonkey.blog.sohu.com/166546157.html
Analysis of the TeraSort algorithm in Hadoop: http://dongxicheng.org/mapreduce/hadoop-terasort-analyse/
1TB sort (TeraSort) for Hadoop: http://hi.baidu.com/dtzw/blog/item/cffc8e1830f908b94bedbc12.html
Sort Benchmark: http://sortbenchmark.org/
Trie tree: http://www.cnblogs.com/cherish_yimi/archive/2009/10/12/1581666.html
<3> Experiment
(0) Source location
/local/zkl/hadoop/hadoop-0.20.1/hadoop-0.20.1/src/examples/org/apache/hadoop/examples/terasort
(1) First, run TeraGen to generate the data
[root@gd86 hadoop-0.20.1]# /local/zkl/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/hadoop jar hadoop-0.20.1-examples.jar teragen 1000000 terasort/1000000-input
View the generated data
[root@gd86 hadoop-0.20.1]# /local/zkl/hadoop/hadoop-0.20.1/hadoop-0.20.1/bin/hadoop fs -ls /user/root/terasort/1000000-input
Found 3 items
drwxr-xr-x   - root supergroup          0 2011-03-31 16:21 /user/root/terasort/1000000-input/_logs
-rw-r--r--   3 root supergroup   50000000 2011-03-31 16:21 /user/root/terasort/1000000-input/part-00000
-rw-r--r--   3 root supergroup   50000000 2011-03-31 16:21 /user/root/terasort/1000000-input/part-00001
Two data files are generated, each 50,000,000 B in size.
[root@gd86 hadoop-0.20.1]# bin/hadoop jar hadoop-0.20.1-examples.jar teragen 10 terasort/1000000-input
This would generate two 500 B files, 1,000 B = 1 KB in total.
Each generated row is 100 B, so the parameter 10 means 10 rows and 1,000 B in total; 1,000,000 rows give 100,000,000 B = 100 MB.
TeraGen uses two map tasks to generate the data, and each map writes one file; for the 1,000,000-row run above the two files therefore hold 100 MB in total, 50,000,000 B each.
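As a quick sanity check on these sizes, here is a minimal sketch in plain Java; it assumes the standard 100 B TeraGen row and the two map tasks used here, and is only illustrative arithmetic, not Hadoop code.

// Recompute the expected TeraGen output sizes, assuming 100-byte rows
// and 2 map tasks (each map writes one part file).
public class TeragenSizes {
    public static void main(String[] args) {
        long rowBytes = 100;                      // each generated row is 100 B
        int maps = 2;                             // teragen ran with 2 maps here
        for (long rows : new long[] {10L, 1_000_000L}) {
            long total = rows * rowBytes;         // total bytes written to HDFS
            System.out.printf("%,d rows -> %,d B total, %,d B per part file%n",
                              rows, total, total / maps);
        }
    }
}

For 1,000,000 rows this prints 100,000,000 B total and 50,000,000 B per part file, matching the listing above.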
[root@gd86 hadoop-0.20.1]# bin/hadoop jar hadoop-0.20.1-examples.jar teragen 10000000 terasort/1G-input
This generates 1 GB of data in two part files of 500,000,000 B each. With the default 64 MB HDFS block size, each file occupies 8 blocks, so the input is divided into 16 blocks in total and TeraSort runs with 16 map tasks.
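That block count is easy to verify; a minimal sketch, assuming the default 64 MB block size of Hadoop 0.20.1 and one input split per block of each file:

// Expected number of TeraSort map tasks for the 1 GB teragen output:
// splits are computed per input file, so each 500,000,000 B part file
// contributes ceil(500,000,000 / 67,108,864) = 8 blocks.
public class SplitCount {
    public static void main(String[] args) {
        long perFileBytes = 5_000_000L * 100;                              // 500,000,000 B per part file
        long blockBytes   = 64L * 1024 * 1024;                             // default dfs.block.size
        long blocksPerFile = (perFileBytes + blockBytes - 1) / blockBytes; // ceiling division = 8
        System.out.println(2 * blocksPerFile + " map tasks");              // prints "16 map tasks"
    }
}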
[root@gd86 hadoop-0.20.1]# bin/hadoop jar hadoop-0.20.1-examples.jar teragen 10000000 terasort/1G-input
Generating 10000000 using 2 maps with step of 5000000
11/04/01 17:02:46 INFO mapred.JobClient: Running job: job_201103311423_0005
11/04/01 17:02:47 INFO mapred.JobClient:  map 0% reduce 0%
11/04/01 17:03:00 INFO mapred.JobClient:  map 19% reduce 0%
11/04/01 17:03:01 INFO mapred.JobClient:  map 41% reduce 0%
11/04/01 17:03:03 INFO mapred.JobClient:  map 52% reduce 0%
11/04/01 17:03:04 INFO mapred.JobClient:  map 63% reduce 0%
11/04/01 17:03:06 INFO mapred.JobClient:  map 74% reduce 0%
11/04/01 17:03:10 INFO mapred.JobClient:  map 91% reduce 0%
11/04/01 17:03:12 INFO mapred.JobClient:  map 100% reduce 0%
11/04/01 17:03:14 INFO mapred.JobClient: Job complete: job_201103311423_0005
11/04/01 17:03:14 INFO mapred.JobClient: Counters: 6
11/04/01 17:03:14 INFO mapred.JobClient:   Job Counters
11/04/01 17:03:14 INFO mapred.JobClient:     Launched map tasks=2
11/04/01 17:03:14 INFO mapred.JobClient:   FileSystemCounters
11/04/01 17:03:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
11/04/01 17:03:14 INFO mapred.JobClient:   Map-Reduce Framework
11/04/01 17:03:14 INFO mapred.JobClient:     Map input records=10000000
11/04/01 17:03:14 INFO mapred.JobClient:     Spilled Records=0
11/04/01 17:03:14 INFO mapred.JobClient:     Map input bytes=10000000
11/04/01 17:03:14 INFO mapred.JobClient:     Map output records=10000000
(2) Run TeraSort to sort the data
Running the TeraSort program launches 16 map tasks.
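Before the job starts, TeraSort samples the input keys to choose the reduce-partition boundaries; that is what the "Making 1 from 100000 records" and "Step size is 100000.0" lines in the output below refer to (with only one reduce task here, no cut point is actually needed). The following is a simplified sketch of the idea only, not the exact code in TeraInputFormat.writePartitionFile:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionSketch {
    // Pick (reduces - 1) cut points from a sorted sample of keys; map output
    // with key < cuts[0] goes to reduce 0, and so on, giving a total order.
    static List<String> cutPoints(List<String> sampledKeys, int reduces) {
        Collections.sort(sampledKeys);                        // sort the sampled keys
        double step = sampledKeys.size() / (double) reduces;  // the "step size" in the log
        List<String> cuts = new ArrayList<String>();
        for (int i = 1; i < reduces; i++) {
            int idx = Math.min(sampledKeys.size() - 1, (int) Math.round(step * i));
            cuts.add(sampledKeys.get(idx));
        }
        return cuts;   // with reduces = 1 this list is empty, as in this run
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<String>(Arrays.asList("pear", "apple", "kiwi", "plum"));
        System.out.println(cutPoints(sample, 2));  // one cut point splitting the key space
    }
}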
[root@gd38 hadoop-0.20.1]# bin/hadoop jar hadoop-0.20.1-examples.jar terasort terasort/1G-input terasort/1G-output
11/03/31 17:12:49 INFO terasort.TeraSort: starting
11/03/31 17:12:49 INFO mapred.FileInputFormat: Total input paths to process : 2
11/03/31 17:13:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/03/31 17:13:05 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/03/31 17:13:05 INFO compress.CodecPool: Got brand-new compressor
Making 1 from 100000 records
Step size is 100000.0
11/03/31 17:13:06 INFO mapred.JobClient: Running job: job_201103311423_0006
11/03/31 17:13:07 INFO mapred.JobClient:  map 0% reduce 0%
11/03/31 17:13:20 INFO mapred.JobClient:  map 12% reduce 0%
11/03/31 17:13:21 INFO mapred.JobClient:  map 37% reduce 0%
11/03/31 17:13:29 INFO mapred.JobClient:  map 50% reduce 2%
11/03/31 17:13:30 INFO mapred.JobClient:  map 75% reduce 2%
11/03/31 17:13:32 INFO mapred.JobClient:  map 75% reduce 12%
11/03/31 17:13:36 INFO mapred.JobClient:  map 87% reduce 12%
11/03/31 17:13:38 INFO mapred.JobClient:  map 100% reduce 12%
11/03/31 17:13:41 INFO mapred.JobClient:  map 100% reduce 25%
11/03/31 17:13:44 INFO mapred.JobClient:  map 100% reduce 31%
11/03/31 17:13:53 INFO mapred.JobClient:  map 100% reduce 33%
11/03/31 17:14:02 INFO mapred.JobClient:  map 100% reduce 68%
11/03/31 17:14:05 INFO mapred.JobClient:  map 100% reduce 71%
11/03/31 17:14:08 INFO mapred.JobClient:  map 100% reduce 75%
11/03/31 17:14:11 INFO mapred.JobClient:  map 100% reduce 79%
11/03/31 17:14:14 INFO mapred.JobClient:  map 100% reduce 82%
11/03/31 17:14:17 INFO mapred.JobClient:  map 100% reduce 86%
11/03/31 17:14:20 INFO mapred.JobClient:  map 100% reduce 90%
11/03/31 17:14:23 INFO mapred.JobClient:  map 100% reduce 93%
11/03/31 17:14:26 INFO mapred.JobClient:  map 100% reduce 97%
11/03/31 17:14:32 INFO mapred.JobClient:  map 100% reduce 100%
11/03/31 17:14:34 INFO mapred.JobClient: Job complete: job_201103311423_0006
11/03/31 17:14:34 INFO mapred.JobClient: Counters: 18
11/03/31 17:14:34 INFO mapred.JobClient:   Job Counters
11/03/31 17:14:34 INFO mapred.JobClient:     Launched reduce tasks=1
11/03/31 17:14:34 INFO mapred.JobClient:     Launched map tasks=16
11/03/31 17:14:34 INFO mapred.JobClient:     Data-local map tasks=16
11/03/31 17:14:34 INFO mapred.JobClient:   FileSystemCounters
11/03/31 17:14:34 INFO mapred.JobClient:     FILE_BYTES_READ=2382257412
11/03/31 17:14:34 INFO mapred.JobClient:     HDFS_BYTES_READ=1000057358
11/03/31 17:14:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3402255956
11/03/31 17:14:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
11/03/31 17:14:34 INFO mapred.JobClient:   Map-Reduce Framework
11/03/31 17:14:34 INFO mapred.JobClient:     Reduce input groups=10000000
11/03/31 17:14:34 INFO mapred.JobClient:     Combine output records=0
11/03/31 17:14:34 INFO mapred.JobClient:     Map input records=10000000
11/03/31 17:14:34 INFO mapred.JobClient:     Reduce shuffle bytes=951549012
11/03/31 17:14:34 INFO mapred.JobClient:     Reduce output records=10000000
11/03/31 17:14:34 INFO mapred.JobClient:     Spilled Records=33355441
11/03/31 17:14:34 INFO mapred.JobClient:     Map output bytes=1000000000
11/03/31 17:14:34 INFO mapred.JobClient:     Map input bytes=1000000000
11/03/31 17:14:34 INFO mapred.JobClient:     Combine input records=0
11/03/31 17:14:34 INFO mapred.JobClient:     Map output records=10000000
11/03/31 17:14:34 INFO mapred.JobClient:     Reduce input records=10000000
11/03/31 17:14:34 INFO terasort.TeraSort: done
Execution is complete and the data is sorted; the output is still 1 GB:
[root@gd38 hadoop-0.20.1]# bin/hadoop fs -ls terasort/1G-output
Found 2 items
drwxr-xr-x   - root supergroup          0 2011-03-31 17:13 /user/root/terasort/1G-output/_logs
-rw-r--r--   1 root supergroup 1000000000 2011-03-31 17:13 /user/root/terasort/1G-output/part-00000