Hadoop Configuration Items Explained (mapred-site.xml)

Source: Internet
Author: User

This is a continuation of the previous article.



In the descriptions below, JT means JobTracker and TT means TaskTracker.

| Name | Value | Description |
| --- | --- | --- |
| hadoop.job.history.location | | Path where job history files are saved. No value needs to be configured and it need not be written in the configuration file; by default it is under the logs folder. |
| hadoop.job.history.user.location | | Storage location for user history files. |
| io.sort.factor | 30 | Number of streams merged at once when sorting files. As I understand it, this is the number of files opened at the same time during a sort. |
| io.sort.mb | 600 | Amount of memory used for sorting, in MB; default 100. As I recall, it must not exceed the heap set in mapred.child.java.opts, otherwise tasks fail with an out-of-memory error. |
| mapred.job.tracker | hadoopmaster:9001 | Host and port of the JobTracker to connect to. If not set, the default is local, in which case a job runs with 1 map and 1 reduce. |
| mapred.job.tracker.http.address | 0.0.0.0:50030 | Listen address of the JobTracker's web page. |
| mapred.job.tracker.handler.count | 15 | Number of server threads the JobTracker runs. |
| mapred.task.tracker.report.address | 127.0.0.1:0 | TaskTracker report server address. It needs no configuration, and the official documentation does not recommend changing it. |
| mapred.local.dir | /data1/hdfs/mapred/local,/data2/hdfs/mapred/local,... | Folders MapReduce uses for local computation. Multiple hard disks can be configured, separated by commas. |
| mapred.system.dir | /data1/hdfs/mapred/system,/data2/hdfs/mapred/system,... | Folder where MapReduce stores its control files. Multiple hard disks can be configured, separated by commas. |
| mapred.temp.dir | /data1/hdfs/mapred/temp,/data2/hdfs/mapred/temp,... | Shared temporary folder path for MapReduce; configured as explained above. |
| mapred.local.dir.minspacestart | 1073741824 | If free space in the local computation folder falls below this value, the TT stops requesting new tasks. In bytes; default 0. |
| mapred.local.dir.minspacekill | 1073741824 | If free space in the local computation folder falls below this value, the TT requests no new tasks until the running ones have finished, and kills tasks to free space. In bytes; default 0. |
| mapred.tasktracker.expiry.interval | 60000 | If a TT sends no heartbeat within this interval, it is considered dead. In milliseconds. |
| mapred.map.tasks | 2 | Default number of map tasks per job. For example, with a DFS block size of 64 MB, sorting a 60 MB file still starts 2 map tasks. Has no effect when mapred.job.tracker is set to local. |
| mapred.reduce.tasks | 1 | Default number of reduce tasks per job; otherwise as above. |
| mapred.jobtracker.restart.recover | true or false | Enables job recovery when the JobTracker restarts; default false. |
| mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.CapacityTaskScheduler, org.apache.hadoop.mapred.JobQueueTaskScheduler, or org.apache.hadoop.mapred.FairScheduler | Important: selects the task scheduler. If not set, Hadoop defaults to the FIFO scheduler (JobQueueTaskScheduler); the alternatives are the Fair Scheduler and the Capacity Scheduler. |
| mapred.reduce.parallel.copies | 10 | Number of parallel copies a reduce task runs during the shuffle phase; default 5. |
| mapred.child.java.opts | -Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64 | JVM options, including the heap size, for each child process started by the TT. |
| tasktracker.http.threads | 50 | Number of worker threads in the TT's HTTP server, which serves map output to the reduce tasks. |
| mapred.task.tracker.http.address | 0.0.0.0:50060 | IP and port the TT's HTTP server listens on. This is the default and need not be written. Setting the port to 0 picks a random port. |
| mapred.output.compress | true or false | Whether job output is compressed; default false, and false is recommended. |
| mapred.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec | Codec used to compress job output. gzip, bzip2, LZO, Snappy, and others can also be used. |
| mapred.compress.map.output | true or false | Whether map output is compressed before it is transferred over the network; default false. Setting it to true is recommended: it reduces bandwidth consumption at the cost of some speed. |
| mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec | Codec used to compress map-stage output. |
| map.sort.class | org.apache.hadoop.util.QuickSort | Algorithm used to sort map output; the default is quicksort. |
| mapred.hosts | conf/mhost.allow | File listing the TT servers allowed to connect to the JT. If empty, all are allowed. |
| mapred.hosts.exclude | conf/mhost.deny | File listing the TT servers forbidden to connect to the JT. Very useful when removing nodes. |
| mapred.queue.names | etl,rush,default | Queue names used with the scheduler, comma separated. |
| mapred.tasktracker.map.tasks.maximum | 12 | Maximum number of map slots allowed to start on each server. |
| mapred.tasktracker.reduce.tasks.maximum | 6 | Maximum number of reduce slots allowed to start on each server. |
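
To show how a few of the rows above would look in file form, here is a minimal sketch of a mapred-site.xml. It assumes the usual Hadoop property/value layout and simply reuses the example values from the table; they are this article's sample values, not recommendations, and the list of local directories stops at the two paths the table spells out.

```xml
<?xml version="1.0"?>
<!-- Sketch only: values are the sample values from the table above. -->
<configuration>
  <!-- JobTracker host and port; leaving this unset falls back to local mode. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopmaster:9001</value>
  </property>

  <!-- One directory per physical disk, comma separated.
       The table elides further entries; only the two listed paths are used here. -->
  <property>
    <name>mapred.local.dir</name>
    <value>/data1/hdfs/mapred/local,/data2/hdfs/mapred/local</value>
  </property>

  <!-- Slots per TaskTracker. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>

  <!-- JVM options for each child task; the heap must be larger than io.sort.mb. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64</value>
  </property>

  <!-- Sort buffer in MB, kept well below the 2048 MB child heap. -->
  <property>
    <name>io.sort.mb</name>
    <value>600</value>
  </property>
</configuration>
```

With these sample numbers, a node whose slots are all busy runs 12 + 6 = 18 child JVMs at up to 2048 MB of heap each, about 36 GB in the worst case, so the slot counts and heap size should be chosen together with the machine's physical memory in mind.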


I have picked out only the more important items. There are many more configuration options; the ones the official site marks as expert-level are mostly not covered here, because breaking something by changing them is no fun.
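
The compression and scheduler rows from the table can be sketched the same way. This fragment assumes the LZO codec library (a separate install, which the java.library.path value in the table appears to point at) and the Fair Scheduler jar are already available on the cluster; neither assumption is confirmed by the article.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the LZO codec and the Fair Scheduler jar are installed. -->
<configuration>
  <!-- Compress map output before the shuffle to save network bandwidth. -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

  <!-- Leave final job output uncompressed, as the table recommends. -->
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>

  <!-- Replace the default FIFO scheduler with the Fair Scheduler. -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
</configuration>
```

If the codec class cannot be loaded on a node, tasks there are likely to fail, so it seems safer to confirm the native library path on every TaskTracker before turning map output compression on.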

This article is from the "Practice Test Truth" blog; reproduction is declined.
