Summary of Hadoop configuration items (mapred-site.xml)

Source: Internet
Author: User

 

In the descriptions below, JT means JobTracker and TT means TaskTracker.

| Name | Value | Description |
| --- | --- | --- |
| hadoop.job.history.location | | Path where job history files are stored. It takes no value and does not need to appear in the configuration file; by default it is the history folder under the logs directory. |
| hadoop.job.history.user.location | | Location of the per-user job history files. |
| io.sort.factor | 30 | Number of streams merged at once when sorting files. In effect, this is the number of files open at the same time during the sort. |
| io.sort.mb | 600 | Memory used for sorting, in MB. The default is 100. As far as I recall, it must fit within the heap set in mapred.child.java.opts; otherwise the task fails with an out-of-memory (OOM) error. |
| mapred.job.tracker | hadoopmaster:9001 | Host and port of the JobTracker. The default is "local", in which case jobs run in-process with one map task and one reduce task. |
| mapred.job.tracker.http.address | 0.0.0.0:50030 | Address and port on which the JobTracker web UI listens. |
| mapred.job.tracker.handler.count | 15 | Number of JobTracker server threads. |
| mapred.task.tracker.report.address | 127.0.0.1:0 | Address on which the TaskTracker report server listens. It does not need to be configured, and the official documentation advises against changing it. |
| mapred.local.dir | /data1/hdfs/mapred/local,/data2/hdfs/mapred/local,... | Local directories MapReduce uses for intermediate data. Directories on several disks can be listed, separated by commas. |
| mapred.system.dir | /data1/hdfs/mapred/system,/data2/hdfs/mapred/system,... | Directory where MapReduce stores its control files. Several disks can be listed, separated by commas. |
| mapred.temp.dir | /data1/hdfs/mapred/temp,/data2/hdfs/mapred/temp,... | Shared directory for MapReduce temporary files. Configured in the same way as above. |
| mapred.local.dir.minspacestart | 1073741824 | If free space in mapred.local.dir falls below this value, the TT does not request new tasks. In bytes. The default is 0. |
| mapred.local.dir.minspacekill | 1073741824 | If free space in mapred.local.dir falls below this value, the TT requests no new tasks until the running ones have finished and cleaned up, and it kills one running task to free space. In bytes. The default is 0. |
| mapred.tasktracker.expiry.interval | 60000 | If a TT sends no heartbeat within this interval, it is considered to have crashed. In milliseconds. |
| mapred.map.tasks | 2 | Default number of map tasks per job. For example, with a 64 MB DFS block size, sorting a 60 MB file starts two map tasks. Ignored when mapred.job.tracker is "local". |
| mapred.reduce.tasks | 1 | Default number of reduce tasks per job. Ignored when mapred.job.tracker is "local". |
| mapred.jobtracker.restart.recover | true or false | Whether jobs are recovered when the JobTracker restarts. The default is false. |
| mapred.jobtracker.taskScheduler | One of: org.apache.hadoop.mapred.CapacityTaskScheduler, org.apache.hadoop.mapred.JobQueueTaskScheduler, org.apache.hadoop.mapred.FairScheduler | The task scheduler to use; this is an important setting. If it is not set, Hadoop uses the FIFO scheduler (JobQueueTaskScheduler). The alternatives are the Fair Scheduler and the Capacity Scheduler. |
| mapred.reduce.parallel.copies | 10 | Number of parallel transfers a reduce task runs during the shuffle phase. The default is 5. |
| mapred.child.java.opts | -Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64 | JVM options, including the maximum heap size, for each child task process started by a TT. |
| tasktracker.http.threads | 50 | Number of HTTP server threads a TT uses to serve map output to reduce tasks. |
| mapred.task.tracker.http.address | 0.0.0.0:50060 | Address and port on which the TT HTTP server listens. It can be left at the default. If the port is 0, a free port is chosen. |
| mapred.output.compress | true or false | Whether job output is compressed. The default is false, and false is recommended. |
| mapred.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec | Codec used to compress job output. gzip, bzip2, LZO, Snappy, and others can also be used. |
| mapred.compress.map.output | true or false | Whether map output is compressed before it is sent across the network. The default is false. Setting it to true is recommended, because it reduces bandwidth usage and cost. |
| mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec | Codec used to compress map output. |
| map.sort.class | org.apache.hadoop.util.QuickSort | Algorithm used to sort map output. The default is quicksort. |
| mapred.hosts | conf/mhost.allow | File listing the TTs allowed to connect to the JT. If it is empty, all TTs are allowed. |
| mapred.hosts.exclude | conf/mhost.deny | File listing the TTs that are not allowed to connect to the JT. |
| mapred.queue.names | ETL, Rush, default | Comma-separated list of queue names used with the scheduler. |
| mapred.tasktracker.map.tasks.maximum | 12 | Maximum number of map slots per server. |
| mapred.tasktracker.reduce.tasks.maximum | 6 | Maximum number of reduce slots per server. |
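
To show how rows of the table turn into an actual file, here is a minimal sketch of a mapred-site.xml that uses a few of the example values above. The host name hadoopmaster and the /data1 and /data2 paths are placeholders from this article, not values to copy; only two local directories are shown for brevity.

```xml
<?xml version="1.0"?>
<!-- Sketch only: host name and paths are placeholders for your own cluster. -->
<configuration>
  <property>
    <!-- Address of the JobTracker; "local" would run jobs in-process. -->
    <name>mapred.job.tracker</name>
    <value>hadoopmaster:9001</value>
  </property>
  <property>
    <!-- One directory per physical disk, comma-separated, no spaces. -->
    <name>mapred.local.dir</name>
    <value>/data1/hdfs/mapred/local,/data2/hdfs/mapred/local</value>
  </property>
  <property>
    <!-- Heap for each child task JVM. -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
  <property>
    <!-- Sort buffer in MB; it lives inside the child heap, so 600 is below 2048. -->
    <name>io.sort.mb</name>
    <value>600</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
</configuration>
```

A quick sanity check on these numbers: with 12 map slots and 6 reduce slots, up to 18 child JVMs can run on one server at once, and each may grow to 2048 MB of heap, so the worst case is 18 × 2048 MB = 36864 MB (36 GB) of heap. The slot counts and heap size should be chosen together so that this total fits in the server's physical memory.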

 

These are the most important and most commonly used settings. The items that the official documentation marks as expert-level are mostly left out, since there is little to gain from changing them.
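
The compression and scheduler rows of the table could be combined as in the following sketch. It assumes that the LZO codec and the Fair Scheduler are actually installed on the cluster: com.hadoop.compression.lzo.LzoCodec is not bundled with Apache Hadoop and comes from a separately installed LZO library, and the scheduler class must be on the JobTracker classpath.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the LZO library and the Fair Scheduler are installed. -->
<configuration>
  <property>
    <!-- Compress map output before it is shuffled to the reducers. -->
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <!-- Codec for the map output; requires the LZO native libraries. -->
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <!-- Final job output stays uncompressed, as the table recommends. -->
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <!-- Replace the default FIFO scheduler (JobQueueTaskScheduler). -->
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
</configuration>
```

If the LZO library is not available, org.apache.hadoop.io.compress.DefaultCodec from the table can be used for the map output codec instead, and removing the mapred.jobtracker.taskScheduler property falls back to the FIFO scheduler.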
