Added by myself:
dfs.datanode.du.reserved: how much disk space is reserved for non-DFS use when a datanode writes data to disk. This keeps DFS from filling the disk completely, but the parameter has a bug in 0.19.2.
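As a hedged illustration, the parameter can be set in hadoop-site.xml like this (the value is an example, not a recommendation):

```xml
<!-- hadoop-site.xml fragment; reserve space per volume for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- 10 GB, in bytes -->
</property>
```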
I introduced ipc.server.listen.queue.size, which defines how many calls per handler are allowed in the queue. The default is still 100, so nothing changes for current users. When the RPC service is running, each handler processes at most this many queued requests, and further clients have to wait.
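For reference, a minimal hadoop-site.xml fragment (the value shown is just the stated default):

```xml
<property>
  <name>ipc.server.listen.queue.size</name>
  <value>100</value> <!-- calls allowed in the queue per handler -->
</property>
```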
dfs.datanode.simulateddatastorage (https://issues.apache.org/jira/browse/HADOOP-1989): the datanode starts with simulated data storage, for debugging.
slave.host.name: the hostname of each datanode. Usually each machine is configured with its own IP address; on the web management page it is used as the connection address of a specific datanode in HDFS. In MapReduce it is the address used to connect to the machine running a specific map (reduce) task. If not configured, the machine name is used.
dfs.datanode.failed.volumes.tolerated: the number of failed disks a datanode tolerates. At startup the datanode uses the folders configured under dfs.data.dir (where blocks are stored); if some folders are unusable and their number exceeds the configured value, startup fails. See org.apache.hadoop.hdfs.server.datanode.FSDataset, lines 980-997, as follows:
    final int volFailuresTolerated =
        conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
    String[] dataDirs = conf.getStrings(DataNode.DATA_DIR_KEY);
    int volsConfigured = 0;
    if (dataDirs != null)
      volsConfigured = dataDirs.length;
    int volsFailed = volsConfigured - storage.getNumStorageDirs();
    if (volsFailed < 0 || volsFailed > volFailuresTolerated) {
      throw new DiskErrorException("Invalid value for volsFailed: "
          + volsFailed + ", Volumes tolerated: " + volFailuresTolerated);
    }
dfs.blockreport.intervalMsec
A datanode periodically reports all block information on the current node to the namenode; the dfs.blockreport.intervalMsec parameter controls the reporting interval.
dfs.blockreport.initialDelay
Used together with the previous parameter: after a datanode starts, its first block report is sent at a random time within (0, dfs.blockreport.initialDelay); from that initial time (which differs across datanodes) it then reports all of its block information to the namenode every dfs.blockreport.intervalMsec.
Without this random initial time, many datanodes would report from the same starting moment, sending a large amount of data to the NameNode at once and causing congestion; this parameter spreads the reports out.
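A sketch of how the two block-report parameters might be set together in hadoop-site.xml (values are illustrative; note that the interval is in milliseconds and the initial delay in seconds):

```xml
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>3600000</value> <!-- full block report once per hour -->
</property>
<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>120</value> <!-- first report at a random point within 120 s of startup -->
</property>
```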
Some parameters that can be read while a job is running:
mapred.job.id: the job ID, e.g. job_201511121233_0001
mapred.tip.id: the task ID, e.g. task_201511121233_0001_m_000003
mapred.task.id: the task attempt ID, e.g. attempt_201511121233_0001_m_000003_0
mapred.task.partition: the sequence number of the task within the job, e.g. 3
mapred.task.is.map: whether the task is a map task, e.g. true
mapred.job.queue.name: the queue the job belongs to; for clients of different users this value is usually written into the configuration file.
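The ID formats above follow a fixed naming scheme, which a small plain-Java sketch can decompose (no Hadoop dependency; the class and method names here are hypothetical, for illustration only):

```java
// Decomposes a task attempt ID of the form
// attempt_<timestamp>_<jobseq>_<m|r>_<partition>_<attempt>.
public class AttemptIdParser {
    // attempt_201511121233_0001_m_000003_0 -> job_201511121233_0001
    public static String jobId(String attemptId) {
        String[] p = attemptId.split("_");
        return "job_" + p[1] + "_" + p[2];
    }

    // "m" marks a map task, "r" a reduce task
    public static boolean isMap(String attemptId) {
        return attemptId.split("_")[3].equals("m");
    }

    // sequence number of the task within the job
    public static int partition(String attemptId) {
        return Integer.parseInt(attemptId.split("_")[4]);
    }

    public static void main(String[] args) {
        String id = "attempt_201511121233_0001_m_000003_0";
        System.out.println(jobId(id));     // job_201511121233_0001
        System.out.println(isMap(id));     // true
        System.out.println(partition(id)); // 3
    }
}
```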
dfs.client.max.block.acquire.failures
When reading a file on Hadoop, the DFSClient reads specific block data from datanodes. If the node being read fails (the socket cannot be connected), the client retries several times; this parameter sets the number of attempts. If the number of attempts exceeds this limit, an exception is thrown.
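As an illustration, the parameter could be set in hadoop-site.xml like this (3 is, to my knowledge, the usual default; treat the value as an example):

```xml
<property>
  <name>dfs.client.max.block.acquire.failures</name>
  <value>3</value> <!-- attempts before the client throws an exception -->
</property>
```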
================================================================
The following is reproduced from: http://blog.chinaunix.net/space.php?uid=22477743&do=blog&cuid=2046639 and http://longmans1985.blog.163.com/blog/static/7060547520113652122555/
0. Version: 0.19.2
1. Hadoop cluster:
1.1 HDFS
1.1.1 NameNode (1)
1.1.2 Secondary NameNode (1, optional)
1.1.3 DataNode (several)
1.2 MR
1.2.1 Master [JobTracker] (1)
1.2.2 Slave [TaskTracker] (several)
2. Configuration files
2.1 hadoop-default.xml: the Hadoop cluster's default configuration; usually this file does not need to be modified.
2.2 hadoop-site.xml: the per-machine configuration in a Hadoop cluster; a machine's personalized settings are typically specified here.
3. Configuration items
3.1 fs.default.name
Definition: NameNode URI
Description: hdfs://hostname/
3.2 mapred.job.tracker
Definition: JobTracker address
Description: hostname:port
3.3 dfs.name.dir
Definition: local directory where the NameNode saves metadata and transaction logs
Description: a comma-separated directory list specifies redundant copies of the data.
3.4 dfs.data.dir
Definition: local directories where a DataNode saves block files
Description: a comma-separated directory list specifies the directories used to save block files.
3.5 mapred.system.dir
Definition: directory on HDFS where MapReduce saves system files
Description:
3.6 mapred.local.dir
Definition: local directories for MapReduce temporary files
Description: a comma-separated directory list specifies multiple directories used simultaneously as temporary data space.
3.7 mapred.tasktracker.{map|reduce}.tasks.maximum
Definition: maximum number of map/reduce tasks that can run simultaneously on a TaskTracker
Description: the default is 2 map and 2 reduce tasks.
3.8 dfs.hosts / dfs.hosts.exclude
Definition: DataNode whitelist/blacklist files
Description:
3.9 mapred.hosts / mapred.hosts.exclude
Definition: MapReduce whitelist/blacklist files
Description:
3.10 mapred.queue.names
Definition: queue names
Description: the Hadoop MapReduce system has a "default" job queue (pool) by default.
3.11 dfs.block.size
Definition: default HDFS block size
Description: the default value is 128 MB.
3.12 dfs.namenode.handler.count
Definition: number of threads the NameNode uses to communicate with DataNodes simultaneously
Description:
3.13 mapred.reduce.parallel.copies
Definition: number of map outputs a reducer fetches in parallel
Description:
3.14 mapred.child.java.opts
Definition: heap size of the child JVM
Description:
3.15 fs.inmemory.size.mb
Definition: memory space the reducer uses to merge map output data
Description: 200 MB is used by default.
3.16 io.sort.factor
Definition: sort factor; the number of data streams merged at the same time
Description:
3.17 io.sort.mb
Definition: maximum memory used for sorting
Description:
3.18 io.file.buffer.size
Definition: buffer size for reading/writing files
Description:
3.19 mapred.job.tracker.handler.count
Definition: number of threads the JobTracker uses to communicate with TaskTrackers simultaneously
Description:
3.20 tasktracker.http.threads
Definition: number of threads of the TaskTracker's HTTP service; reducers use it to pull map output data
Description:
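To make items 3.7 and 3.16-3.17 concrete, a hedged hadoop-site.xml sketch with commonly cited defaults (tune to your hardware; the values are not recommendations):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value> <!-- default: 2 concurrent map tasks per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- default: 2 concurrent reduce tasks per TaskTracker -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>100</value> <!-- sort buffer, in MB -->
</property>
<property>
  <name>io.sort.factor</name>
  <value>10</value> <!-- streams merged at the same time -->
</property>
```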
The configuration items marked in red in the original post are required.
Parameter | Value | Remarks
fs.default.name | NameNode URI | hdfs://hostname/
dfs.hosts / dfs.hosts.exclude | list of permitted/excluded DataNodes | if necessary, use these files to control the list of permitted DataNodes.
dfs.replication | default: 3 | data replication factor
dfs.name.dir | example: /home/username/hadoop/namenode; default: /tmp | when this is a comma-separated directory list, the name table data is replicated to all of the directories for redundancy.
dfs.data.dir | example: /home/username/hadoop/datanode; default: /tmp | when this is a comma-separated directory list, data is stored in all of the directories, usually on different devices.
mapred.system.dir | HDFS path where the Map/Reduce framework stores system files, e.g. /hadoop/mapred/system/ | this path is under the default file system (HDFS) and must be accessible from both the server and the clients.
mapred.local.dir | comma-separated list of local file system paths where Map/Reduce temporary data is written | multiple paths help spread disk I/O.
mapred.tasktracker.{map|reduce}.tasks.maximum | maximum number of map/reduce tasks that can run simultaneously on a TaskTracker | defaults to 2 (2 maps and 2 reduces); change it according to your hardware.
mapred.job.tracker | host (or IP) and port of the JobTracker | host:port
mapred.hosts / mapred.hosts.exclude | list of permitted/excluded TaskTrackers | if necessary, use these files to control the list of authorized TaskTrackers.
hadoop.job.history.user.location | default: mapred.output.dir/_logs/history; may be set to none to disable it | directory for job history files
conf/slaves: write the names or IP addresses of all slave machines.
The NameNode remembers the block IDs that each file maps to; the block behind each block ID is copied to different machines for redundancy.
The default Hadoop block size is 64 MB.
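Both of these defaults can be overridden in hadoop-site.xml; a hedged example (the block size must be given in bytes):

```xml
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MB -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- number of copies of each block -->
</property>
```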