Hadoop configuration parameters explained (continuously updated)

Source: Internet
Author: User
Tags: hadoop, mapreduce


Added by the author:

dfs.datanode.du.reserved: how much non-DFS disk space is reserved when a datanode writes data to disk. This prevents DFS from filling up the disk, but the parameter has a bug in 0.19.2.
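As a sketch, the reservation could be set in hadoop-site.xml like this (the 10 GB value is an arbitrary example, not a recommendation):

```xml
<!-- hadoop-site.xml: reserve 10 GB per volume for non-DFS use (example value) -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- in bytes: 10 * 1024^3 -->
</property>
```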

I introduced "ipc.server.listen.queue.size", which defines how many calls per handler are allowed in the queue. The default is still 100, so there is no change for current users. When the RPC service is started, each handler processes at most this many queued requests; additional clients have to wait.
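A minimal hadoop-site.xml entry for this setting might look as follows (100 is the stated default; raising it is only an example):

```xml
<!-- hadoop-site.xml: per-handler RPC call queue depth -->
<property>
  <name>ipc.server.listen.queue.size</name>
  <value>100</value>
</property>
```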


dfs.datanode.simulateddatastorage (https://issues.apache.org/jira/browse/HADOOP-1989): starts the datanode with simulated storage, i.e. a pseudo-distributed system for debugging.


slave.host.name

: The name each datanode node reports. Usually each machine is configured with its own IP address, which HDFS uses as that datanode's connection address on the web management page.
In MapReduce, it is the address used to connect to the machine running a specific map (reduce) task. If not configured, the machine name is used.
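For example, a slave's hadoop-site.xml could pin the reported address to a fixed IP (the address below is a placeholder):

```xml
<!-- slave-side hadoop-site.xml: report a fixed IP instead of the machine name -->
<property>
  <name>slave.host.name</name>
  <value>192.168.1.101</value>
</property>
```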


dfs.datanode.failed.volumes.tolerated
: The number of failed disk volumes a datanode tolerates. At startup the datanode uses the directories configured under dfs.data.dir (used to store blocks); if the number of unusable directories exceeds the configured value, the datanode fails to start. See lines 980-997 of org.apache.hadoop.hdfs.server.datanode.FSDataset, as follows:

final int volFailuresTolerated =
    conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
String[] dataDirs = conf.getStrings(DataNode.DATA_DIR_KEY);
int volsConfigured = 0;
if (dataDirs != null)
  volsConfigured = dataDirs.length;
int volsFailed = volsConfigured - storage.getNumStorageDirs();
if (volsFailed < 0 || volsFailed > volFailuresTolerated) {
  throw new DiskErrorException("Invalid value for volsFailed: " + volsFailed
      + ", Volumes tolerated: " + volFailuresTolerated);
}


dfs.blockreport.intervalMsec

Each datanode periodically reports all block information on the current node to the namenode; this parameter controls the report interval, in milliseconds.


dfs.blockreport.initialDelay

Used together with the previous parameter: the first block report after a datanode starts is sent at a random time in (0, dfs.blockreport.initialDelay); from that initial time onward (which differs across datanodes), the datanode reports all of its block information to the namenode every dfs.blockreport.intervalMsec.

Without this randomized initial delay, all datanodes would report from the same starting moment, sending a large amount of data to the namenode at once and causing congestion; this parameter spreads that load out.
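The two parameters could be combined in hadoop-site.xml like this (the values below are illustrative: report hourly, with start times spread over the first 120 seconds):

```xml
<!-- hadoop-site.xml: full block report every hour, staggered start (example values) -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>3600000</value> <!-- milliseconds -->
</property>
<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>120</value> <!-- seconds -->
</property>
```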


Some parameters that can be obtained while a job is running:

mapred.job.id: the job ID, for example job_201511121233_0001

mapred.tip.id: the task ID, for example task_201511121233_0001_m_000003

mapred.task.id: the task attempt ID, for example attempt_201511121233_0001_m_000003_0

mapred.task.partition: the sequence number of the task within the job, for example 3

mapred.task.is.map: whether the task is a map task, for example true

mapred.job.queue.name: the queue the job belongs to; this value is usually written into the configuration file for each user's client.
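The IDs above follow a fixed underscore-separated layout (prefix, jobtracker start time, job number, "m"/"r", task number, attempt number), so a small helper can pull them apart. The class below is an illustrative stand-alone sketch; TaskIdParser is a hypothetical name, not part of Hadoop:

```java
// Illustrative parser for MapReduce task attempt IDs of the form
// attempt_<jtStartTime>_<jobSeq>_<m|r>_<taskSeq>_<attemptSeq>.
public class TaskIdParser {

    // Returns "map" or "reduce" from the m/r field of the attempt ID.
    public static String taskType(String attemptId) {
        String[] parts = attemptId.split("_");
        return parts[3].equals("m") ? "map" : "reduce";
    }

    // Returns the task's sequence number within the job (mapred.task.partition).
    public static int partition(String attemptId) {
        String[] parts = attemptId.split("_");
        return Integer.parseInt(parts[4]);
    }

    public static void main(String[] args) {
        String id = "attempt_201511121233_0001_m_000003_0";
        System.out.println(taskType(id));  // map
        System.out.println(partition(id)); // 3
    }
}
```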




dfs.client.max.block.acquire.failures

When reading files from Hadoop, the DFSClient fetches specific block data from datanodes. If the node being read fails (the socket cannot be connected), the client retries; this parameter sets the number of attempts. If the limit is exceeded, an exception is thrown.
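A hadoop-site.xml entry raising the retry limit might look like this (5 is an arbitrary example value):

```xml
<!-- hadoop-site.xml: give up after 5 failed block fetch attempts (example value) -->
<property>
  <name>dfs.client.max.block.acquire.failures</name>
  <value>5</value>
</property>
```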

====================================================================

The following is reproduced from: http://blog.chinaunix.net/space.php?uid=22477743&do=blog&cuid=2046639 and http://longmans1985.blog.163.com/blog/static/7060547520113652122555/


0. Version: 0.19.2

1. Hadoop cluster:
   1.1 HDFS
       1.1.1 Name node (1)
       1.1.2 Secondary name node (1, optional)
       1.1.3 Data nodes (several)
   1.2 MR
       1.2.1 Master [jobtracker] (1)
       1.2.2 Slaves [tasktracker] (several)

2. Configuration files:
   2.1 hadoop-default.xml: the Hadoop cluster's default configuration; this file usually does not need to be modified.
   2.2 hadoop-site.xml: per-machine configuration for the Hadoop cluster; machine-specific settings are normally placed here.

3. Configuration items:

3.1 fs.default.name
Definition: name node URI
Description: hdfs://hostname/


3.2 mapred.job.tracker
Definition: jobtracker address
Description: hostname:port

3.3 dfs.name.dir
Definition: local directory on the name node for saving metadata and transaction logs
Description: a comma-separated directory list; the data is redundantly backed up to every directory.

3.4 dfs.data.dir
Definition: local directories on a data node for saving block files
Description: a comma-separated directory list; block files are saved across these directories.

3.5 mapred.system.dir
Definition: directory on HDFS where MapReduce saves system files
Description:

3.6 mapred.local.dir
Definition: local directory for saving MapReduce temporary files
Description: a comma-separated directory list; all directories are used simultaneously as temporary data space.

3.7 mapred.tasktracker.{map|reduce}.tasks.maximum
Definition: maximum number of map/reduce tasks that can run simultaneously on a tasktracker
Description: the default is 2 map and 2 reduce tasks.


3.8 dfs.hosts / dfs.hosts.exclude
Definition: data node whitelist/blacklist files
Description:

3.9 mapred.hosts / mapred.hosts.exclude
Definition: MapReduce (tasktracker) whitelist/blacklist files
Description:

3.10 mapred.queue.names
Definition: queue names
Description: the Hadoop MapReduce system has a "default" job queue (pool) by default.

3.11 dfs.block.size
Definition: default HDFS block size
Description: the default value is 64 MB.

3.12 dfs.namenode.handler.count
Definition: number of threads the namenode uses to communicate with datanodes concurrently

3.13 mapred.reduce.parallel.copies
Definition: number of map output files a reducer pulls in parallel

3.14 mapred.child.java.opts
Definition: heap size (JVM options) for the child JVM

3.15 fs.inmemory.size.mb
Definition: memory space the reducer uses to merge map output data
Description: 200 MB is used by default.

3.16 io.sort.factor
Definition: sort factor; the number of data streams merged at the same time

3.17 io.sort.mb
Definition: maximum memory used for sorting

3.18 io.file.buffer.size
Definition: buffer size for reading/writing files

3.19 mapred.job.tracker.handler.count
Definition: number of threads the jobtracker uses to communicate with tasktrackers concurrently

3.20 tasktracker.http.threads
Definition: number of threads for the tasktracker's HTTP service; reducers use it to pull map output data


The following configurations are required.


fs.default.name — namenode URI, e.g. hdfs://hostname/

dfs.hosts / dfs.hosts.exclude — files listing permitted/denied datanodes. Use these files if you need to control which datanodes may join.

dfs.replication — replication factor; the default value is 3.

dfs.data.dir — default /tmp. When set to a comma-separated directory list, data is stored across all directories, usually spread over different devices.

mapred.system.dir — HDFS path where the map/reduce framework stores system files, e.g. /hadoop/mapred/system/. This path lives on the default file system (HDFS) and must be accessible from both servers and clients.

mapred.local.dir — comma-separated list of local file-system paths where map/reduce temporary data is stored. Multiple paths help spread disk I/O.

mapred.tasktracker.{map|reduce}.tasks.maximum — maximum number of map/reduce tasks a tasktracker can run simultaneously. The default is 2 (2 maps and 2 reduces); adjust it to the hardware.

mapred.job.tracker — jobtracker host (or IP) and port, as host:port.

mapred.hosts / mapred.hosts.exclude — files listing permitted/denied tasktrackers. Use these files if you need to control which tasktrackers are authorized.

hadoop.job.history.user.location — job history file directory. Default value: ${mapred.output.dir}/_logs/history; it can also be set to "none" to disable it.
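Putting the required settings together, a minimal hadoop-site.xml might look like this (all hostnames and paths are placeholders):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/data1/dfs/name,/data2/dfs/name</value> <!-- redundant copies -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data1/dfs/data,/data2/dfs/data</value> <!-- spread over devices -->
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system/</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data1/mapred/local,/data2/mapred/local</value>
  </property>
</configuration>
```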


conf/slaves: list the names or IP addresses of all slave machines.


The namenode remembers the block IDs mapped to each file; the block behind each block ID is replicated to different machines for redundancy.

The default Hadoop block size is 64 MB.




