Hadoop Common Configuration Items (Repost)


core-site.xml

Name | Value | Description
fs.default.name | hdfs://hadoopmaster:9000 | URI and port of the HDFS NameNode on the Hadoop master.
fs.checkpoint.dir | /opt/data/hadoop1/hdfs/namesecondary1 | Path where the NameNode metadata checkpoints (name backups) are stored; the official docs say it is read from here and written to dfs.name.dir.
fs.checkpoint.period | 1800 | Checkpoint interval in seconds; only used by the SecondaryNameNode (SNN); default is one hour.
fs.checkpoint.size | 33554432 | Edit log size that also triggers a checkpoint, regardless of the interval; only used by the SNN; default 64 MB.
io.compression.codecs | org.apache.hadoop.io.compress.DefaultCodec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec | Comma-separated list of compression codecs available to Hadoop. Gzip and bzip2 support is built in; LZO requires installing hadoop-gpl-compression or kevinweil's hadoop-lzo, and snappy must also be installed separately.
io.compression.codec.lzo.class | com.hadoop.compression.lzo.LzoCodec | Codec class used for LZO compression.
topology.script.file.name | /hadoop/bin/rackaware.py | Location of the rack-awareness script.
topology.script.number.args | 1000 | Maximum number of host names or IP addresses passed to the rack-awareness script per invocation.
fs.trash.interval | 10800 | HDFS trash retention, in minutes, so accidentally deleted files can be recovered; 0 disables the trash. This item can be added without restarting Hadoop.
hadoop.http.filter.initializers | org.apache.hadoop.security.AuthenticationFilterInitializer | Enables user authentication on the HTTP ports of the JobTracker, TaskTracker, NameNode, DataNode, etc.; must be configured on all nodes.
hadoop.http.authentication.type | simple / kerberos / #AUTHENTICATION_HANDLER_CLASSNAME# | HTTP authentication method; default is simple; a custom handler class can also be specified; configure on all nodes.
hadoop.http.authentication.token.validity | 36000 | Validity period of the authentication token, in seconds; configure on all nodes.
hadoop.http.authentication.signature.secret | (not set by default) | If left unset, Hadoop automatically generates a signature secret at startup; configure on all nodes.
hadoop.http.authentication.cookie.domain | domain.tld | Domain used for the HTTP authentication cookie; access by IP address will not work, so a domain name must be configured on all nodes.
hadoop.http.authentication.simple.anonymous.allowed | true / false | Only for simple authentication; whether anonymous access is allowed; default true.
hadoop.http.authentication.kerberos.principal | HTTP/localhost@$LOCALHOST | Only for Kerberos authentication; the principal of the authenticated host must use HTTP as its service name.
hadoop.http.authentication.kerberos.keytab | /home/xianglei/hadoop.keytab | Only for Kerberos authentication; location of the keytab file.
hadoop.security.authorization | true / false | Enables Hadoop service-level authorization, used together with hadoop-policy.xml; after changes, run hadoop dfsadmin -refreshServiceAcl and hadoop mradmin -refreshServiceAcl to take effect.
io.file.buffer.size | 131072 | Size of the read/write buffer used when processing sequence files.
hadoop.security.authentication | simple / kerberos | Authentication for Hadoop itself (non-HTTP access); simple or kerberos.
hadoop.logfile.size | 1000000000 | Maximum log file size; a new log is rolled once this size is exceeded.
hadoop.logfile.count | 20 | Maximum number of log files kept.
io.bytes.per.checksum | 1024 | Number of bytes covered by each checksum; must not be greater than io.file.buffer.size.
io.skip.checksum.errors | true / false | Skip checksum errors when processing sequence files instead of throwing an exception; default false.
io.serializations | org.apache.hadoop.io.serializer.WritableSerialization | Serialization codecs.
io.seqfile.compress.blocksize | 1024000 | Minimum block size, in bytes, for block compression of sequence files.
webinterface.private.actions | true / false | When true, the JobTracker and NameNode web tracker pages show links for operations such as killing tasks and deleting files; default false.
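
As a quick illustration, a minimal core-site.xml using a few of the items above might look like the following sketch; hadoopmaster and the values are the samples from the table, so adjust them to your own cluster.

  <?xml version="1.0"?>
  <!-- core-site.xml sketch; host name and values are the sample entries from the table above -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoopmaster:9000</value>
    </property>
    <property>
      <name>fs.trash.interval</name>
      <value>10800</value> <!-- trash retention in minutes; 0 disables the trash -->
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value> <!-- read/write buffer for sequence files -->
    </property>
  </configuration>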

hdfs-site.xml

Name | Value | Description
dfs.default.chunk.view.size | 32768 | Amount of file content shown per file on the NameNode's HTTP browsing page; usually does not need to be set.
dfs.datanode.du.reserved | 1073741824 | Space reserved per disk, mainly for non-HDFS files; default is 0 bytes (nothing reserved).
dfs.name.dir | /opt/data1/hdfs/name, /opt/data2/hdfs/name, /nfs/data/hdfs/name | Where the NameNode stores its metadata; keeping one copy on NFS is generally recommended as a 1.0-era HA measure, or spread it across multiple drives of one server.
dfs.web.ugi | nobody,nobody | User and group used by the web tracker page servers of the NameNode, JobTracker, etc.
dfs.permissions | true / false | Whether HDFS permission checking is enabled. I usually set it to false and have others operate through in-house tools to avoid mistakes; with true, data sometimes becomes inaccessible because of permissions.
dfs.permissions.supergroup | supergroup | HDFS superuser group, supergroup by default; the user who starts Hadoop is typically the superuser.
dfs.data.dir | /opt/data1/hdfs/data, /opt/data2/hdfs/data, /opt/data3/hdfs/data, ... | Paths where the DataNode stores data blocks; multiple disks can be listed, comma separated.
dfs.datanode.data.dir.perm | 755 | Permissions of the local directories used by the DataNode; default 755.
dfs.replication | 3 | Number of replicas of each HDFS block, 3 by default; in theory more replicas make reads faster but need more storage. If you can afford the space, tune it to 5 or 6.
dfs.replication.max | 512 | Maximum replica count; transient DataNode failures and recoveries can temporarily leave blocks above the default replica count; usually of no use and not written to the config file.
dfs.replication.min | 1 | Minimum replica count; purpose as above.
dfs.block.size | 134217728 | Size of each file block; we use 128 MB, the default is 64 MB. The value must be computed as 128*1024^2; I have seen people simply write 128000000, which is rather carefree.
dfs.df.interval | 60000 | Interval, in milliseconds, at which disk usage statistics are automatically refreshed.
dfs.client.block.write.retries | 3 | Maximum number of retries when writing a data block; a failure is not reported before this count is reached.
dfs.heartbeat.interval | 3 | DataNode heartbeat interval, in seconds.
dfs.namenode.handler.count | 10 | Number of handler threads the NameNode starts.
dfs.balance.bandwidthPerSec | 1048576 | Maximum bandwidth per second used by the balancer, in bytes (not bits).
dfs.hosts | /opt/hadoop/conf/hosts.allow | File listing host names allowed to connect to the NameNode; must be an absolute path; an empty file means all hosts are allowed.
dfs.hosts.exclude | /opt/hadoop/conf/hosts.deny | Same idea, but a list of host names forbidden to connect to the NameNode; useful for decommissioning DataNodes from the cluster.
dfs.max.objects | 0 | Maximum number of DFS objects; every file and directory block in HDFS counts as one object; 0 means no limit.
dfs.replication.interval | 3 | Interval at which the NameNode computes replication work; usually not written to the config file; the default is fine.
dfs.support.append | true / false | Newer Hadoop supports appending to files; this controls whether appends are allowed; default false because append still has bugs.
dfs.datanode.failed.volumes.tolerated | 0 | Number of failed disks a DataNode tolerates before shutting down; the default 0 means the DataNode shuts down as soon as one disk fails.
dfs.secondary.http.address | 0.0.0.0:50090 | Listen address and port of the SecondaryNameNode tracker page.
dfs.datanode.address | 0.0.0.0:50010 | DataNode service listen address and port; with port 0 a random port is used and reported to the NameNode via heartbeat.
dfs.datanode.http.address | 0.0.0.0:50075 | Listen address and port of the DataNode tracker page.
dfs.datanode.ipc.address | 0.0.0.0:50020 | DataNode IPC listen address and port; write 0 to use a random port reported to the NameNode via heartbeat.
dfs.datanode.handler.count | 3 | Number of service threads started by the DataNode.
dfs.http.address | 0.0.0.0:50070 | Listen address and port of the NameNode tracker page.
dfs.https.enable | true / false | Whether the NameNode tracker listens over HTTPS; default false.
dfs.datanode.https.address | 0.0.0.0:50475 | HTTPS listen address and port of the DataNode tracker page.
dfs.https.address | 0.0.0.0:50470 | HTTPS listen address and port of the NameNode tracker page.
dfs.datanode.max.xcievers | 2048 | Roughly the HDFS equivalent of the Linux maximum-open-files limit; not in the documentation, but it must be increased when DataXceiver errors occur; default 256.
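
For reference, a minimal hdfs-site.xml built from the sample values above could look like this sketch; the directory paths and replication factor are illustrative, so substitute your own disks.

  <?xml version="1.0"?>
  <!-- hdfs-site.xml sketch; paths and values are the sample entries from the table above -->
  <configuration>
    <property>
      <name>dfs.name.dir</name>
      <value>/opt/data1/hdfs/name,/nfs/data/hdfs/name</value> <!-- keep one copy on NFS as a simple backup -->
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/opt/data1/hdfs/data,/opt/data2/hdfs/data</value> <!-- comma separated, one entry per disk -->
    </property>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value> <!-- 128 * 1024 * 1024 = 128 MB -->
    </property>
  </configuration>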

mapred-site.xml

Name | Value | Description
hadoop.job.history.location | (not set) | Path where job history files are saved; no parameter is needed and it should not be written in the config file; defaults to the logs folder.
hadoop.job.history.user.location | (not set) | Location where per-user job history files are stored.
io.sort.factor | 30 | Number of file streams merged at once while sorting; effectively the number of files opened simultaneously during the merge.
io.sort.mb | 600 | Memory used for sorting, in MB; default 100. As I recall it must not exceed the mapred.child.java.opts setting, otherwise the task will OOM.
mapred.job.tracker | hadoopmaster:9001 | Address for connecting to the JobTracker; if not written it defaults to local, with 1 map and 1 reduce.
mapred.job.tracker.http.address | 0.0.0.0:50030 | Listen address of the JobTracker tracker page.
mapred.job.tracker.handler.count | 15 | Number of handler threads of the JobTracker.
mapred.task.tracker.report.address | 127.0.0.1:0 | TaskTracker report server address; leave it unconfigured; the official docs do not recommend changing it.
mapred.local.dir | /data1/hdfs/mapred/local, /data2/hdfs/mapred/local, ... | Local folders MapReduce uses for intermediate computation; multiple disks can be configured, comma separated.
mapred.system.dir | /data1/hdfs/mapred/system, /data2/hdfs/mapred/system, ... | Folders where MapReduce stores its control files; multiple disks can be configured, comma separated.
mapred.temp.dir | /data1/hdfs/mapred/temp, /data2/hdfs/mapred/temp, ... | Shared temporary folders for MapReduce; same as above.
mapred.local.dir.minspacestart | 1073741824 | When free space in the local computation folder falls below this value, no new local tasks are started; in bytes, default 0.
mapred.local.dir.minspacekill | 1073741824 | When free space in the local computation folder falls below this value, no new tasks are accepted; in bytes, default 0.
mapred.tasktracker.expiry.interval | 60000 | If a TaskTracker sends no heartbeat within this interval, it is considered dead; in milliseconds.
mapred.map.tasks | 2 | Default number of map tasks per job; for example, with a DFS block size of 64 MB, sorting a 60 MB file still starts 2 map tasks; has no effect when mapred.job.tracker is set to local.
mapred.reduce.tasks | 1 | Explanation as above.
mapred.jobtracker.restart.recover | true / false | Whether to recover running jobs after a JobTracker restart; default false.
mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.CapacityTaskScheduler / org.apache.hadoop.mapred.JobQueueTaskScheduler / org.apache.hadoop.mapred.FairScheduler | Important: selects the task scheduler. If not set, Hadoop defaults to the FIFO scheduler (JobQueueTaskScheduler); the fair scheduler and the capacity scheduler can be used instead.
mapred.reduce.parallel.copies | 10 | Number of parallel copies reduce uses in the shuffle phase; default 5.
mapred.child.java.opts | -Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64 | JVM options, including heap size, for each child process started by the TaskTracker.
tasktracker.http.threads | 50 | Number of threads in the TaskTracker's HTTP server that serves task data such as map outputs.
mapred.task.tracker.http.address | 0.0.0.0:50060 | HTTP listen address and port of the TaskTracker; usually does not need to be written; port 0 picks a random port.
mapred.output.compress | true / false | Whether job output is compressed; default false; false is recommended.
mapred.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec | Codec used to compress job output; gzip, bzip2, lzo, snappy, etc. can also be used.
mapred.compress.map.output | true / false | Whether map output is compressed before going over the network; default false; true is recommended to reduce bandwidth at the cost of some speed.
mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec | Codec used to compress map output.
map.sort.class | org.apache.hadoop.util.QuickSort | Algorithm used to sort map output; default is quicksort.
mapred.hosts | conf/mhost.allow | List of TaskTrackers allowed to connect to the JobTracker; an empty value allows all.
mapred.hosts.exclude | conf/mhost.deny | List of TaskTrackers forbidden to connect to the JobTracker; very useful when decommissioning nodes.
mapred.queue.names | etl,rush,default | Comma-separated list of queue names used with the scheduler.
mapred.tasktracker.map.tasks.maximum | 12 | Maximum number of map slots started per server.
mapred.tasktracker.reduce.tasks.maximum | 6 | Maximum number of reduce slots started per server.
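
And a minimal mapred-site.xml sketch along the same lines; the JobTracker address, slot counts, and JVM options are the sample values from the table, so tune them to your own hardware.

  <?xml version="1.0"?>
  <!-- mapred-site.xml sketch; values are the sample entries from the table above -->
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>hadoopmaster:9001</value>
    </property>
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>12</value> <!-- map slots per TaskTracker -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>6</value> <!-- reduce slots per TaskTracker -->
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value> <!-- heap per child JVM; keep io.sort.mb below this -->
    </property>
  </configuration>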
