Hadoop Common Configuration Items (Repost)


core-site.xml

Name | Value | Description
fs.default.name | hdfs://hadoopmaster:9000 | URI and port of the HDFS NameNode on the Hadoop master.
fs.checkpoint.dir | /opt/data/hadoop1/hdfs/namesecondary1 | Path where the NameNode metadata checkpoints (name backups) are stored; the official docs say it is read from here and written to dfs.name.dir.
fs.checkpoint.period | 1800 | Checkpoint interval in seconds; only used by the SecondaryNameNode (SNN); default is one hour.
fs.checkpoint.size | 33554432 | Edit log size that also triggers a checkpoint, regardless of the interval; only used by the SNN; default 64 MB.
io.compression.codecs | org.apache.hadoop.io.compress.DefaultCodec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec | Comma-separated list of compression codecs available to Hadoop. Gzip and bzip2 support is built in; LZO requires installing hadoop-gpl-compression or kevinweil's hadoop-lzo, and snappy must also be installed separately.
io.compression.codec.lzo.class | com.hadoop.compression.lzo.LzoCodec | Codec class used for LZO compression.
topology.script.file.name | /hadoop/bin/rackaware.py | Location of the rack-awareness script.
topology.script.number.args | 1000 | Maximum number of host names or IP addresses passed to the rack-awareness script per invocation.
fs.trash.interval | 10800 | HDFS trash retention, in minutes, so accidentally deleted files can be recovered; 0 disables the trash. This item can be added without restarting Hadoop.
hadoop.http.filter.initializers | org.apache.hadoop.security.AuthenticationFilterInitializer | Enables user authentication on the HTTP ports of the JobTracker, TaskTracker, NameNode, DataNode, etc.; must be configured on all nodes.
hadoop.http.authentication.type | simple / kerberos / #AUTHENTICATION_HANDLER_CLASSNAME# | HTTP authentication method; default is simple; a custom handler class can also be specified; configure on all nodes.
hadoop.http.authentication.token.validity | 36000 | Validity period of the authentication token, in seconds; configure on all nodes.
hadoop.http.authentication.signature.secret | (not set by default) | If left unset, Hadoop automatically generates a signature secret at startup; configure on all nodes.
hadoop.http.authentication.cookie.domain | domain.tld | Domain used for the HTTP authentication cookie; access by IP address will not work, so a domain name must be configured on all nodes.
hadoop.http.authentication.simple.anonymous.allowed | true / false | Only for simple authentication; whether anonymous access is allowed; default true.
hadoop.http.authentication.kerberos.principal | HTTP/localhost@$LOCALHOST | Only for Kerberos authentication; the principal of the authenticated host must use HTTP as its service name.
hadoop.http.authentication.kerberos.keytab | /home/xianglei/hadoop.keytab | Only for Kerberos authentication; location of the keytab file.
hadoop.security.authorization | true / false | Enables Hadoop service-level authorization, used together with hadoop-policy.xml; after changes, run hadoop dfsadmin -refreshServiceAcl and hadoop mradmin -refreshServiceAcl to take effect.
io.file.buffer.size | 131072 | Size of the read/write buffer used when processing sequence files.
hadoop.security.authentication | simple / kerberos | Authentication for Hadoop itself (non-HTTP access); simple or kerberos.
hadoop.logfile.size | 1000000000 | Maximum log file size; a new log is rolled once this size is exceeded.
hadoop.logfile.count | 20 | Maximum number of log files kept.
io.bytes.per.checksum | 1024 | Number of bytes covered by each checksum; must not be greater than io.file.buffer.size.
io.skip.checksum.errors | true / false | Skip checksum errors when processing sequence files instead of throwing an exception; default false.
io.serializations | org.apache.hadoop.io.serializer.WritableSerialization | Serialization codecs.
io.seqfile.compress.blocksize | 1024000 | Minimum block size, in bytes, for block compression of sequence files.
webinterface.private.actions | true / false | When true, the JobTracker and NameNode web tracker pages show links for operations such as killing tasks and deleting files; default false.
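
As a quick illustration, a minimal core-site.xml using a few of the items above might look like the following sketch; hadoopmaster and the values are the samples from the table, so adjust them to your own cluster.

  <?xml version="1.0"?>
  <!-- core-site.xml sketch; host name and values are the sample entries from the table above -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoopmaster:9000</value>
    </property>
    <property>
      <name>fs.trash.interval</name>
      <value>10800</value> <!-- trash retention in minutes; 0 disables the trash -->
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value> <!-- read/write buffer for sequence files -->
    </property>
  </configuration>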

hdfs-site.xml

Name | Value | Description
dfs.default.chunk.view.size | 32768 | Amount of file content shown per file on the NameNode's HTTP browsing page; usually does not need to be set.
dfs.datanode.du.reserved | 1073741824 | Space reserved per disk, mainly for non-HDFS files; default is 0 bytes (nothing reserved).
dfs.name.dir | /opt/data1/hdfs/name, /opt/data2/hdfs/name, /nfs/data/hdfs/name | Where the NameNode stores its metadata; keeping one copy on NFS is generally recommended as a 1.0-era HA measure, or spread it across multiple drives of one server.
dfs.web.ugi | nobody,nobody | User and group used by the web tracker page servers of the NameNode, JobTracker, etc.
dfs.permissions | true / false | Whether HDFS permission checking is enabled. I usually set it to false and have others operate through in-house tools to avoid mistakes; with true, data sometimes becomes inaccessible because of permissions.
dfs.permissions.supergroup | supergroup | HDFS superuser group, supergroup by default; the user who starts Hadoop is typically the superuser.
dfs.data.dir | /opt/data1/hdfs/data, /opt/data2/hdfs/data, /opt/data3/hdfs/data, ... | Paths where the DataNode stores data blocks; multiple disks can be listed, comma separated.
dfs.datanode.data.dir.perm | 755 | Permissions of the local directories used by the DataNode; default 755.
dfs.replication | 3 | Number of replicas of each HDFS block, 3 by default; in theory more replicas make reads faster but need more storage. If you can afford the space, tune it to 5 or 6.
dfs.replication.max | 512 | Maximum replica count; transient DataNode failures and recoveries can temporarily leave blocks above the default replica count; usually of no use and not written to the config file.
dfs.replication.min | 1 | Minimum replica count; purpose as above.
dfs.block.size | 134217728 | Size of each file block; we use 128 MB, the default is 64 MB. The value must be computed as 128*1024^2; I have seen people simply write 128000000, which is rather carefree.
dfs.df.interval | 60000 | Interval, in milliseconds, at which disk usage statistics are automatically refreshed.
dfs.client.block.write.retries | 3 | Maximum number of retries when writing a data block; a failure is not reported before this count is reached.
dfs.heartbeat.interval | 3 | DataNode heartbeat interval, in seconds.
dfs.namenode.handler.count | 10 | Number of handler threads the NameNode starts.
dfs.balance.bandwidthPerSec | 1048576 | Maximum bandwidth per second used by the balancer, in bytes (not bits).
dfs.hosts | /opt/hadoop/conf/hosts.allow | File listing host names allowed to connect to the NameNode; must be an absolute path; an empty file means all hosts are allowed.
dfs.hosts.exclude | /opt/hadoop/conf/hosts.deny | Same idea, but a list of host names forbidden to connect to the NameNode; useful for decommissioning DataNodes from the cluster.
dfs.max.objects | 0 | Maximum number of DFS objects; every file and directory block in HDFS counts as one object; 0 means no limit.
dfs.replication.interval | 3 | Interval at which the NameNode computes replication work; usually not written to the config file; the default is fine.
dfs.support.append | true / false | Newer Hadoop supports appending to files; this controls whether appends are allowed; default false because append still has bugs.
dfs.datanode.failed.volumes.tolerated | 0 | Number of failed disks a DataNode tolerates before shutting down; the default 0 means the DataNode shuts down as soon as one disk fails.
dfs.secondary.http.address | 0.0.0.0:50090 | Listen address and port of the SecondaryNameNode tracker page.
dfs.datanode.address | 0.0.0.0:50010 | DataNode service listen address and port; with port 0 a random port is used and reported to the NameNode via heartbeat.
dfs.datanode.http.address | 0.0.0.0:50075 | Listen address and port of the DataNode tracker page.
dfs.datanode.ipc.address | 0.0.0.0:50020 | DataNode IPC listen address and port; write 0 to use a random port reported to the NameNode via heartbeat.
dfs.datanode.handler.count | 3 | Number of service threads started by the DataNode.
dfs.http.address | 0.0.0.0:50070 | Listen address and port of the NameNode tracker page.
dfs.https.enable | true / false | Whether the NameNode tracker listens over HTTPS; default false.
dfs.datanode.https.address | 0.0.0.0:50475 | HTTPS listen address and port of the DataNode tracker page.
dfs.https.address | 0.0.0.0:50470 | HTTPS listen address and port of the NameNode tracker page.
dfs.datanode.max.xcievers | 2048 | Roughly the HDFS equivalent of the Linux maximum-open-files limit; not in the documentation, but it must be increased when DataXceiver errors occur; default 256.
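
For reference, a minimal hdfs-site.xml built from the sample values above could look like this sketch; the directory paths and replication factor are illustrative, so substitute your own disks.

  <?xml version="1.0"?>
  <!-- hdfs-site.xml sketch; paths and values are the sample entries from the table above -->
  <configuration>
    <property>
      <name>dfs.name.dir</name>
      <value>/opt/data1/hdfs/name,/nfs/data/hdfs/name</value> <!-- keep one copy on NFS as a simple backup -->
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/opt/data1/hdfs/data,/opt/data2/hdfs/data</value> <!-- comma separated, one entry per disk -->
    </property>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value> <!-- 128 * 1024 * 1024 = 128 MB -->
    </property>
  </configuration>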

mapred-site.xml

Name | Value | Description
hadoop.job.history.location | (not set) | Path where job history files are saved; no parameter is needed and it should not be written in the config file; defaults to the logs folder.
hadoop.job.history.user.location | (not set) | Location where per-user job history files are stored.
io.sort.factor | 30 | Number of file streams merged at once while sorting; effectively the number of files opened simultaneously during the merge.
io.sort.mb | 600 | Memory used for sorting, in MB; default 100. As I recall it must not exceed the mapred.child.java.opts setting, otherwise the task will OOM.
mapred.job.tracker | hadoopmaster:9001 | Address for connecting to the JobTracker; if not written it defaults to local, with 1 map and 1 reduce.
mapred.job.tracker.http.address | 0.0.0.0:50030 | Listen address of the JobTracker tracker page.
mapred.job.tracker.handler.count | 15 | Number of handler threads of the JobTracker.
mapred.task.tracker.report.address | 127.0.0.1:0 | TaskTracker report server address; leave it unconfigured; the official docs do not recommend changing it.
mapred.local.dir | /data1/hdfs/mapred/local, /data2/hdfs/mapred/local, ... | Local folders MapReduce uses for intermediate computation; multiple disks can be configured, comma separated.
mapred.system.dir | /data1/hdfs/mapred/system, /data2/hdfs/mapred/system, ... | Folders where MapReduce stores its control files; multiple disks can be configured, comma separated.
mapred.temp.dir | /data1/hdfs/mapred/temp, /data2/hdfs/mapred/temp, ... | Shared temporary folders for MapReduce; same as above.
mapred.local.dir.minspacestart | 1073741824 | When free space in the local computation folder falls below this value, no new local tasks are started; in bytes, default 0.
mapred.local.dir.minspacekill | 1073741824 | When free space in the local computation folder falls below this value, no new tasks are accepted; in bytes, default 0.
mapred.tasktracker.expiry.interval | 60000 | If a TaskTracker sends no heartbeat within this interval, it is considered dead; in milliseconds.
mapred.map.tasks | 2 | Default number of map tasks per job; for example, with a DFS block size of 64 MB, sorting a 60 MB file still starts 2 map tasks; has no effect when mapred.job.tracker is set to local.
mapred.reduce.tasks | 1 | Explanation as above.
mapred.jobtracker.restart.recover | true / false | Whether to recover running jobs after a JobTracker restart; default false.
mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.CapacityTaskScheduler / org.apache.hadoop.mapred.JobQueueTaskScheduler / org.apache.hadoop.mapred.FairScheduler | Important: selects the task scheduler. If not set, Hadoop defaults to the FIFO scheduler (JobQueueTaskScheduler); the fair scheduler and the capacity scheduler can be used instead.
mapred.reduce.parallel.copies | 10 | Number of parallel copies reduce uses in the shuffle phase; default 5.
mapred.child.java.opts | -Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64 | JVM options, including heap size, for each child process started by the TaskTracker.
tasktracker.http.threads | 50 | Number of threads in the TaskTracker's HTTP server that serves task data such as map outputs.
mapred.task.tracker.http.address | 0.0.0.0:50060 | HTTP listen address and port of the TaskTracker; usually does not need to be written; port 0 picks a random port.
mapred.output.compress | true / false | Whether job output is compressed; default false; false is recommended.
mapred.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec | Codec used to compress job output; gzip, bzip2, lzo, snappy, etc. can also be used.
mapred.compress.map.output | true / false | Whether map output is compressed before going over the network; default false; true is recommended to reduce bandwidth at the cost of some speed.
mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec | Codec used to compress map output.
map.sort.class | org.apache.hadoop.util.QuickSort | Algorithm used to sort map output; default is quicksort.
mapred.hosts | conf/mhost.allow | List of TaskTrackers allowed to connect to the JobTracker; an empty value allows all.
mapred.hosts.exclude | conf/mhost.deny | List of TaskTrackers forbidden to connect to the JobTracker; very useful when decommissioning nodes.
mapred.queue.names | etl,rush,default | Comma-separated list of queue names used with the scheduler.
mapred.tasktracker.map.tasks.maximum | 12 | Maximum number of map slots started per server.
mapred.tasktracker.reduce.tasks.maximum | 6 | Maximum number of reduce slots started per server.
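
And a minimal mapred-site.xml sketch along the same lines; the JobTracker address, slot counts, and JVM options are the sample values from the table, so tune them to your own hardware.

  <?xml version="1.0"?>
  <!-- mapred-site.xml sketch; values are the sample entries from the table above -->
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>hadoopmaster:9001</value>
    </property>
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>12</value> <!-- map slots per TaskTracker -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>6</value> <!-- reduce slots per TaskTracker -->
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value> <!-- heap per child JVM; keep io.sort.mb below this -->
    </property>
  </configuration>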
