Common Hadoop configurations

Hadoop common configuration items (reprinted): core-site.xml

Name | Value | Description
fs.default.name | hdfs://hadoopmaster:9000 | URI and port of the NameNode (hadoopmaster).
fs.checkpoint.dir | /opt/data/hadoop1/hdfs/namesecondary1 | Path where the NameNode metadata checkpoint is stored; the official documentation says it is read and written like dfs.name.dir.
fs.checkpoint.period | 1800 | Interval between metadata checkpoints, in seconds. Only takes effect on the SecondaryNameNode (SNN). Default: one hour.
fs.checkpoint.size | 33554432 | Triggers a checkpoint when the edit log reaches this size, in bytes. Only takes effect on the SNN. Default: 64 MB.
io.compression.codecs | org.apache.hadoop.io.compress.DefaultCodec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec | The codecs available to Hadoop, separated by commas; in the actual configuration the value must be written as a single line. Gzip and bzip2 are built in; LZO requires installing hadoop-gpl-compression (or kevinweil's hadoop-lzo), and snappy must also be installed separately.
io.compression.codec.lzo.class | com.hadoop.compression.lzo.LzoCodec | Codec class used for LZO compression.
topology.script.file.name | /hadoop/bin/RackAware.py | Location of the rack-awareness script.
topology.script.number.args | 1000 | Maximum number of host names or IP addresses passed to the rack-awareness script at a time.
fs.trash.interval | 10800 | HDFS trash setting: accidentally deleted files can be recovered within this many minutes; 0 disables the trash. Adding this item does not require restarting Hadoop.
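As a sketch, the core settings above would be written in core-site.xml like this; the hostname, paths, and LZO classes are the table's example values, and the LZO entries assume hadoop-gpl-compression or hadoop-lzo is installed:

```xml
<!-- core-site.xml sketch: NameNode URI, SNN checkpointing, codecs, trash.
     Values are the illustrative ones from the table, not required ones. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopmaster:9000</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>1800</value> <!-- seconds; default is one hour -->
</property>
<property>
  <name>io.compression.codecs</name>
  <!-- must be one comma-separated line in the real file -->
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>10800</value> <!-- minutes; 0 disables the trash -->
</property>
```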
hadoop.http.filter.initializers | org.apache.hadoop.security.AuthenticationFilterInitializer | Enables authentication on the HTTP interfaces. Required for the JobTracker and TaskTracker, and for the HTTP ports of the NameNode and DataNode; must be configured on all nodes.

hadoop.http.authentication.type | simple / kerberos / #AUTHENTICATION_HANDLER_CLASSNAME# | HTTP authentication method. Default: simple; you can also supply your own handler class. Must be configured on all nodes.
hadoop.http.authentication.token.validity | 36000 | Validity period of the authentication token, in seconds. Must be configured on all nodes.
hadoop.http.authentication.signature.secret | (unset by default) | Secret used to sign authentication tokens; by default a private signature is generated automatically when Hadoop starts. Must be configured on all nodes.
hadoop.http.authentication.cookie.domain | domain.tld | Domain of the cookie used for HTTP authentication. Has no effect when nodes are accessed by IP address, so a domain name must be configured on all nodes.
hadoop.http.authentication.simple.anonymous.allowed | true / false | For simple authentication only. Anonymous access is allowed by default (true).

hadoop.http.authentication.kerberos.principal | HTTP/localhost@$LOCALHOST | For Kerberos authentication only. The principal must use HTTP as its service name.
hadoop.http.authentication.kerberos.keytab | /home/xianglei/hadoop.keytab | For Kerberos authentication only; location of the keytab file.
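A minimal sketch of the HTTP authentication block above, shown for the simple method and to be configured on every node; the token validity and anonymous-access values are just the example values from the table:

```xml
<!-- core-site.xml on every node: HTTP interface authentication (simple).
     For kerberos, also set the principal and keytab entries above. -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.http.authentication.token.validity</name>
  <value>36000</value> <!-- seconds -->
</property>
<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>false</value> <!-- default is true -->
</property>
```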
hadoop.security.authorization | true / false | Service-level authorization for Hadoop RPC. Must be used together with hadoop-policy.xml; refresh with "hadoop dfsadmin -refreshServiceAcl" and "hadoop mradmin -refreshServiceAcl" to take effect.
io.file.buffer.size | 131072 | Buffer size used when reading and writing sequence files.
hadoop.security.authentication | simple / kerberos | Authentication for Hadoop itself (non-HTTP access): simple or kerberos.
hadoop.logfile.size | 1000000000 | Maximum log file size in bytes; when exceeded, the log rolls over.
hadoop.logfile.count | 20 | Maximum number of rolled log files.
io.bytes.per.checksum | 1024 | Number of bytes covered by each checksum; must not exceed io.file.buffer.size.
io.skip.checksum.errors | true / false | When processing sequence files, skip checksum errors instead of throwing an exception. Default: false.
io.serializations | org.apache.hadoop.io.serializer.WritableSerialization | Serialization codecs.
io.seqfile.compress.blocksize | 1024000 | Minimum block size, in bytes, for block compression of sequence files.
webinterface.private.actions | true / false | If true, actions such as killing jobs and deleting files become available on the JobTracker and NameNode tracker pages. Default: false.
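Service-level authorization, sketched as it would appear in core-site.xml; note that the actual ACLs live in hadoop-policy.xml, and the refresh commands from the table apply the change without restarting:

```xml
<!-- core-site.xml: turn on RPC service-level authorization.
     ACLs themselves go in hadoop-policy.xml; after editing them, run
     "hadoop dfsadmin -refreshServiceAcl" and
     "hadoop mradmin -refreshServiceAcl" to apply without a restart. -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value> <!-- or kerberos -->
</property>
```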

HDFS: hdfs-site.xml

Name | Value | Description
dfs.default.chunk.view.size | 32768 | Number of bytes of each file shown on the NameNode's HTTP browsing page; usually does not need to be set.
dfs.datanode.du.reserved | 1073741824 | Space reserved per disk, in bytes, mainly for non-HDFS files. Default: 0 (nothing reserved).
dfs.name.dir | /opt/data1/hdfs/name, /opt/data2/hdfs/name, /nfs/data/hdfs/name | Where the NameNode stores its metadata; multiple directories can be listed, separated by commas. Keeping one copy on NFS is a recommended HA measure for the 1.0 era; the copies can also sit on several disks of one server.
dfs.web.ugi | nobody,nobody | User and group used by the web tracker page servers such as the NameNode and JobTracker.
dfs.permissions | true / false | Whether to enable HDFS permissions. The author generally sets it to false for clusters that others access through development tools; with true, data sometimes cannot be accessed because of permission problems.
dfs.permissions.supergroup | supergroup | The HDFS superuser group, supergroup by default; the user who starts Hadoop is usually the superuser.
dfs.data.dir | /opt/data1/hdfs/data, /opt/data2/hdfs/data, /opt/data3/hdfs/data, ... | Where the DataNode actually stores block data; multiple disks can be listed, separated by commas.
dfs.datanode.data.dir.perm | 755 | Permissions of the local folders used by the DataNode. Default: 755.
dfs.replication | 3 | Number of replicas of each block. Default: 3. In theory more replicas read faster, but they need more storage; with abundant hardware you can go to 5 or 6.
dfs.replication.max | 512 | Maximum number of replicas. After a temporary DataNode failure recovers, blocks can briefly exceed the configured replication; this cap is rarely needed and usually need not be set.
dfs.replication.min | 1 | Minimum number of replicas.
dfs.block.size | 134217728 | Size of each block, 128 MB by default. Compute it as 128*1024^2; the author has seen someone write 128000000 directly, which is very romantic.
dfs.df.interval | 60000 | Refresh interval for disk usage statistics, in milliseconds.
dfs.client.block.write.retries | 3 | Maximum number of retries when writing a block before the failure is reported.
dfs.heartbeat.interval | 3 | DataNode heartbeat interval, in seconds.
dfs.namenode.handler.count | 10 | Number of handler threads the NameNode starts.
dfs.balance.bandwidthPerSec | 1048576 | Maximum bandwidth per second used by the balancer, in bytes, not bits.
dfs.hosts | /opt/hadoop/conf/hosts.allow | File listing host names allowed to connect to the NameNode; must be an absolute path. An empty file means every host is accepted.
dfs.hosts.exclude | /opt/hadoop/conf/hosts.deny | Same idea as above, but lists hosts forbidden to connect to the NameNode. Useful for decommissioning DataNodes from the cluster.
dfs.max.objects | 0 | Maximum number of DFS objects; every file, directory, and block in HDFS counts as one object. 0 means no limit.
dfs.replication.interval | 3 | Interval at which the NameNode computes replication work; the default is fine and it usually need not be set.
dfs.support.append | true / false | Newer Hadoop supports appending to files; this controls whether append is allowed. Default: false, because the append implementation has bugs.
dfs.datanode.failed.volumes.tolerated | 0 | Number of failed disks a DataNode tolerates before shutting down. The default 0 means one bad disk shuts the DataNode down.
dfs.secondary.http.address | 0.0.0.0:50090 | Listen address and port of the SecondaryNameNode tracker page.
dfs.datanode.address | 0.0.0.0:50010 | Data transfer port of the DataNode. If set to 0, a random port is chosen and reported to the NameNode through the heartbeat.
dfs.datanode.http.address | 0.0.0.0:50075 | Listen address and port of the DataNode tracker page.
dfs.datanode.ipc.address | 0.0.0.0:50020 | IPC port of the DataNode. If set to 0, a random port is chosen and reported to the NameNode through the heartbeat.
dfs.datanode.handler.count | 3 | Number of service threads the DataNode starts.
dfs.http.address | 0.0.0.0:50070 | Listen address and port of the NameNode tracker page.
dfs.https.enable | true / false | Whether the NameNode tracker listens over HTTPS. Default: false.
dfs.datanode.https.address | 0.0.0.0:50475 | HTTPS tracker page listen address and port of the DataNode.
dfs.https.address | 0.0.0.0:50470 | HTTPS tracker page listen address and port of the NameNode.
dfs.datanode.max.xcievers | 2048 | Roughly the DataNode's equivalent of the Linux open-file limit; not covered in the official documentation. Increase it when DataXceiver errors appear. Default: 256.
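A minimal hdfs-site.xml sketch built from the example values above; the paths are illustrative, with one directory per physical disk:

```xml
<!-- hdfs-site.xml sketch; paths and sizes are the table's example values. -->
<property>
  <name>dfs.name.dir</name>
  <value>/opt/data1/hdfs/name,/opt/data2/hdfs/name,/nfs/data/hdfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/opt/data1/hdfs/data,/opt/data2/hdfs/data,/opt/data3/hdfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128 * 1024 * 1024 bytes, not 128000000 -->
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value> <!-- raise when DataXceiver errors appear; default 256 -->
</property>
```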

 

 

mapred-site.xml

Name | Value | Description
hadoop.job.history.location | (unset) | Path of job history files. Unconfigured by default and need not be written into the configuration file; it defaults to the history folder under logs.
hadoop.job.history.user.location | (unset) | Location of per-user job history files.
io.sort.factor | 30 | Number of streams merged at once when sorting spill files, i.e. the number of files opened simultaneously during the sort.
io.sort.mb | 600 | Memory used for sorting, in MB. Default: 100. It must not exceed the heap set in mapred.child.java.opts, or tasks will hit OOM.
mapred.job.tracker | hadoopmaster:9001 | Address of the JobTracker. Default: "local", which runs 1 map and 1 reduce in-process.
mapred.job.tracker.http.address | 0.0.0.0:50030 | Listen address and port of the JobTracker tracker page.
mapred.job.tracker.handler.count | 15 | Number of JobTracker service threads.
mapred.task.tracker.report.address | 127.0.0.1:0 | Address the TaskTracker listens on for task reports; it need not be configured, and changing it is officially discouraged.
mapred.local.dir | /data1/hdfs/mapred/local, /data2/hdfs/mapred/local, ... | Folders used for local intermediate data during MapReduce jobs; multiple disks can be configured, separated by commas.
mapred.system.dir | /data1/hdfs/mapred/system, /data2/hdfs/mapred/system, ... | Folders used to store MapReduce control files; multiple disks can be configured, separated by commas.
mapred.temp.dir | /data1/hdfs/mapred/temp, /data2/hdfs/mapred/temp, ... | Shared temporary folders for MapReduce; same idea as above.
mapred.local.dir.minspacestart | 1073741824 | If the free space in the local work folders falls below this value (in bytes), no tasks are run locally. Default: 0.
mapred.local.dir.minspacekill | 1073741824 | If the free space in the local work folders falls below this value (in bytes), no new tasks are requested. Default: 0.
mapred.tasktracker.expiry.interval | 60000 | If a TaskTracker sends no heartbeat within this period (in milliseconds), it is considered dead.
mapred.map.tasks | 2 | Default number of map tasks per job; for example, with a 64 MB block size, sorting a 60 MB file still starts two map tasks. Has no effect when mapred.job.tracker is "local".
mapred.reduce.tasks | 1 | Same as above, for reduce tasks.
mapred.jobtracker.restart.recover | true / false | Whether to recover running jobs when the JobTracker restarts. Default: false.
mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.CapacityTaskScheduler / org.apache.hadoop.mapred.JobQueueTaskScheduler / org.apache.hadoop.mapred.FairScheduler | Which task scheduler to use (pick one). If unset, Hadoop defaults to the FIFO scheduler (JobQueueTaskScheduler); the fair scheduler and the capacity scheduler are the alternatives.
mapred.reduce.parallel.copies | 10 | Number of parallel copies used by reduce in the shuffle phase. Default: 5.
mapred.child.java.opts | -Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64 | JVM options (heap size, native library path) for each child task process launched by the TaskTracker.
tasktracker.http.threads | 50 | Number of HTTP server threads the TaskTracker uses to serve map output.
mapred.task.tracker.http.address | 0.0.0.0:50060 | HTTP address and port the TaskTracker listens on; can be left at the default. If the port is set to 0, a random port is used.
mapred.output.compress | true / false | Whether job output is compressed. Default: false; false is recommended.
mapred.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec | Codec used for job output; gzip, bzip2, LZO, snappy, and so on can also be used.
mapred.compress.map.output | true / false | Whether map output is compressed before being shuffled over the network. Default: false; true is recommended to reduce bandwidth usage.
mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec | Codec used to compress map output.
map.sort.class | org.apache.hadoop.util.QuickSort | Algorithm used to sort map output. Default: quicksort.
mapred.hosts | conf/mhost.allow | List of TaskTrackers allowed to connect to the JobTracker; empty means all are allowed.
mapred.hosts.exclude | conf/mhost.deny | List of TaskTrackers forbidden to connect to the JobTracker; useful for decommissioning.
mapred.queue.names | etl,rush,default | Queue names used with the scheduler, separated by commas.
mapred.tasktracker.map.tasks.maximum | 12 | Maximum number of map slots per server.
mapred.tasktracker.reduce.tasks.maximum | 6 | Maximum number of reduce slots per server.
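A minimal mapred-site.xml sketch with the example values above; note the constraint that the child heap must stay above io.sort.mb:

```xml
<!-- mapred-site.xml sketch; host, paths, and slot counts are the table's
     example values. Keep -Xmx above io.sort.mb to avoid task OOM. -->
<property>
  <name>mapred.job.tracker</name>
  <value>hadoopmaster:9001</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Djava.library.path=/opt/hadoopgpl/native/Linux-amd64-64</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value> <!-- reduces shuffle bandwidth -->
</property>
```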
