Hadoop NameNode vs ResourceManager (RM)
- Small clusters: the NameNode and RM can run on the same node.
- Large clusters: because the NameNode and RM both need a lot of memory, run them on separate nodes. When they are separated, keep the contents of the slaves file identical on both, so that every worker node runs both a NodeManager (NM) and a DataNode (DN).
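A simple way to catch a mismatch before running the start scripts is to compare the two copies. The sketch below is hypothetical (the function names are made up for illustration). It assumes one host per line and treats `#` comments and blank lines as ignorable, which is an assumption about how the file is written.

```python
from pathlib import Path


def parse_workers(text: str) -> set[str]:
    """Return the worker host names listed in slaves-file text.

    One host per line is assumed; '#' comments and blank lines are ignored.
    """
    hosts = set()
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if host:
            hosts.add(host)
    return hosts


def same_workers(text_a: str, text_b: str) -> bool:
    """True if two slaves files list exactly the same hosts (order ignored)."""
    return parse_workers(text_a) == parse_workers(text_b)


def check_files(path_a: str, path_b: str) -> bool:
    """Compare the NameNode host's slaves file with the RM host's copy."""
    return same_workers(Path(path_a).read_text(), Path(path_b).read_text())


if __name__ == "__main__":
    nn_copy = "worker1\nworker2\n# spare\nworker3\n"
    rm_copy = "worker3\nworker1\nworker2\n"
    print(same_workers(nn_copy, rm_copy))  # True
```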
Port
A port number of 0 tells the server to start on any free port. This is generally discouraged: the port is only chosen at startup, so it is incompatible with setting cluster-wide firewall policies.
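The effect of port 0 is easy to see with a plain socket: the operating system, not the configuration, picks the port. This Python sketch only illustrates the mechanism and is not Hadoop code.

```python
import socket


def bind_free_port() -> int:
    """Bind to port 0 and return the port the OS actually assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # 0 = "let the kernel choose any free port"
        return s.getsockname()[1]


if __name__ == "__main__":
    # Successive runs usually get different ports, so no fixed firewall rule can cover them.
    print(bind_free_port(), bind_free_port())
```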
HDFS ECC Memory
ECC memory is strongly recommended, as several Hadoop users have reported seeing many checksum errors when using non-ECC memory on Hadoop clusters.
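HDFS stores a checksum for each small chunk of data and verifies it on read. A bit that flips in memory after the checksum was computed therefore surfaces as a checksum error. The sketch below makes some assumptions:

- It uses `zlib.crc32`, which is plain CRC-32 rather than the CRC-32C that HDFS uses by default.
- It uses a 512-byte chunk size, assumed to match the HDFS default.
- The helper names are hypothetical.

```python
import zlib

CHUNK = 512  # bytes per checksum; assumed to match the HDFS default chunk size


def checksums(data: bytes) -> list[int]:
    """CRC-32 of each CHUNK-sized slice of data."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def bad_chunks(data: bytes, expected: list[int]) -> list[int]:
    """Indexes of chunks whose checksum no longer matches the stored value."""
    return [i for i, (a, b) in enumerate(zip(checksums(data), expected)) if a != b]


if __name__ == "__main__":
    block = bytes(range(256)) * 8          # 2048 bytes = 4 chunks
    stored = checksums(block)              # computed when the data was written
    flipped = bytearray(block)
    flipped[700] ^= 0x01                   # simulate one bit flipped in non-ECC RAM
    print(bad_chunks(bytes(flipped), stored))  # [1]
```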
dfs.name.dir
Configure it with multiple comma-separated paths. The NameNode writes the fsimage and edit log to all of them at the same time, so the metadata can still be recovered later if one copy is lost.
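The point of listing several directories is that each one holds a full copy of the metadata. This is a minimal Python simulation with a few assumptions:

- The property value is comma-separated.
- The helper names and example directories (including the NFS-style path) are made up.
- It is a simplification: the real NameNode manages these files itself.

```python
import tempfile
from pathlib import Path


def name_dirs(value: str) -> list[Path]:
    """Split a comma-separated dfs.name.dir value into directory paths."""
    return [Path(p.strip()) for p in value.split(",") if p.strip()]


def write_everywhere(dirs: list[Path], filename: str, data: bytes) -> None:
    """Write the same metadata file to every configured directory."""
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
        (d / filename).write_bytes(data)


def recover(dirs: list[Path], filename: str) -> bytes:
    """Return the first surviving copy; fail only if every copy is gone."""
    for d in dirs:
        f = d / filename
        if f.exists():
            return f.read_bytes()
    raise FileNotFoundError(f"{filename} missing from all {len(dirs)} directories")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = name_dirs(f"{root}/disk1/name, {root}/nfs/name")
        write_everywhere(dirs, "fsimage", b"namespace v1")
        (dirs[0] / "fsimage").unlink()      # simulate losing the local disk
        print(recover(dirs, "fsimage"))     # b'namespace v1'
```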
RAID
Not suitable for DataNodes, which do not need the redundancy RAID provides:
- HDFS already provides redundancy by replicating each block across multiple nodes.
- It is slower than JBOD (Just a Bunch Of Disks). A striped array (RAID 0) is limited by the speed of its slowest disk, while JBOD disks work independently and do not slow each other down.
- If a disk in a JBOD setup fails, HDFS keeps working without it. If a disk in a RAID array fails, the whole array (and hence the node) becomes unavailable.
Suitable for the NameNode: RAID helps protect its metadata.
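The speed and failure arguments above can be put into rough numbers. This is a deliberately simplified model:

- Striped RAID 0 is paced by its slowest disk.
- JBOD throughput is treated as the sum of the disks, assuming the workload is spread across them.
- A disk failure costs the whole array under RAID 0 but only one disk under JBOD.

The disk speeds are made-up figures and the function names are hypothetical.

```python
def raid0_throughput(speeds: list[float]) -> float:
    """Striped RAID 0: every request touches every disk, so the slowest disk paces the array."""
    return len(speeds) * min(speeds)


def jbod_throughput(speeds: list[float]) -> float:
    """JBOD: disks serve requests independently, so their speeds simply add up."""
    return sum(speeds)


def disks_lost_after_failure(n_disks: int, layout: str) -> int:
    """How many disks' worth of data become unavailable when one disk fails."""
    if layout == "raid0":
        return n_disks
    if layout == "jbod":
        return 1
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    speeds = [150, 150, 150, 90]  # hypothetical MB/s; one disk is slower than the rest
    print(raid0_throughput(speeds), jbod_throughput(speeds))  # 360 540
    print(disks_lost_after_failure(4, "raid0"), disks_lost_after_failure(4, "jbod"))  # 4 1
```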
Hadoop cluster optimization