Install Hadoop 2.7.1 on CentOS 7
1. Download JDK 1.8 and Hadoop 2.7.1, and decompress the packages into the /home/ directory.
2. Configure the JDK 1.8 and Hadoop 2.7.1 environment variables.
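For step 2, the variables can be appended to /etc/profile and applied with `source /etc/profile`. The JDK path below is an assumption based on the /home/ layout above; adjust both paths to where you actually unpacked the archives:

```shell
# Assumed install paths under /home/; adjust to your actual unpack locations.
export JAVA_HOME=/home/jdk1.8.0_60
export HADOOP_HOME=/home/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```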
3. Configure the IP address of each host, modify each host's hostname, and add the one-to-one mapping between IP addresses and host names to the hosts file.
Master 10.0.0.44
Host1 10.0.0.43
Host2 10.0.0.42
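In /etc/hosts on every host, the mapping above would look like this (a sketch; keep any existing localhost lines intact):

```
10.0.0.44 master
10.0.0.43 host1
10.0.0.42 host2
```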
4. Disable the firewall on each host and prevent it from starting automatically at boot.
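On CentOS 7 the default firewall is firewalld, so step 4 can be done as follows on each host (requires root):

```shell
# Stop the firewall now and keep it from starting at boot.
systemctl stop firewalld
systemctl disable firewalld
```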
5. Set up password-free SSH login, because Hadoop needs to log in to each node to perform operations.
(1) Uncomment the following two lines in /etc/ssh/sshd_config (required on each host):
RSAAuthentication yes
PubkeyAuthentication yes
(2) Run ssh-keygen -t rsa to generate a key pair. Do not enter a passphrase; just press Enter at every prompt. A .ssh folder will be created under /root (required on each host).
(3) Merge the public keys into the authorized_keys file. On the master server, enter the /root/.ssh directory; the merge can be done with SSH commands:
cat id_rsa.pub >> authorized_keys
ssh root@10.0.0.43 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh root@10.0.0.42 cat ~/.ssh/id_rsa.pub >> authorized_keys
(4) Copy the master server's authorized_keys and known_hosts files to the /root/.ssh directory of the other two servers.
(5) After that, SSH logins between the hosts no longer require a password.
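Step (4) can be done with scp from the master; the IPs are the ones listed above, and this assumes root SSH access between the hosts is already working:

```shell
# Push the merged key files from the master to the other two servers.
scp /root/.ssh/authorized_keys /root/.ssh/known_hosts root@10.0.0.43:/root/.ssh/
scp /root/.ssh/authorized_keys /root/.ssh/known_hosts root@10.0.0.42:/root/.ssh/
```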
6. Configure Hadoop (configure only on the master; the other hosts can copy the files directly).
(1) Configure core-site.xml under the hadoop-2.7.1/etc/hadoop directory:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
<!-- Hadoop's temporary directory; other local directories are based on this path. -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop-2.7.1/tmp</value>
</property>
<!-- Buffer size used for reading and writing files; should be a multiple of the memory page size. -->
<property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
</property>
(2) Configure hdfs-site.xml under the hadoop-2.7.1/etc/hadoop directory:
<!-- Local disk directory where the NameNode stores the fsimage file. -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop-2.7.1/dfs/name</value>
</property>
<!-- Local disk directory where the DataNode stores HDFS blocks. -->
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop-2.7.1/dfs/data</value>
</property>
<!-- Number of block replicas. -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- HTTP service address of the SecondaryNameNode; a port of 0 makes the service pick a random free port. -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
</property>
<!-- Enable the WebHDFS (REST API) function on the NameNode and DataNodes. -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
(3) Configure mapred-site.xml under the hadoop-2.7.1/etc/hadoop directory:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>
(4) Configure yarn-site.xml under the hadoop-2.7.1/etc/hadoop directory:
<!-- Must be set to mapreduce_shuffle. -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- ResourceManager address:port. -->
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<!-- Scheduler address:port. -->
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
<!-- ResourceManager management interface address:port. -->
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
</property>
<!-- ResourceManager web UI address:port. -->
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
</property>
<!-- Physical memory, in MB, that containers on the NodeManager may use. -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>5120</value>
</property>
(5) Configure the JAVA_HOME path in hadoop-env.sh, yarn-env.sh, and mapred-env.sh under the hadoop-2.7.1/etc/hadoop directory.
(6) Configure the slaves file under the hadoop-2.7.1/etc/hadoop directory; add the slave servers: master, host1, host2.
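The slaves file from step (6) simply lists one worker host per line; with the master also acting as a worker, as described above, it would contain:

```
master
host1
host2
```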
(7) Copy the installation directory from the master to the slave servers with scp -r.
(8) The data is stored in folders under the installation directory, such as tmp, hdfs, hdfs/data, and hdfs/name.
(9) Hadoop parameter reference: https://segmentfault.com/a/1190000000709725
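Step (7) can be sketched as follows; the /home/hadoop-2.7.1 path and root login are assumptions carried over from the earlier steps:

```shell
# Copy the fully configured installation directory from the master to each slave.
scp -r /home/hadoop-2.7.1 root@10.0.0.43:/home/
scp -r /home/hadoop-2.7.1 root@10.0.0.42:/home/
```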
7. On the master server, run bin/hdfs namenode -format to initialize (format) the NameNode.
8. Under the sbin directory, run ./start-all.sh to start the cluster, use jps to view the process information, and run ./stop-all.sh to stop it.
9. Open master:8088 and master:50070 in a browser to view the cluster information.