Install Hadoop 2.7.1 on CentOS 7

Source: Internet
Author: User


1. Download JDK 1.8 and Hadoop 2.7.1, and decompress the packages into the /home/ directory.

2. Configure the JDK 1.8 and Hadoop 2.7.1 environment variables.
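A minimal sketch of those environment variables, assuming the packages were unpacked to /home as in step 1 (the JDK folder name is an assumption; adjust both paths to your layout). Append lines like these to /etc/profile or ~/.bashrc and then source the file:

```shell
# Assumed install locations; change to match your unpacked folder names.
export JAVA_HOME=/home/jdk1.8.0_65        # hypothetical JDK directory
export HADOOP_HOME=/home/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
echo "$PATH" | grep -o 'hadoop-2.7.1/bin'   # quick sanity check of PATH
```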

3. Configure the IP address of each host, set each host's hostname, and add the one-to-one mappings between IP addresses and hostnames to the /etc/hosts file.
master 10.0.0.44
host1 10.0.0.43
host2 10.0.0.42
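As a sketch, the /etc/hosts lines implied by the list above (hostnames lowercase to match the config files later in this guide). The function only prints the entries; append its output to /etc/hosts as root on every node:

```shell
# Print the cluster hostname mappings from the list above.
hosts_entries() {
    printf '%s\n' \
        '10.0.0.44 master' \
        '10.0.0.43 host1' \
        '10.0.0.42 host2'
}
hosts_entries                    # preview the three entries
# hosts_entries >> /etc/hosts    # apply (as root, on every node)
```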

4. Disable the firewall on each host and prevent it from starting automatically at boot.
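On CentOS 7 this usually means firewalld (an assumption; your image may use iptables instead). The sketch below is a dry-run by default so it is safe to test; set APPLY=1 and run it as root on every host to actually make the change:

```shell
# Dry-run wrapper: prints each command unless APPLY=1 is set.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "dry-run: $*"; fi; }
run systemctl stop firewalld      # stop the firewall now
run systemctl disable firewalld   # keep it from starting at boot
```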

5. Set up passwordless SSH login, because Hadoop needs to log in to each node to operate on it.

(1) In /etc/ssh/sshd_config on CentOS, uncomment the following two lines (needed on each host):

RSAAuthentication yes
PubkeyAuthentication yes

(2) Run the command ssh-keygen -t rsa to generate a key pair. Do not enter a passphrase; just press Enter at every prompt. The /root/.ssh folder will be generated (required on each host).

(3) Merge the public keys into the authorized_keys file. On the master server, enter the /root/.ssh directory; the merge can be done with SSH commands:

cat id_rsa.pub >> authorized_keys
ssh root@10.0.0.43 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh root@10.0.0.42 cat ~/.ssh/id_rsa.pub >> authorized_keys

(4) Copy the master server's authorized_keys and known_hosts to the /root/.ssh directory of the other two servers.

(5) After that, SSH logins between the nodes no longer require a password.
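Step (2) can also be done non-interactively, which avoids the Enter-pressing: -N '' supplies an empty passphrase and -q suppresses the prompts. The sketch writes to a scratch directory for safety; on a real node use -f ~/.ssh/id_rsa:

```shell
# Generate an RSA key pair with an empty passphrase, no prompts.
keydir=$(mktemp -d)               # scratch dir; use ~/.ssh on a real node
ssh-keygen -t rsa -N '' -f "$keydir/id_rsa" -q
ls "$keydir"                      # id_rsa (private) and id_rsa.pub (public)
```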

6. Configure Hadoop (only configure it on the master; the other nodes can copy the files directly). (1) Configure core-site.xml under the hadoop-2.7.1/etc/hadoop directory:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
<!-- Hadoop temporary directory (a local path); other directories are based on it. -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop-2.7.1/tmp</value>
</property>
<!-- Buffer size used for reading and writing files; should be a multiple of the memory page size. -->
<property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
</property>

(2) Configure hdfs-site.xml under the hadoop-2.7.1/etc/hadoop directory:

<!-- Local disk directory where the NameNode stores the fsimage file. -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop-2.7.1/dfs/name</value>
</property>
<!-- Local disk directory where the DataNode stores HDFS blocks. -->
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop-2.7.1/dfs/data</value>
</property>
<!-- Number of block replicas. -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- HTTP service address of the SecondaryNameNode; a port of 0 means a random free port. -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
</property>
<!-- Enable WebHDFS (REST API) on the NameNode and DataNodes. -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

(3) Configure mapred-site.xml under the hadoop-2.7.1/etc/hadoop directory:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>

(4) Configure yarn-site.xml under the hadoop-2.7.1/etc/hadoop directory:

<!-- Must be set to mapreduce_shuffle for MapReduce jobs to run on YARN. -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- ResourceManager address:port. -->
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<!-- Scheduler address:port. -->
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
<!-- ResourceManager admin interface address:port. -->
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
</property>
<!-- ResourceManager web UI address:port. -->
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
</property>
<!-- Physical memory, in MB, that containers on a NodeManager may use. -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>5120</value>
</property>

(5) In hadoop-env.sh, yarn-env.sh, and mapred-env.sh under the hadoop-2.7.1/etc/hadoop directory, configure the JAVA_HOME path.

(6) Configure the slaves file under the hadoop-2.7.1/etc/hadoop directory, adding the slave servers: master, host1, host2.
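The slaves file is just one hostname per line; a sketch of writing it (a scratch path is used here so the snippet is safe to run anywhere; on a real node write to hadoop-2.7.1/etc/hadoop/slaves):

```shell
# Write the node list from step (6), one hostname per line.
SLAVES_FILE=$(mktemp)   # on a real node: /home/hadoop-2.7.1/etc/hadoop/slaves
printf '%s\n' master host1 host2 > "$SLAVES_FILE"
cat "$SLAVES_FILE"
```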

(7) Copy the whole installation directory from the master server to the slave servers with scp -r.

(8) Create the folders where data is stored, such as tmp, hdfs, hdfs/data, and hdfs/name.

(9) Hadoop parameter reference: https://segmentfault.com/a/1190000000709725
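A sketch of step (7). Actually running scp needs the cluster from this guide, so this only builds and prints the commands; execute them as root with the SSH keys from step 5 in place:

```shell
# Build the scp commands for each slave node (printed, not executed).
scp_cmds=$(for host in host1 host2; do
    echo "scp -r /home/hadoop-2.7.1 root@$host:/home/"
done)
printf '%s\n' "$scp_cmds"
```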

7. On the master server, execute bin/hdfs namenode -format to initialize (format) the NameNode.

8. Run ./start-all.sh under the sbin directory to start the cluster; use jps to view the running processes; run ./stop-all.sh to stop.

9. Open master:8088 and master:50070 in a browser to view the cluster information.
