Hadoop series: building an experimental environment
Basic configuration of the experiment environment
Hardware: each node has a single 50 GB hard disk, 1 GB of memory, and a single core.
Operating System: CentOS 6.4 64-bit
Hadoop: 2.2.0 64-bit (compiled)
JDK: jdk1.7
Disk partitions:
Mount point | Size
/ | 5 GB
/boot | 100 MB
/usr | 5 GB
/tmp | 500 MB
swap | 2 GB
/var | 1 GB
/home | Remaining space
Linux installation Configuration
No desktop (Minimal)
Base System → Base, Compatibility libraries, Performance Tools, Perl Support
Development → Development Tools
Languages → Chinese Support
Create a Hadoop user
useradd hadoop
passwd hadoop
Network configuration: modify the IP address
vim /etc/sysconfig/network-scripts/ifcfg-eth0
Save the file and restart the network: service network restart
Modify the host name
vim /etc/sysconfig/network
Bind host names to IP addresses
vim /etc/hosts
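The host name bindings can also be written in one step rather than typed in by hand. The following is only a sketch: it uses the host names and addresses from the node plan in this document, and HOSTS_FILE is a hypothetical variable that defaults to a scratch file so the script can be tried without root.

```shell
#!/bin/sh
# Sketch: build the hostname/IP bindings for all six nodes.
# HOSTS_FILE is a hypothetical override; it defaults to a scratch file.
# Point it at /etc/hosts (as root) on a real node.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# Append one "IP hostname" line per node.
cat >> "$HOSTS_FILE" <<'EOF'
172.20.53.151 Master
172.20.53.171 Slave1
172.20.53.21 Slave2
172.20.53.37 Slave3
172.20.53.174 Slave4
172.20.53.177 Slave5
EOF

# Show what was written.
cat "$HOSTS_FILE"
```

The same six lines are needed on every node, since each node has to resolve all the others by name.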
Disable Firewall
View the firewall status: service iptables status
Stop the firewall: service iptables stop
View the firewall's boot-time status: chkconfig --list iptables
Disable the firewall at boot: chkconfig iptables off
Disable SELinux
vim /etc/sysconfig/selinux (set SELINUX=disabled)
setenforce 0 (takes effect immediately, until the next reboot)
getenforce
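The edit to the SELinux file can be scripted instead of done in vim. This is a sketch under assumptions: CONF is a hypothetical variable that points at a scratch copy with sample content, so nothing on the system is touched unless you point it at the real file yourself.

```shell
#!/bin/sh
# Sketch: make the SELinux change permanent by setting SELINUX=disabled.
# CONF is a hypothetical variable; it defaults to a scratch file. On a real
# node the file is /etc/selinux/config (/etc/sysconfig/selinux links to it).
CONF="${CONF:-$(mktemp)}"

# Sample content standing in for the real file, written only if CONF is empty.
if [ ! -s "$CONF" ]; then
    printf '%s\n' '# sample SELinux config' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$CONF"
fi

# Rewrite only the SELINUX= line; SELINUXTYPE= is left untouched.
# Writing back with cat keeps the original file (and any symlink) in place.
sed 's/^SELINUX=.*/SELINUX=disabled/' "$CONF" > "$CONF.new" \
    && cat "$CONF.new" > "$CONF" \
    && rm "$CONF.new"

grep '^SELINUX=' "$CONF"
```

The file change only applies from the next boot, which is why setenforce 0 is still needed for the running system.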
Password-free SSH login settings
As the hadoop user, generate the public and private keys: ssh-keygen -t rsa
Send the public key to Slave1..5: ssh-copy-id Slave1 (repeat for each slave)
Similarly, set up password-free login from Slave1..5 to the Master.
So that S1 and the Master can reach each other, password-free login from S1 to the Master is required as well.
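Repeating ssh-copy-id for five hosts is easy to get wrong, so a loop may help. This is a sketch, not a tested procedure for this cluster: DRY_RUN and the run helper are hypothetical names, and by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: send the hadoop user's public key to every slave from the Master.
# DRY_RUN is a hypothetical switch: with the default of 1 each command is
# only printed; set DRY_RUN=0 on the Master to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumes ssh-keygen -t rsa has already been run as the hadoop user.
for host in Slave1 Slave2 Slave3 Slave4 Slave5; do
    run ssh-copy-id "$host"
done
```

For the reverse direction, the same loop would be run on each slave with Master as the only target.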
Install JDK
Decompress jdk1.7 into the /usr/local/ directory and rename it to jdk.
Modify the /etc/profile file.
Host name | IP | Installed software | Running processes
Master | 172.20.53.151 | jdk, hadoop | NameNode, DFSZKFailoverController
Slave1 | 172.20.53.171 | jdk, hadoop | NameNode, DFSZKFailoverController
Slave2 | 172.20.53.21 | jdk, hadoop | ResourceManager
Slave3 | 172.20.53.37 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
Slave4 | 172.20.53.174 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
Slave5 | 172.20.53.177 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
Zookeeper Installation
Install zookeeper on the S3, S4, and S5 nodes:
- Log on to S3 as root and decompress zookeeper into /usr/local:
tar -zxvf zookeeper-3.4.5.tar.gz -C /usr/local/
- Go to the zookeeper directory and configure it.
- Rename zoo_sample.cfg in the conf directory to zoo.cfg, the file zookeeper reads at startup:
mv zoo_sample.cfg zoo.cfg
- Create the file myid in /usr/local/zookeeper-3.4.5/data and write the server id into it: 1
- Change the data storage path (dataDir) in zoo.cfg to /usr/local/zookeeper-3.4.5/data (remember to create the data directory and the myid file).
- Add one entry per node at the end of zoo.cfg, each made up of:
Server id: server.1
Zookeeper host: one of Slave3..5
Port: 2888
Election port: 3888
- Send the configured zookeeper to S4 and S5 with scp:
scp -r /usr/local/zookeeper-3.4.5/ root@Slave4:/usr/local/zookeeper-3.4.5/
scp -r /usr/local/zookeeper-3.4.5/ root@Slave5:/usr/local/zookeeper-3.4.5/
Don't forget to change the server id in the myid file on each node.
- Start zookeeper on the three nodes:
Run the zkServer.sh script in the bin directory: ./zkServer.sh start
- View the status: ./zkServer.sh status
Exactly one of the three nodes is the leader; the other two are followers.
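The zoo.cfg additions and the myid values have to agree with each other, so it can help to generate both from one host list. The sketch below assumes the numbering server.1 = Slave3, server.2 = Slave4, server.3 = Slave5 (the text only fixes server.1), and OUT is a hypothetical scratch directory.

```shell
#!/bin/sh
# Sketch: generate the zoo.cfg additions and the per-node myid values.
# OUT is a hypothetical scratch directory. On a real node the targets are
# /usr/local/zookeeper-3.4.5/conf/zoo.cfg and /usr/local/zookeeper-3.4.5/data/myid.
OUT="${OUT:-$(mktemp -d)}"

# dataDir plus one server.N line per node: host:port:election-port.
# The id-to-host order is an assumption.
{
    echo "dataDir=/usr/local/zookeeper-3.4.5/data"
    id=1
    for host in Slave3 Slave4 Slave5; do
        echo "server.$id=$host:2888:3888"
        id=$((id + 1))
    done
} > "$OUT/zoo.cfg.additions"

# Each node's myid must hold the N of its own server.N line.
id=1
for host in Slave3 Slave4 Slave5; do
    echo "$id" > "$OUT/myid.$host"
    id=$((id + 1))
done

cat "$OUT/zoo.cfg.additions"
```

A myid that does not match the node's server.N line is a common reason for a node failing to join the ensemble, which is why the two are produced from the same loop here.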
Install hadoop
Upload the compiled hadoop-2.2.0.tar.gz file to the Master, decompress it into the /usr directory as root, and rename the directory to hadoop.
- Create a tmp folder in the hadoop directory:
mkdir tmp
- Set the owner of the hadoop directory to the hadoop user:
chown -R hadoop:hadoop hadoop
- Add hadoop to the environment variables: vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
The other nodes are configured the same way.
Configure hadoop
- Configure HDFS (all hadoop 2.0 configuration files are in the $HADOOP_HOME/etc/hadoop directory)
Modify the configuration files in the /usr/hadoop/etc/hadoop/ directory.
- Configure the hadoop runtime environment by modifying hadoop-env.sh:
Although the hadoop.tmp.dir parameter (set in core-site.xml) is called a temporary directory, HDFS data is stored under it later.
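A minimal sketch of a core-site.xml that fits this setup is given here as a starting point. The values are assumptions drawn from the rest of this walkthrough: ns1 is the nameservice named in hdfs-site.xml, /usr/hadoop/tmp is the tmp folder created earlier, and 2181 is zookeeper's default client port. CONF_DIR is a hypothetical scratch directory.

```shell
#!/bin/sh
# Sketch: write a core-site.xml matching the HA layout described here.
# CONF_DIR is a hypothetical scratch directory; the real location would be
# /usr/hadoop/etc/hadoop.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <!-- Default file system: the ns1 nameservice from hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <!-- Base directory; HDFS data ends up under it by default -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
  </property>
  <!-- ZooKeeper nodes used for automatic failover (port 2181 is assumed) -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>Slave3:2181,Slave4:2181,Slave5:2181</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```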
- Modify the hdfs-site.xml file
<configuration>
<!-- Specify the nameservice of hdfs as ns1, which must be consistent with core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- There are two NameNodes under ns1, namely nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC communication address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>Master:9000</value>
</property>
<!-- HTTP communication address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>Master:50070</value>
</property>
<!-- RPC communication address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>Slave1:9000</value>
</property>
<!-- HTTP communication address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>Slave1:50070</value>
</property>
<!-- Specify where the NameNode metadata is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://Slave3:8485;Slave4:8485;Slave5:8485/ns1</value>
</property>
<!-- Specify where JournalNode stores data on the local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/hadoop/journal</value>
</property>
<!-- Enable automatic failover when a NameNode fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Class that implements automatic failover on the client side -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Configure the fencing (isolation) mechanism -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- The sshfence mechanism requires password-free ssh login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
</configuration>
- Rename mapred-site.xml.template to mapred-site.xml and configure it as follows:
Set the MR framework to run on yarn.
- Configure the worker node file: slaves
DataNodes: Slave3, Slave4, Slave5
- Copy the configured hadoop to the other nodes (as root):
scp -r /usr/hadoop/ Slave1:/usr/hadoop/ (repeat for the remaining nodes)
After copying, fix the ownership: chown -R hadoop:hadoop hadoop
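The mapred-site.xml and slaves steps above can be sketched as a short script. This is an illustration only: CONF_DIR is a hypothetical scratch directory standing in for /usr/hadoop/etc/hadoop, and the files should be written before the directory is copied to the other nodes.

```shell
#!/bin/sh
# Sketch: write mapred-site.xml and the slaves file.
# CONF_DIR is a hypothetical scratch directory; the real location would be
# /usr/hadoop/etc/hadoop.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Tell MapReduce to run on yarn.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# One worker host name per line.
printf '%s\n' Slave3 Slave4 Slave5 > "$CONF_DIR/slaves"

cat "$CONF_DIR/slaves"
```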
Start hadoop
- Start zookeeper (on Slave3, Slave4, and Slave5):
bin/zkServer.sh start
- Start journalnode (start all journalnodes from the Master):
cd /usr/hadoop
sbin/hadoop-daemons.sh start journalnode (this starts the process on several nodes at once over ssh)
(Run the jps command on those nodes to check that a JournalNode process has appeared.)
- Format HDFS (run on the Master): hadoop namenode -format
Copy the tmp directory on the Master to /usr/hadoop/ on Slave1:
scp -r /usr/hadoop/tmp Slave1:/usr/hadoop/
- Format ZK (run on the Master): hdfs zkfc -formatZK
Afterwards, run ./zkCli.sh in the zookeeper bin directory on any of Slave3..Slave5; you will find that a hadoop-ha node has been created to hold the HA data.
- Start HDFS (run on the Master):
sbin/start-dfs.sh
If a NameNode goes down, restart it with sbin/hadoop-daemon.sh start namenode. Make sure password-free ssh login works between the two NameNodes.
- Start yarn (run on Slave2): sbin/start-yarn.sh
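Because the order of these steps matters, the first-time start-up can be collected into one script. This is a sketch and has not been run against this cluster: DRY_RUN and the step helper are hypothetical, and by default the script only prints each command with the node it belongs on. The format steps wipe existing metadata, so they are for the first start only.

```shell
#!/bin/sh
# Sketch: first-time start-up order, meant to be run from the Master.
# DRY_RUN is a hypothetical switch: the default of 1 prints each step
# instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# step NODE COMMAND: run COMMAND locally if NODE is Master, else over ssh.
# Note: ssh starts a non-login shell, so /etc/profile settings such as
# JAVA_HOME may not be loaded on the remote side.
step() {
    node="$1"
    cmd="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "[$node] $cmd"
    elif [ "$node" = "Master" ]; then
        sh -c "$cmd"
    else
        ssh "$node" "$cmd"
    fi
}

# 1. zookeeper on the three zookeeper nodes
for zk in Slave3 Slave4 Slave5; do
    step "$zk" "/usr/local/zookeeper-3.4.5/bin/zkServer.sh start"
done

# 2. journalnodes, then format HDFS and copy the metadata to the second NameNode
step Master "cd /usr/hadoop && sbin/hadoop-daemons.sh start journalnode"
step Master "hadoop namenode -format"
step Master "scp -r /usr/hadoop/tmp Slave1:/usr/hadoop/"

# 3. format the failover state in zookeeper, then start HDFS and yarn
step Master "hdfs zkfc -formatZK"
step Master "cd /usr/hadoop && sbin/start-dfs.sh"
step Slave2 "cd /usr/hadoop && sbin/start-yarn.sh"
```

Reading the dry-run output top to bottom gives a checklist that can also be followed by hand.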
Note:
About modifying Virtual Machine NICs
- Modify the /etc/udev/rules.d/70-persistent-net.rules file:
Delete the eth0 entry, then rename the second NIC, eth1, to eth0.
- Change the MAC address of eth0 in /etc/sysconfig/network-scripts/ifcfg-eth0 to match the one in /etc/udev/rules.d/70-persistent-net.rules.
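Copying the MAC address by hand invites typos, so the second step can be scripted. The sketch below works on scratch files with made-up sample content: RULES and IFCFG are hypothetical variables, the MAC addresses are invented, and the script assumes the rules file has already been edited so that the remaining entry is named eth0.

```shell
#!/bin/sh
# Sketch: copy the MAC address recorded for eth0 in the udev rules into ifcfg-eth0.
# RULES and IFCFG are hypothetical variables pointing at scratch files. On a
# real VM they would be /etc/udev/rules.d/70-persistent-net.rules and
# /etc/sysconfig/network-scripts/ifcfg-eth0.
WORK="$(mktemp -d)"
RULES="${RULES:-$WORK/70-persistent-net.rules}"
IFCFG="${IFCFG:-$WORK/ifcfg-eth0}"

# Sample data with made-up MAC addresses, written only when the files are absent.
[ -f "$RULES" ] || echo 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:aa:bb:cc", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"' > "$RULES"
[ -f "$IFCFG" ] || printf '%s\n' 'DEVICE=eth0' 'HWADDR=00:0C:29:11:22:33' 'ONBOOT=yes' > "$IFCFG"

# Pull the address out of the rule whose NAME is eth0.
mac=$(sed -n 's/.*ATTR{address}=="\([0-9A-Fa-f:]*\)".*NAME="eth0".*/\1/p' "$RULES" | head -n 1)
[ -n "$mac" ] || { echo "no eth0 rule found in $RULES" >&2; exit 1; }

# Replace the HWADDR line and write the result back into the same file.
sed "s/^HWADDR=.*/HWADDR=$mac/" "$IFCFG" > "$IFCFG.new" \
    && cat "$IFCFG.new" > "$IFCFG" \
    && rm "$IFCFG.new"

grep '^HWADDR=' "$IFCFG"
```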