Note: Because Hadoop's remote invocation uses RPC, the firewall must be shut down on every Linux machine in the cluster:
service iptables stop
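To keep the firewall from coming back after a reboot, the service can also be disabled at boot time; a minimal sketch, assuming a Red Hat style system with chkconfig (run as root on every machine):
chkconfig iptables off        # keep iptables from starting on subsequent boots
chkconfig --list iptables     # verify: every runlevel should now show "off"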
1. vi /etc/inittab
Change id:5:initdefault: to id:3:initdefault: so the system starts in text (character) mode.
2. IP configuration: /etc/sysconfig/network-scripts/
3. vi /etc/hosts and add the hostname of every machine in the cluster, as in the example below.
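For instance, with the master at 10.2.224.46 (the address used later in this guide) and a slave at a hypothetical 10.2.224.47, /etc/hosts on every machine could contain:
10.2.224.46   master
10.2.224.47   slave01    # hypothetical slave address, adjust to your network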
4. useradd hadoop: add a hadoop user
passwd hadoop: set a password for that user
5. For a file such as the following:
-rw-r--r-- 1 root root 42266180 Dec 10:08 hadoop-0.19.0.tar.gz
you can run the following commands:
chmod 777 hadoop-0.19.0.tar.gz: change the file permissions to the maximum (read, write, and execute for everyone)
chown hadoop:hadoop hadoop-0.19.0.tar.gz: change the owner and group of the file to hadoop
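Step 8 below assumes the distribution has been unpacked to /home/hadoop/hadoop-0.19.0; a minimal sketch of that step, run as the hadoop user:
cd /home/hadoop
tar -xzf hadoop-0.19.0.tar.gz    # produces /home/hadoop/hadoop-0.19.0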
6. Set up SSH authorization between the master and all slaves (operate as the hadoop user).
Run ssh-keygen -t rsa and press Enter at every prompt (accept the defaults and an empty passphrase).
cd ~/.ssh
cp id_rsa.pub authorized_keys
Copy the authorized_keys file from the master to every slave machine via scp, for example:
scp authorized_keys root@slave01:/home/hadoop/master_au_keys
Likewise, append the authorized_keys of each slave machine to the master's authorized_keys.
When ssh master or ssh slave01 works without a password, the setup is OK (see the sketch below).
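A minimal sketch of the remaining steps on each slave, assuming the key file was copied to /home/hadoop/master_au_keys as above (run as the hadoop user):
cat /home/hadoop/master_au_keys >> ~/.ssh/authorized_keys   # trust the master's key
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys                            # sshd rejects keys with loose permissions
Then, back on the master, ssh slave01 hostname should print the slave's hostname without asking for a password.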
7. Install JDK
Download the JDK installer jdk-6u11-linux-i586.bin from the Sun web site, copy it to the /usr directory of each machine, and install it as the root user.
As root:
cd /usr
chmod +x jdk-6u11-linux-i586.bin: add execute permission to the installer.
./jdk-6u11-linux-i586.bin: page through the license prompt with the space bar, then enter yes to start installing JDK 6.
After installation, rename the resulting directory to jdk6.
Note (on CentOS 5.2 the bundled JDK 1.4 can be left alone): Linux systems generally come with a JDK 1.4 preinstalled, which must be deleted.
rpm -qa | grep -i java lists all the installed RPM packages related to Java; remove them with
rpm -e <package name>, as in the hypothetical example below.
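A hypothetical example of the removal (the actual package names depend on the distribution, so list them first):
rpm -qa | grep -i java
# suppose the listing shows a package named java-1.4.2-gcj-compat (hypothetical name)
rpm -e java-1.4.2-gcj-compat --nodeps   # --nodeps may be needed if other packages depend on it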
Set the JDK environment variables. Since the JDK may be used by other system users, it is recommended to set them directly in /etc/profile:
export JAVA_HOME=/usr/jdk6
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
Run source /etc/profile to make the Java environment take effect.
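To verify, a quick check can be run in a new shell (assuming the JDK was installed into /usr/jdk6 as above):
source /etc/profile
echo $JAVA_HOME      # should print /usr/jdk6
java -version        # should report version 1.6.0_11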
8. Hadoop environment variable settings and configuration file changes
Add the JDK directory to conf/hadoop-env.sh:
export JAVA_HOME=/usr/jdk6
Add the NameNode machine name to conf/masters: master
Add the DataNode machine names to conf/slaves: slave01 ...
Add the Hadoop path to /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-0.19.0
export PATH=$PATH:$HADOOP_HOME/bin
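As a quick sanity check after sourcing the profile (a sketch, assuming the paths above):
source /etc/profile
echo $HADOOP_HOME    # should print /home/hadoop/hadoop-0.19.0
which hadoop         # should resolve to /home/hadoop/hadoop-0.19.0/bin/hadoop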
Modify conf/hadoop-site.xml and add the following properties inside the <configuration> element:
<property>
  <name>fs.default.name</name>  <!-- your NameNode: machine name plus port -->
  <value>hdfs://10.2.224.46:54310/</value>
</property>
<property>
  <name>mapred.job.tracker</name>  <!-- your JobTracker: machine name plus port -->
  <value>hdfs://10.2.224.46:54311/</value>
</property>
<property>
  <name>dfs.replication</name>  <!-- number of copies each block is replicated; the default is 3 -->
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Hadoop's default temporary path; it is best to set this explicitly. If a newly added node, or a DataNode in other circumstances, inexplicably fails to start, delete the tmp directory named here. However, if this directory is removed on the NameNode machine, the NameNode format command has to be executed again. -->
  <value>/home/hadoop/tmp/</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/name/</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/data/</value>
</property>
<property>
  <name>mapred.child.java.opts</name>  <!-- Java virtual machine parameters for child tasks; adjust as needed -->
  <value>-Xmx512m</value>
</property>
<property>
  <name>dfs.block.size</name>
  <!-- block size in bytes; must be a multiple of 512, because CRC file-integrity checking uses 512 bytes as the smallest checksum unit -->
  <value>5120000</value>
</property>
The block size setting applies only to newly created files.
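The directories referenced above must be writable by the hadoop user; a minimal sketch, assuming the paths configured above, run on each node as the hadoop user:
mkdir -p /home/hadoop/tmp /home/hadoop/name /home/hadoop/data
The same hadoop-site.xml (and the rest of conf/) should normally be present on every node so all machines share the same configuration.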
———————–
Before starting, the NameNode must be formatted. Enter the ~/hadoopinstall/hadoop directory (the Hadoop installation directory, /home/hadoop/hadoop-0.19.0 in this setup) and execute the following command:
$ bin/hadoop namenode -format
Now Hadoop can be started. There are a number of startup scripts under bin/, which can be run according to your needs:
* start-all.sh starts all the Hadoop daemons, including the NameNode, DataNodes, JobTracker, and TaskTrackers.
* stop-all.sh stops all the Hadoop daemons.
* start-mapred.sh starts the Map/Reduce daemons, i.e. the JobTracker and TaskTrackers.
* stop-mapred.sh stops the Map/Reduce daemons.
* start-dfs.sh starts the Hadoop DFS daemons, i.e. the NameNode and DataNodes.
* stop-dfs.sh stops the DFS daemons.
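A typical first start, run on the master as the hadoop user (a sketch, assuming the layout used in this guide):
cd /home/hadoop/hadoop-0.19.0
bin/hadoop namenode -format   # only before the very first start
bin/start-all.sh
jps                           # the jps tool from the JDK lists the Java daemons running on this node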
————————–
Viewing and testing
bin/hadoop dfsadmin -report: view all the DataNode nodes
Browse the NameNode and JobTracker through the web interface:
* NameNode - http://10.0.0.88:50070
* JobTracker - http://10.0.0.88:50030
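As a quick end-to-end test, a small example job can be run (a sketch, assuming the daemons are up and that the examples jar name matches the 0.19.0 release):
cd /home/hadoop/hadoop-0.19.0
bin/hadoop fs -mkdir input
bin/hadoop fs -put conf/hadoop-site.xml input
bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
bin/hadoop fs -cat output/part-00000    # word counts for the uploaded file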