Hadoop learning notes (1) Environment setup
My environment: Hadoop 1.0.0 on Ubuntu 11.10 (single-node, pseudo-distributed).
Install SSH
sudo apt-get install ssh
Install rsync
sudo apt-get install rsync
Configure passwordless SSH login
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Verify that it works:
ssh localhost
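If ssh localhost still asks for a password, overly permissive key-file permissions are a common cause. A minimal fix, assuming the default key locations used above:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys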
Install Hadoop 1.0.0 and the JDK
Open a terminal and create an app directory; both Java and Hadoop will be installed there.
mkdir /home/app
Next, install the JDK and unpack Hadoop:
cd /home/app
chmod +x jdk-6u31-linux-i586.bin
./jdk-6u31-linux-i586.bin
tar zxf hadoop-1.0.0-bin.tar.gz
Configure the JDK environment variables
vi /etc/profile
Add the following lines at the end:
export JAVA_HOME=/home/app/jdk1.6.0_31
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
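To make the new variables take effect in the current shell and confirm the JDK is found, something like the following should work (the exact version string depends on your JDK build):
source /etc/profile
java -version
echo $JAVA_HOME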
Configure Hadoop
Go to the Hadoop directory:
cd /home/app/hadoop-1.0.0
Edit the environment file and specify the JDK installation path:
vi conf/hadoop-env.sh
export JAVA_HOME=/home/app/jdk1.6.0_31
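A quick way to confirm that Hadoop picks up the JDK is to ask it for its version; assuming the paths above, run this from the hadoop-1.0.0 directory:
bin/hadoop version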
Edit Hadoop's core configuration file, core-site.xml, which sets the HDFS address and port:
vi conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit Hadoop's HDFS configuration file. The default replication factor is 3; since this is a single-node install, change it to 1.
vi conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit Hadoop's MapReduce configuration file, which sets the JobTracker address and port:
vi conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
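Before moving on, it can be worth checking that the three edited files are still well-formed XML; a minimal sketch, assuming xmllint (from libxml2-utils) is installed:
xmllint --noout conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml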
Next, start Hadoop. Before the first start, format the HDFS file system. From the Hadoop directory, run:
bin/hadoop namenode -format
Then start Hadoop:
bin/start-all.sh
This command starts all of the Hadoop daemons.
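To check that the daemons actually came up, the JDK's jps tool lists the running Java processes. On a pseudo-distributed node you should see something like:
jps
# expected processes (PIDs will differ):
# NameNode
# DataNode
# SecondaryNameNode
# JobTracker
# TaskTracker
# Jps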
Finally, verify that Hadoop was installed successfully. Open a browser and visit:
http://localhost:50030 (MapReduce/JobTracker web UI)
http://localhost:50070 (HDFS/NameNode web UI)
If both pages load, the installation succeeded.
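The same check can be done from the command line; both commands are part of Hadoop 1.0, though the report contents depend on the state of your node:
bin/hadoop dfsadmin -report
bin/hadoop fs -ls /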
Hadoop assigns hosts one of two roles, seen from three perspectives:
First, hosts are divided into masters and slaves.
Second, from the HDFS perspective, hosts are divided into the NameNode and DataNodes (in a distributed file system, managing the namespace is critical; that management role belongs to the master, and the NameNode is the directory manager).
Third, from the MapReduce perspective, hosts are divided into the JobTracker and TaskTrackers (a job is usually split into multiple tasks, which makes the relationship between the two easy to see).
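To see the JobTracker and TaskTrackers in action, you can run one of the example jobs shipped with the release. A minimal sketch, assuming the examples jar keeps its default name in the 1.0.0 tarball:
bin/hadoop fs -put conf input
bin/hadoop jar hadoop-examples-1.0.0.jar grep input output 'dfs[a-z.]+'
bin/hadoop fs -cat output/*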