I have been studying hadoop on my own recently. Today I spent some time setting up a development environment and writing up my notes.
First, you need to understand hadoop's running modes:
Standalone Mode
Standalone mode is hadoop's default mode. When the hadoop release package is unpacked for the first time, hadoop knows nothing about the hardware environment and conservatively chooses a minimal configuration: in this default mode all three XML configuration files are empty. With empty configuration files, hadoop runs entirely on the local machine. Since it does not need to interact with other nodes, standalone mode does not use HDFS and does not load any hadoop daemons. This mode is mainly used for developing and debugging the application logic of mapreduce programs.
Pseudo-Distributed Mode
In pseudo-distributed mode, hadoop runs on a single-node "cluster" where all the daemons run on the same machine. On top of the debugging possible in standalone mode, this mode lets you examine memory usage, HDFS input and output, and the behavior of the daemon processes.
Fully Distributed Mode
The hadoop daemons run across a cluster of machines.
Version: Ubuntu 10.04.4, hadoop 1.0.2
1. Add a hadoop user to the system
Before installation, add a user named hadoop to the system for hadoop testing.
~$ sudo addgroup hadoop
~$ sudo adduser --ingroup hadoop hadoop
So far we have only added a user named hadoop, which does not have administrator permissions. Therefore, we add the hadoop user to the administrator group:
~$ sudo usermod -aG admin hadoop
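You can double-check that the group change took effect by listing the hadoop user's groups (a quick sanity check; the exact groups depend on your system, but the output should include both hadoop and admin):
~$ groups hadoop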
2. Install SSH
Because hadoop uses SSH for communication, first install SSH:
~$ sudo apt-get install openssh-server
After the SSH installation is complete, start the service first:
~$ sudo /etc/init.d/ssh start
After the service is started, run the following command to check whether the service is correctly started:
~$ ps -e | grep ssh
SSH is a secure communication protocol and normally requires a password to log in. To set up password-free login, we generate a private key and a public key:
hadoop@scgm-ProBook:~$ ssh-keygen -t rsa -P ""
Because I already had a private key, I was prompted to overwrite the existing one. On a first run you will simply be asked for the save location and passphrase; just press Enter. Two files are generated under ~/.ssh in your home directory (/home/{username}/.ssh): id_rsa and id_rsa.pub. The former is the private key and the latter is the public key. Now we append the public key to authorized_keys (authorized_keys stores the public keys of all clients allowed to log in over SSH as the current user):
~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
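SSH can refuse key-based login if the permissions on the key files are too open. If the password-free login below does not work, tightening them is a common fix (a precaution; it may not be needed on a fresh setup):
~$ chmod 700 ~/.ssh
~$ chmod 600 ~/.ssh/authorized_keys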
Now log in over SSH to confirm that no password is needed:
~$ ssh localhost
Logout:
~$ exit
Second login:
~$ ssh localhost
Logout:
~$ exit
This shows that login no longer requires entering a password.
3. Install Java
~$ sudo apt-get install openjdk-6-jdk
~$ java -version
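The hadoop-env.sh configuration in step 5 needs the JDK installation path. If you are not sure where the package put it, you can trace it from the java binary (on this Ubuntu/OpenJDK 6 setup it normally resolves to somewhere under /usr/lib/jvm/java-6-openjdk):
~$ readlink -f $(which java)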
4. Install hadoop 1.0.2
Download the hadoop release from the official website. Here I chose hadoop 1.0.2.
Decompress the package and put it in the directory you want. I put it in /usr/local/hadoop.
~$ sudo tar xzf hadoop-1.0.2.tar.gz
~$ sudo mv hadoop-1.0.2 /usr/local/hadoop
Make sure the hadoop user owns the directory, so that all later operations can be done as the hadoop user:
~$ sudo chown -R hadoop:hadoop /usr/local/hadoop
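You can verify the ownership change with ls; the owner and group of the directory should now both be hadoop:
~$ ls -ld /usr/local/hadoop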
5. Set the hadoop-env.sh (Java installation path)
Go to the hadoop directory, open conf/hadoop-env.sh, and add the following lines:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk   # adjust to your machine's Java installation path
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
Source the file so that the environment variables take effect:
~$ source /usr/local/hadoop/conf/hadoop-env.sh
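If the PATH was picked up correctly, the hadoop command is now available; a quick check (the version string should match the release you downloaded):
~$ hadoop version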
Now, the standalone mode of hadoop has been installed successfully.
Run the wordcount example that ships with hadoop to get a feel for the mapreduce process:
Create an input folder in the hadoop directory
~$ mkdir input
Copy all files in conf to the input folder:
~$ cp conf/* input
Run the wordcount program and save the result to output.
~$ bin/hadoop jar hadoop-examples-1.0.2.jar wordcount input output
View the results:
~$ cat output/*
You will see every word that appears in the conf files, together with how often it occurs.
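The result is plain text, one line per word with the word followed by a tab and its count, roughly like this (illustrative values only, your counts will differ):
hadoop	25
property	12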
Below is the additional configuration required for pseudo-distributed mode. Let's continue.
6. Set *-site.xml
Here you need to set up 3 files: core-site.xml, hdfs-site.xml, and mapred-site.xml, all under the /usr/local/hadoop/conf directory.
core-site.xml: configuration for the hadoop core, such as I/O settings common to HDFS and mapreduce.
hdfs-site.xml: configuration for the HDFS daemons: the namenode, the secondary namenode, and the datanodes.
mapred-site.xml: configuration for the mapreduce daemons: the jobtracker and the tasktrackers.
First, create several folders in the hadoop directory:
~/hadoop$ mkdir tmp
~/hadoop$ mkdir hdfs
~/hadoop$ mkdir hdfs/name
~/hadoop$ mkdir hdfs/data
Edit the three files:
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
7. Format HDFS
Through the steps above, we have configured the single-machine hadoop test environment. Before starting the hadoop services, format the namenode to initialize HDFS:
~$ source /usr/local/hadoop/conf/hadoop-env.sh
~$ hadoop namenode -format
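If formatting succeeds, the namenode storage directory we configured in hdfs-site.xml is initialized; you can confirm by listing it (a current subdirectory with the filesystem image should now exist):
~$ ls /usr/local/hadoop/hdfs/name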
8. Start hadoop
Then run start-all.sh to start all the services; this script loads the daemons, including the namenode and datanode.
hadoop@ubuntu:/usr/local/hadoop$ cd bin
hadoop@ubuntu:/usr/local/hadoop/bin$ start-all.sh
Use Java's jps command to list all the daemons and verify that the installation succeeded.
hadoop@ubuntu:/usr/local/hadoop$ jps
If a list of the running hadoop daemons is displayed, the startup was successful.
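For reference, on a working pseudo-distributed setup the jps list looks roughly like this (the process IDs will differ on your machine):
2287 NameNode
2446 DataNode
2590 SecondaryNameNode
2679 JobTracker
2823 TaskTracker
2901 Jps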
9. Check the running status
All the settings are complete and hadoop has been started. You can now check whether the services are running normally; hadoop provides web interfaces for monitoring the health of the cluster:
http://localhost:50030/ - hadoop management interface
http://localhost:50060/ - hadoop task tracker status
http://localhost:50070/ - hadoop DFS status
(Screenshots of the hadoop management interface, the task tracker status page, and the DFS status page omitted.)
At this point, hadoop's pseudo-distributed mode has been successfully installed. Let's run the hadoop wordcount example again, this time in pseudo-distributed mode, to see the mapreduce process:
Note that this time the program reads and writes in DFS, so the files it creates live in the distributed file system:
First, create the input directory in DFS.
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -mkdir input
Copy the files in conf to the input directory in DFS:
hadoop@ubuntu:/usr/local/hadoop$ hadoop dfs -copyFromLocal conf/* input
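You can list the files in DFS to confirm that the copy worked:
hadoop@ubuntu:/usr/local/hadoop$ hadoop dfs -ls input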
Run wordcount in pseudo-distributed mode
hadoop@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.2.jar wordcount input output
You will see the progress of the map and reduce phases printed to the console as the job runs.
Display the output results:
hadoop@ubuntu:/usr/local/hadoop$ hadoop dfs -cat output/*
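The output directory in DFS holds the files produced by the reducers; you can list them too (with the default settings there is a single result file, typically named part-r-00000):
hadoop@ubuntu:/usr/local/hadoop$ hadoop dfs -ls output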
When you are done with hadoop, you can shut down the hadoop daemons with the stop-all.sh script:
hadoop@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
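Running jps again afterwards should no longer show the hadoop daemons, which confirms they have shut down:
hadoop@ubuntu:/usr/local/hadoop$ jps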
10. Conclusion
Hadoop is now successfully set up on Ubuntu! I'm a little excited and can't wait to start some development on it and dig deeper into how the hadoop core is implemented. Onward!
PS: Both standalone and pseudo-distributed modes are for development and debugging. A real hadoop cluster runs in the third mode, the fully distributed mode. To be continued.
This article references the posts of the two authors below. Thank you for sharing!
http://blog.sina.com.cn/s/blog_61ef49250100uvab.html
http://www.cnblogs.com/welbeckxu/category/346329.html