http://blog.sina.com.cn/s/blog_537770820100bxmf.html
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://hadoop.apache.org/common/docs/current/single_node_setup.html
Hadoop standalone (single-node) configuration process:
Basic configuration: Ubuntu 10.10, JDK 1.6, hadoop-0.21.0, Eclipse
1. Install the SSH server during Ubuntu installation so that you do not have to install it later; Hadoop communicates with each machine through SSH. During installation, remember not to install the JDK that comes with Ubuntu (OpenJDK); it is not what we need here. We need the Sun JDK.
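If you are not sure whether an OpenJDK package slipped in anyway, a quick check and removal might look like this (a sketch; the exact package names depend on the Ubuntu release):
$ dpkg -l | grep -i openjdk
$ sudo apt-get remove --purge openjdk-6-jre openjdk-6-jdk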
2. If you have not installed the SSH server, run the following command to install it:
$ sudo apt-get install openssh-server openssh-client
Stop SSH: /etc/init.d/ssh stop
Start SSH: /etc/init.d/ssh start
Restart SSH: /etc/init.d/ssh restart
After installing SSH, you can also use SecureCRT to access Ubuntu, which is easier than logging on to Ubuntu directly.
Create the .ssh folder on each machine (after logging on with the root account, create it under /root/ so it becomes /root/.ssh/):
$ mkdir .ssh
Create a key pair on ubuntu01:
$ ssh-keygen -t rsa
Press Enter at each prompt to generate a key pair (id_rsa, id_rsa.pub). The key pair is placed in the /root/.ssh directory. To see it in a file browser, enable "show hidden files".
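If you prefer not to answer the prompts, the same key pair can be generated in one step (a sketch, assuming an empty passphrase is acceptable for this test setup):
$ ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa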
Copy the id_rsa.pub generated on each machine into the authorized_keys file (the content of id_rsa.pub is a single long line, so make sure no characters are lost and no extra line breaks slip into the file):
$ cd .ssh
$ cp id_rsa.pub authorized_keys
Copy authorized_keys to ubuntu01-ubuntu03:
$ scp authorized_keys ubuntu02:/root/.ssh
scp is remote copy over SSH. Enter the password of the remote host, that is, the password of the hadoop account on ubuntu02. Of course, you can also use other methods to copy the authorized_keys file to the other machines.
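As one such alternative, if your openssh-client package provides ssh-copy-id, it appends the public key to the remote authorized_keys for you (a sketch, assuming root logins over SSH are allowed):
$ ssh-copy-id -i /root/.ssh/id_rsa.pub root@ubuntu02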
Run the following command on each machine:
$ chmod 640 authorized_keys
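Note that sshd also refuses keys when ~/.ssh itself (or the home directory) is group- or world-writable, so if key login fails later, tightening its permissions is a reasonable first step (a sketch):
$ chmod 700 /root/.ssh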
Now the SSH configuration on each machine has been completed. You can test it. For example, ubuntu01 initiates an SSH connection to ubuntu02.
$ ssh ubuntu02
If SSH is configured, the following message is displayed:
The authenticity of host [ubuntu02] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
This is the first time you log on to this host, so type "yes". The prompt is not displayed again the next time you access this host.
3. Now you can install JDK.
Command:
$ sudo apt-get install sun-java6-jdk
If you are not sure whether the JDK is already installed, check with the command java -version.
If the reported Java version is not Sun's, or java is not a recognized command, you need to install it. Alternatively, you can download the JDK and install it directly.
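On Ubuntu 10.10 the sun-java6-jdk package is typically in the Canonical partner repository, so if apt-get cannot find it, that repository may need to be enabled first (a sketch; the release name "maverick" is an assumption matching Ubuntu 10.10):
$ sudo add-apt-repository "deb http://archive.canonical.com/ maverick partner"
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk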
It is worth mentioning, for configuring the environment variables, that the JDK is usually installed under /usr/lib/jvm/java-6-sun by default, including the executable programs and class libraries. You can run cd /usr/lib/jvm/java-6-sun to check.
I configured two places: one is the /etc/environment file, and the other is the ~/.bashrc file, as follows:
/etc/environment file:
CLASSPATH=/usr/lib/jvm/java-6-sun/lib
JAVA_HOME=/usr/lib/jvm/java-6-sun
Add the following lines at the end of ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
Note: $PATH must be included in the new PATH value. Otherwise ordinary commands such as vi and sudo will no longer be found, and you would have to type their full paths (for example under /sbin/) to run them.
In addition, the separator in these Linux variables is ":", which is different from ";" in Windows. This is especially important for new users.
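After editing ~/.bashrc, reload it so the new variables take effect in the current shell (otherwise they only apply to newly opened terminals):
$ source ~/.bashrc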
After adding these variables, you can use echo to check that they are correct. The commands are as follows:
echo $PATH
echo $CLASSPATH
echo $JAVA_HOME
Check whether the printed values match what you set.
4. Configure the following files in the conf directory under the hadoop installation.
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Add the JDK path to the file conf/hadoop-env.sh (point JAVA_HOME at wherever your JDK is actually installed, e.g. /usr/lib/jvm/java-6-sun if it was installed via apt in step 3):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_10
5. Execution Process:
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
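A quick way to confirm the daemons are running is the jps tool shipped with the Sun JDK; for this single-node setup it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (a sketch):
$ jps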
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
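Before copying anything back, you can list the job's result directory in HDFS to see the part files it produced (a sketch):
$ bin/hadoop fs -ls output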
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
Or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons:
$ bin/stop-all.sh