1. First install the JDK and configure the Java environment variables (specific instructions for your platform are easy to find via Google).
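A minimal sketch of the environment-variable step, assuming the JDK was installed under /usr/java/jdk1.6.0_27 (adjust the path to your own install); these lines go in ~/.bashrc:

```shell
# Hypothetical JDK location -- change to wherever your JDK actually lives.
export JAVA_HOME=/usr/java/jdk1.6.0_27
# Put the JDK's tools (java, javac, jps) on the PATH.
export PATH=$JAVA_HOME/bin:$PATH
```

After editing ~/.bashrc, run `source ~/.bashrc` and confirm with `java -version`.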
Unzip hadoop-0.20.2.tar.gz into your Ubuntu account's home directory (e.g. /home/xxxx/hadoop). You can unzip it to any directory you like, but the paths in the configuration files below must then be changed to match your own.
Modify core-site.xml, hadoop-env.sh, hdfs-site.xml and mapred-site.xml under Hadoop's conf folder.
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/xxxx/hadoop/tmp</value>
  </property>
</configuration>
hadoop-env.sh
Add your JAVA_HOME variable to hadoop-env.sh; mine is:
export JAVA_HOME=/usr/java/jdk1.6.0_27
Don't forget to add this line.
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/xxxx/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/xxxx/hadoop/hdfs/data</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Note that the folders referenced above do not need to be created by hand; Hadoop creates them automatically the first time it runs.
2. Configure SSH
(This part follows the Hadoop documentation.)
Note that Ubuntu does not ship with an SSH server by default; install it first, e.g. $ sudo apt-get install openssh-server
Setup passphraseless SSH
Now check that you can ssh to localhost without a passphrase:
$ ssh localhost (you can use this command to test if SSH is installed on your machine)
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
The two commands above set up password-free SSH login.
Note: run the two commands above from your account's home folder (from any folder in the terminal, typing cd with no arguments takes you back to your home folder).
Run ssh localhost again and no password should be required.
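The two key-setup commands above, exercised against a scratch directory so they are safe to try (the real setup writes to ~/.ssh; rsa is used here because recent OpenSSH releases no longer generate dsa keys — on an old Ubuntu the dsa form above works the same way):

```shell
demo=$(mktemp -d)                                  # scratch stand-in for ~/.ssh
ssh-keygen -q -t rsa -N '' -f "$demo/id_rsa"       # key pair with an empty passphrase
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"  # authorize our own public key
chmod 600 "$demo/authorized_keys"                  # sshd insists on tight permissions
```

With the real ~/.ssh in place of the scratch directory, `ssh localhost` logs in without prompting.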
3. Running Hadoop for the first time
Enter the directory for Hadoop
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
List all Java processes with the jps command to see whether the daemons started successfully. A successful start shows five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker); if one is missing, something in the configuration is wrong, and you can check the log output to find the error.
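A quick way to count the daemons in jps output (the pipeline below is my own helper, not part of Hadoop). It is shown against simulated jps output so it can be tried anywhere; on a real machine you would pipe jps into it:

```shell
# Counts lines naming one of the five expected pseudo-distributed daemons.
count_daemons() { grep -cE 'NameNode|DataNode|JobTracker|TaskTracker'; }

# Simulated jps output for illustration (real use: jps | count_daemons):
printf '1 NameNode\n2 DataNode\n3 SecondaryNameNode\n4 JobTracker\n5 TaskTracker\n6 Jps\n' \
  | count_daemons
# prints 5 -- SecondaryNameNode is counted because it also matches "NameNode"
```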
The following is excerpted from the Hadoop documentation; it is simple enough that I reproduce it directly.
The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
- NameNode: http://localhost:50070/
- JobTracker: http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
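The grep example job counts matches of the regular expression dfs[a-z.]+ in the input files (the conf directory we uploaded). The same pattern with ordinary grep, just to show what it matches:

```shell
# Lines containing "dfs" followed by lowercase letters or dots match;
# this mirrors what the grep example job searches for in the input.
printf 'dfs.replication\ndfs.name.dir\nmapred.job.tracker\n' \
  | grep -cE 'dfs[a-z.]+'
# prints 2 (the two dfs.* property names match; mapred.job.tracker does not)
```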
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
Or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
Reference:
http://www.cnblogs.com/welbeckxu/archive/2011/12/29/2306757.html (When I did this, the directories named in core-site.xml and hdfs-site.xml, such as /home/xxxx/hadoop/tmp, did not have to be created; if created in advance they caused an error.) Migrated from CSDN.
Hadoop 0.20.2 pseudo-distributed configuration on Ubuntu