Hadoop + Eclipse Environment Setup Process

Source: Internet
Author: User

http://blog.sina.com.cn/s/blog_537770820100bxmf.html

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://hadoop.apache.org/common/docs/current/single_node_setup.html

Hadoop standalone configuration process:

Basic configuration for the standalone setup: Ubuntu 10.10, JDK 1.6, hadoop-0.21.0, Eclipse

1. Install the SSH server during the Ubuntu installation so that you do not have to install it later; Hadoop communicates with each machine through SSH. During installation, do not install the JDK that comes with Ubuntu (OpenJDK). It is not used here. We need the Sun JDK.

2. If you have not installed the SSH server, run the following command to install it:
sudo apt-get install openssh-server openssh-client
Stop SSH: /etc/init.d/ssh stop
Start SSH: /etc/init.d/ssh start
Restart SSH: /etc/init.d/ssh restart
After installing SSH, you can also use SecureCRT to access Ubuntu, which is more convenient than logging on to the Ubuntu machine directly.

Create the .ssh folder on each machine (if you log on with the root account, it is created as /root/.ssh/):

$ mkdir .ssh

Create a key pair on ubuntu01:

$ ssh-keygen -t rsa

Press Enter at each prompt to generate the key pair (id_rsa, id_rsa.pub). The key pair is stored in the /root/.ssh directory. To see it in a file manager, enable "show hidden files".

On each machine, copy the generated id_rsa.pub to the authorized_keys file (the content of id_rsa.pub is one long line, so make sure no characters are dropped and no extra line breaks end up in the file):

$ cd .ssh

$ cp id_rsa.pub authorized_keys

Copy authorized_keys to each of the machines ubuntu01 through ubuntu03, for example:

$ scp authorized_keys ubuntu02:/root/.ssh

scp is a remote copy over SSH. When prompted, enter the password of the remote host, that is, the password of the hadoop account on ubuntu02. Of course, you can also use other methods to copy the authorized_keys file to another machine.

Run the following command on each machine:

$ chmod 640 authorized_keys
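When several machines are involved, every machine's public key has to end up in the same authorized_keys file, one key per line. The following is a minimal sketch of that merge, not part of the original procedure: the function name build_authorized_keys and the idea of collecting the .pub files in one place first are assumptions made for illustration.

```shell
#!/bin/sh
# Merge several one-line public key files into a single authorized_keys file.
# Usage: build_authorized_keys OUTPUT PUBKEY...
# (build_authorized_keys is a hypothetical helper, not an OpenSSH tool.)
build_authorized_keys() {
    out=$1
    shift
    : > "$out"                          # start from an empty file
    for pub in "$@"; do
        # A public key must be exactly one non-empty line.
        lines=$(grep -c . "$pub" 2>/dev/null) || lines=0
        if [ "$lines" -ne 1 ]; then
            echo "error: $pub has $lines non-empty lines, expected 1" >&2
            return 1
        fi
        grep . "$pub" >> "$out"         # append the key, skipping blank lines
    done
    chmod 640 "$out"                    # same mode as in the step above
}
```

For example, after gathering the three id_rsa.pub files under different names, build_authorized_keys authorized_keys ubuntu01.pub ubuntu02.pub ubuntu03.pub would produce a three-line file that can then be copied to every machine.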

Now the SSH configuration on each machine is complete. You can test it, for example by opening an SSH connection from ubuntu01 to ubuntu02:

$ ssh ubuntu02

If SSH is configured correctly, a message like the following is displayed:

The authenticity of host [ubuntu02] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?

This appears because it is the first time you log on to this host. Type "yes". The prompt is no longer displayed the next time you access this host.
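Once the host keys have been accepted, the same test can be repeated for every machine without typing anything. The sketch below is one way to do it, assuming the host names used in this article. The function name check_ssh_hosts and the SSH_CMD override (there only so the loop can be dry-run without a network) are made up for this example; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Check key-based SSH login to each host; prints one status line per host.
# SSH_CMD is a hypothetical override used for dry runs; it defaults to ssh.
check_ssh_hosts() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # so a host that still needs a password shows up as FAILED.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "$host" true </dev/null 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

A typical call would be check_ssh_hosts ubuntu02 ubuntu03. A host whose key has not been accepted yet is also reported as FAILED, because in batch mode ssh cannot ask the yes/no question shown above.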

3. Now you can install JDK.

Command:
sudo apt-get install sun-java6-jdk
If you are not sure whether the JDK has already been installed, run java -version to check.
If the reported Java version is not Sun's, or java is not recognized as a command, you need to install it. Alternatively, you can download the JDK and install it directly.
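The check above can be scripted roughly as follows. This is only a sketch: classify_java is a made-up helper name, and it relies on the assumption that OpenJDK builds print "OpenJDK" in their version banner while Sun builds print "Java(TM)".

```shell
#!/bin/sh
# Classify a Java installation from its version banner, read on stdin.
# classify_java is a hypothetical helper for illustration.
classify_java() {
    text=$(cat)
    case $text in
        *OpenJDK*)    echo "openjdk" ;;   # the JDK bundled with Ubuntu
        *'Java(TM)'*) echo "sun" ;;       # the JDK this setup needs
        *)            echo "unknown" ;;   # nothing recognizable, e.g. java missing
    esac
}
```

Because java -version writes its banner to standard error, the call looks like java -version 2>&1 | classify_java.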

The environment variables also need to be configured. By default the JDK is usually installed under /usr/lib/jvm/java-6-sun, including the executables and the class libraries. You can run cd /usr/lib/jvm/java-6-sun to check.

I configured two places: the /etc/environment file and the ~/.bashrc file, as follows.
/etc/environment file:
CLASSPATH=/usr/lib/jvm/java-6-sun/lib
JAVA_HOME=/usr/lib/jvm/java-6-sun

Add the following lines at the end of ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

Note: $PATH must be included in the new PATH value. Otherwise none of your usual commands, such as vi and sudo, can be found, and you have to type their full path (for example under /sbin/) to run them.
In addition, the path separator on Linux is ":", unlike ";" on Windows. New users should pay particular attention to this.

After adding these variables, open a new terminal (or run source ~/.bashrc) and echo each variable to check that it is correct:
echo $PATH
echo $CLASSPATH
echo $JAVA_HOME

Check that the output matches the values you set.
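Comparing the echoed values by eye is easy to get wrong, so the same check can be sketched as a small function. The name check_java_env is hypothetical, and the function only tests the two things this step depends on: that JAVA_HOME points at a directory containing bin/java, and that this bin directory is an entry of PATH.

```shell
#!/bin/sh
# Report whether JAVA_HOME and PATH look consistent with the settings above.
# check_java_env is a hypothetical helper for illustration.
check_java_env() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable at $JAVA_HOME/bin/java"
        return 1
    fi
    # Wrap both sides in ':' so that only whole PATH entries match.
    case ":$PATH:" in
        *":$JAVA_HOME/bin:"*) echo "ok" ;;
        *) echo "$JAVA_HOME/bin is not on PATH"; return 1 ;;
    esac
}
```

Running check_java_env in a fresh terminal prints ok when the settings took effect, and otherwise names the first thing that is missing.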

4. Configure several files in the conf directory under Hadoop

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

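conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

Add the JDK path to the file conf/hadoop-env.sh (use the directory where your JDK is actually installed; step 3 above used /usr/lib/jvm/java-6-sun):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_10
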
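
Since the three XML files each hold a single property, they can also be generated instead of edited by hand. The following is a rough sketch under that assumption: the helper names one_property and write_single_node_conf are invented here, and the script overwrites any existing files of the same name, so try it on a scratch directory first.

```shell
#!/bin/sh
# Emit a <configuration> element holding one property ($1 = name, $2 = value).
# Both helpers are hypothetical; they are not part of Hadoop.
one_property() {
    printf '<configuration>\n'
    printf '     <property>\n'
    printf '         <name>%s</name>\n' "$1"
    printf '         <value>%s</value>\n' "$2"
    printf '     </property>\n'
    printf '</configuration>\n'
}

# Write the three single-node config files into the directory given as $1.
write_single_node_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir"
    one_property fs.default.name    hdfs://localhost:9000 > "$conf_dir/core-site.xml"
    one_property dfs.replication    1                     > "$conf_dir/hdfs-site.xml"
    one_property mapred.job.tracker localhost:9001        > "$conf_dir/mapred-site.xml"
}
```

For instance, write_single_node_conf /tmp/conf-test writes the files into a throwaway directory where they can be compared with the listings above before anything is copied into the real conf directory.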
5. Execution Process:

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
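
Besides reading the logs, a quick way to see whether everything came up is the jps tool from the JDK, which lists running Java processes as "PID ClassName". The sketch below assumes that a single-node start-all.sh run is expected to leave the five daemons named in the loop; missing_daemons is a hypothetical helper that reads jps output on stdin.

```shell
#!/bin/sh
# Read `jps` output on stdin and print each expected daemon that is missing.
# missing_daemons is a hypothetical helper; the daemon list is an assumption
# about what start-all.sh launches on a single node.
missing_daemons() {
    running=$(awk '{print $2}')         # keep only the class-name column
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}
```

Running jps | missing_daemons prints nothing when all five are up; any name it does print is the daemon whose log file is worth opening first.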

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

  • NameNode - http://localhost:50070/
  • JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
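
To get a feel for what this job should produce, the same regular expression can be tried on the local conf directory with ordinary tools. This is only an approximation of the example job, which counts how often each matching string occurs; local_dfs_grep is a made-up name, and the -o and -h options assume GNU or BSD grep.

```shell
#!/bin/sh
# Count each distinct match of 'dfs[a-z.]+' in the given local files,
# most frequent first. local_dfs_grep is a hypothetical helper.
local_dfs_grep() {
    # -E: extended regex, -h: no file names, -o: print only the matched text
    grep -Eho 'dfs[a-z.]+' "$@" | sort | uniq -c | sort -rn
}
```

Calling local_dfs_grep conf/* lists each match with its count, which gives something to compare against once the Hadoop output has been fetched.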

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons:
$ bin/stop-all.sh

