Building a Hadoop 2.6.0 Pseudo-Distributed Environment in a Virtual Machine under Win7


Big data has become a hot topic in recent years. Out of both job needs and personal interest, I recently started learning big-data-related technologies. I hope to record what I learn along the way in this blog, both to share and discuss it with other readers and to keep it as a personal memo.

This first post covers building a Hadoop 2.6.0 pseudo-distributed environment in a virtual machine under Win7.

1. Required Software

VMware 11.0 is used to build the virtual machine, with Ubuntu 14.04.2 installed as the guest system.

JDK 1.7.0_80

Hadoop 2.6.0

2. Installing VMware and Ubuntu

Omitted here; both follow the standard installation steps.

3. Installing the JDK in Ubuntu

Unpack the JDK to the directory /home/vm/tools/jdk.
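For example, unpacking the downloaded archive might look like this (the archive file name and the unpacked directory name depend on the exact JDK build downloaded, so treat both as placeholders):

mkdir -p /home/vm/tools
tar -xzf jdk-7u80-linux-x64.tar.gz -C /home/vm/tools   # unpack the JDK tarball
mv /home/vm/tools/jdk1.7.0_80 /home/vm/tools/jdk       # rename to the shorter path used below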

Configure the environment variables in ~/.bash_profile and apply them with source ~/.bash_profile:

#java
export JAVA_HOME=/home/vm/tools/jdk
export JRE_HOME=/home/vm/tools/jdk/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

Verify that the JDK installation is successful.
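A quick way to verify, assuming the PATH change above has been sourced (the version strings will vary with the exact build):

java -version    # should report java version "1.7.0_80"
javac -version   # confirms the compiler is on the PATH as well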

4. Configuring the SSH trust relationship for password-free login

4.1 Installing SSH

Ubuntu installs an SSH client by default but not an SSH server, so the server has to be installed via apt-get.

Install the SSH server: sudo apt-get install openssh-server

If the SSH client is missing, it can be installed the same way.

Install the SSH client: sudo apt-get install openssh-client

Start the SSH server: sudo service ssh start

After starting it, run ps aux | grep sshd to check whether the SSH server came up successfully.

4.2 Configuring the SSH trust relationship

Generate a public/private key pair on machine A: run ssh-keygen -t rsa and press Enter at each prompt. This generates the public key id_rsa.pub and the private key id_rsa in the ~/.ssh directory.

Append machine A's id_rsa.pub to machine B's authentication file:

cat id_rsa.pub >> ~/.ssh/authorized_keys

At this point the trust relationship from machine A to machine B is established, and machine A can SSH into machine B directly without being asked for a password.

In our case machines A and B are the same machine, so the SSH trust relationship can be verified with ssh localhost or ssh <machine IP>.
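Putting the steps together, a minimal single-machine setup might look like this (standard OpenSSH file locations; the chmod is a common precaution, since sshd may ignore key files with overly permissive modes):

ssh-keygen -t rsa                                  # accept the defaults at each prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize our own public key
chmod 600 ~/.ssh/authorized_keys                   # sshd may reject group/world-writable key files
ssh localhost                                      # should now log in without a password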

5. Installing Hadoop 2.6.0

5.1 Unpacking Hadoop 2.6.0

Download hadoop-2.6.0.tar.gz from the official website, unpack it to the directory /home/vm/tools/hadoop, and configure the environment variables in ~/.bash_profile. Apply them with source ~/.bash_profile.
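Unpacking might look like this (the tarball is assumed to be in the current directory; the mv simply gives the tree the shorter name used in the variables below):

tar -xzf hadoop-2.6.0.tar.gz -C /home/vm/tools         # unpack the Hadoop distribution
mv /home/vm/tools/hadoop-2.6.0 /home/vm/tools/hadoop   # rename to match HADOOP_HOME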

#hadoop
export HADOOP_HOME=/home/vm/tools/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

5.2 Modifying the configuration files

Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh to set the JAVA_HOME path.
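In hadoop-env.sh this amounts to replacing the default JAVA_HOME assignment with the concrete path from step 3; yarn-env.sh takes an equivalent line:

export JAVA_HOME=/home/vm/tools/jdk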

Modify $HADOOP_HOME/etc/hadoop/slaves to add the machine's own IP address:

echo "192.168.62.129" >> slaves

Modify several important *-site.xml files under $HADOOP_HOME/etc/hadoop/:

core-site.xml (192.168.62.129 is the IP address of my virtual machine):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.62.129:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/vm/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/vm/app/hadoop/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/vm/app/hadoop/dfs/dn</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Permission checking is turned off.</description>
  </property>
</configuration>
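The local directories referenced above are generally created when the NameNode is formatted and the daemons first start, but creating them in advance (with the same paths as the values above) avoids permission surprises:

mkdir -p /home/vm/app/hadoop/tmp /home/vm/app/hadoop/dfs/nn /home/vm/app/hadoop/dfs/dn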

mapred-site.xml (Hadoop 2.6.0 ships this file as mapred-site.xml.template; copy it to mapred-site.xml first if it does not exist):

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://192.168.62.129:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

5.3 Formatting the file system

Format the file system by executing bin/hdfs namenode -format under $HADOOP_HOME.

5.4 Starting and stopping

Execute sbin/start-dfs.sh and sbin/start-yarn.sh under $HADOOP_HOME to start the Hadoop cluster; execute sbin/stop-dfs.sh and sbin/stop-yarn.sh to stop it.
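A typical start-up plus sanity check might look like the following; in a pseudo-distributed setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

cd $HADOOP_HOME
sbin/start-dfs.sh    # starts the NameNode, DataNode, and SecondaryNameNode daemons
sbin/start-yarn.sh   # starts the ResourceManager and NodeManager daemons
jps                  # lists the running Java processes to confirm start-up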


6. Querying cluster information

Port 8088 serves the ResourceManager web UI, with information on all applications: http://192.168.62.129:8088

Port 50070 serves the NameNode web UI, with HDFS information: http://192.168.62.129:50070

7. Verifying that the Hadoop environment was built successfully

7.1 Verifying that HDFS is healthy

You can test with various HDFS commands. For example:

hdfs dfs -ls ./        # list the contents of the HDFS home directory
hdfs dfs -put file1 ./ # upload a local file into HDFS
hdfs dfs -get ./file1  # download a file from HDFS
hdfs dfs -rm -f ./file1  # remove a file from HDFS
hdfs dfs -cat ./file1  # print a file's contents
hdfs dfs -df -h        # show file system capacity and usage

7.2 Verifying that the MapReduce computation framework works

Execute under the $HADOOP_HOME directory: bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/

The ./count_in/ directory must be created in the HDFS cluster ahead of time; the job counts the words in all files under that directory and writes the results to the ./count_out/ directory.
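An end-to-end run might look like this, from $HADOOP_HOME (file1 is just an illustrative input file; by default the wordcount example writes its result to a part-r-00000 file in the output directory, and the output directory must not already exist):

hdfs dfs -mkdir ./count_in              # create the input directory in HDFS
hdfs dfs -put file1 ./count_in/         # upload something to count
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/
hdfs dfs -cat ./count_out/part-r-00000  # inspect the word counts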


At this point, the Hadoop 2.6.0 pseudo-distributed environment is complete.
