Build a Hadoop 2.6.0 pseudo-distributed environment in a virtual machine under Windows 7


In recent years, big data has become increasingly popular. Out of work needs and personal interest, I recently started learning big-data-related technologies. I plan to record what I learn along the way in blog posts, both as a personal memo and for discussion with other readers.

Article 1: Build a Hadoop 2.6.0 pseudo-distributed environment in a virtual machine under Win7.

1. Required Software

Use VMware 11.0 to build a virtual machine and install Ubuntu 14.04.2.

JDK 1.7.0_80

Hadoop 2.6.0

2. Install VMware and Ubuntu

See: Install Ubuntu 14.04 on a VMware Workstation 10 Virtual Machine in Windows 7

3. Install JDK in Ubuntu

Decompress the JDK to the directory /home/vm/tools/jdk.

Configure the environment variables in ~/.bash_profile, then run source ~/.bash_profile to make them take effect.

# java
export JAVA_HOME=/home/vm/tools/jdk
export JRE_HOME=/home/vm/tools/jdk/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

Check whether the JDK is installed successfully.
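A quick way to check, assuming the environment variables above have been sourced:

java -version     # should report version 1.7.0_80
javac -version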

4. Configure the ssh trust relationship for password-less login

4.1 Install ssh

The ssh client is installed in Ubuntu by default, but the ssh server is not installed, so you can install it through apt-get.

Install ssh-server: sudo apt-get install openssh-server

If you do not have an ssh client, you can also install it through apt-get.

Install ssh-client: sudo apt-get install openssh-client

Start ssh-server: sudo service ssh start

After startup, run ps aux | grep sshd to check whether the ssh server started successfully.


4.2 Configure the ssh trust relationship

Generate a public/private key pair on machine A: ssh-keygen -t rsa, pressing Enter at each prompt. The public key id_rsa.pub and private key id_rsa are generated in the ~/.ssh directory.

Append id_rsa.pub of machine A to the authentication file of machine B:

cat id_rsa.pub >> ~/.ssh/authorized_keys

The trust relationship from machine A to machine B is now established, and machine A can log on to machine B through ssh without a password.

In this example, machine A and machine B are the same machine. After configuring the ssh trust relationship, verify it with ssh localhost or ssh <machine IP>.
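Put together, a minimal sketch for the single-machine case (the -P "" option simply sets an empty passphrase; the chmod is only needed if authorized_keys was created with looser permissions):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost     # should log in without asking for a password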

5. Install Hadoop 2.6.0

5.1 Decompress Hadoop 2.6.0

Download hadoop-2.6.0.tar.gz from the official site, decompress it to the /home/vm/tools/hadoop directory, and configure the environment variables in ~/.bash_profile. Run source ~/.bash_profile to make them take effect.

# hadoop
export HADOOP_HOME=/home/vm/tools/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
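A minimal sketch of the decompress-and-verify step, assuming the tarball was downloaded to ~/Downloads (the download location is just an example):

tar -xzf ~/Downloads/hadoop-2.6.0.tar.gz -C /home/vm/tools/
mv /home/vm/tools/hadoop-2.6.0 /home/vm/tools/hadoop
source ~/.bash_profile
hadoop version    # should report Hadoop 2.6.0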

5.2 Modify the configuration files

Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh to configure the JAVA_HOME path, as in the snippet below.
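For example, using the JDK path from section 3 (the same line goes into yarn-env.sh):

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/vm/tools/jdk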

Modify $HADOOP_HOME/etc/hadoop/slaves and add the local IP address:

echo "192.168.62.129" > slaves

Modify several important *-site.xml files under $HADOOP_HOME/etc/hadoop:

core-site.xml (192.168.62.129 is the IP address of my VM):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.62.129:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/vm/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/vm/app/hadoop/dfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/vm/app/hadoop/dfs/dn</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>Permission checking is turned off.</description>
    </property>
</configuration>

mapred-site.xml (if the file does not exist yet, copy it from mapred-site.xml.template):

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://192.168.62.129:9001</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml:

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

5.3 Format the file system

Run bin/hdfs namenode -format under $HADOOP_HOME to format the file system.
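A minimal sketch of this step, assuming the paths configured above (pre-creating the data directories is optional, but it makes the layout from core-site.xml and hdfs-site.xml explicit):

mkdir -p /home/vm/app/hadoop/tmp /home/vm/app/hadoop/dfs/nn /home/vm/app/hadoop/dfs/dn
cd /home/vm/tools/hadoop
bin/hdfs namenode -format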

5.4 start/stop

Run sbin/start-dfs.sh and sbin/start-yarn.sh under $HADOOP_HOME to start the Hadoop cluster, and run sbin/stop-dfs.sh and sbin/stop-yarn.sh to stop it.

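A rough sketch of starting the cluster and checking the running daemons with jps (process IDs will differ):

cd /home/vm/tools/hadoop
sbin/start-dfs.sh
sbin/start-yarn.sh
jps
# expected processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager

# and to stop the cluster:
sbin/stop-yarn.sh
sbin/stop-dfs.sh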

6. Query cluster information

Port 8088: view the All Applications information of YARN.

Port 50070: view HDFS information.
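For example, from a browser in the VM or on the host (using the VM IP configured above):

http://192.168.62.129:8088     # YARN ResourceManager - All Applications
http://192.168.62.129:50070    # HDFS NameNode overview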

7. Verify that the Hadoop environment is successfully built

7.1 Verify that HDFS works

You can use various hdfs commands for testing. For example:

hdfs dfs -ls ./
hdfs dfs -put file1 ./
hdfs dfs -get ./file1
hdfs dfs -rm -f ./file1
hdfs dfs -cat ./file1
hdfs dfs -df -h

7.2 Verify that the map/reduce computing framework works

In the $HADOOP_HOME directory, run: bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/

Here ./count_in/ is a directory created in the HDFS cluster in advance; the job counts the words in all files under that directory and writes the result to the ./count_out/ directory. A full sequence is sketched below.
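A minimal end-to-end sketch, assuming a local text file named words.txt (a hypothetical example file) and the paths configured above:

cd /home/vm/tools/hadoop
bin/hdfs dfs -mkdir -p ./count_in/
bin/hdfs dfs -put words.txt ./count_in/
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/
bin/hdfs dfs -cat ./count_out/part-r-00000    # the reducer output file is typically named part-r-00000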


At this point, the Hadoop 2.6.0 pseudo-distributed environment is complete.

