Build a Hadoop development environment on Windows




Objective

There are usually two ways to run Hadoop on Windows. One is to install a Linux operating system in a virtual machine, which lets Hadoop run in a full Linux environment. The other is to emulate a Linux environment with Cygwin. The advantage of the latter is that it is easy to use and simple to install. This article covers the second approach: using Cygwin to emulate a Linux environment.

Preparatory work


(1) Install JDK 1.6 or later. When installing, avoid a path that contains spaces, for example Program Files; otherwise Hadoop will not find the JDK when you reference it in the Hadoop configuration files.

(2) Download Hadoop from the official website: http://hadoop.apache.org/releases.html
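
For item (1) above, a candidate JDK directory can be tested for spaces before it is used. The following is a minimal sketch for a POSIX shell such as the one Cygwin provides. The function names has_space and check_jdk_path and the sample paths are illustrative assumptions; they are not part of Hadoop or the JDK.

```shell
# Return success (0) if the given path contains a space, failure (1) otherwise.
has_space() {
    case "$1" in
        *" "*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print a verdict for a candidate JDK installation path (hypothetical helper).
check_jdk_path() {
    if has_space "$1"; then
        echo "unsafe: path contains a space"
    else
        echo "ok"
    fi
}
```

For example, check_jdk_path 'C:\Program Files\Java\jdk1.6.0' prints "unsafe: path contains a space", while check_jdk_path 'D:\javatools\jdk1.6.0' prints "ok". Single quotes keep the backslashes literal.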

Installing Cygwin

Cygwin is a tool that emulates a UNIX environment on the Windows platform, and running Hadoop on Windows requires it. Download the 32-bit or 64-bit installer, as required by your operating system, from http://www.cygwin.com/

First, double-click the downloaded installer and click Next to reach the page that asks for the installation type. There are three options; select the network installation:

    • Network installation: downloads and installs packages over the network
    • Download without installing: only downloads packages over the network
    • Local installation: installs from packages that are already downloaded

Second, choose Install from the Internet

Third, select the installation path

Fourth, select the Local Package Directory

Fifth, choose your Internet connection

Sixth, select an appropriate download site and click Next


Seventh, and most important, make sure the following packages are installed:



In the Select Packages screen, expand the Net category and select the two packages openssh and openssl.

If you want to compile Hadoop in Eclipse, also install sed under the Base category.

If you want to edit the Hadoop configuration files directly in Cygwin, you can install vim under the Editors category.

Click "Next" and wait for the installation to complete.

Eighth, configure environment variables

Right-click "My Computer" and select "Properties" from the menu. On the Advanced tab of the Properties dialog, click the "Environment Variables" button. In the system variables list, double-click the "Path" variable and append the bin directory of the Cygwin installation to the variable value, separated from the previous entry by a semicolon, for example: D:\cygwin64\bin
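
To confirm that a directory really is one of the entries of a Path value, the value can be split on its separator and compared entry by entry. The sketch below is one way to do that in a POSIX shell. The function name path_has_dir is an assumption made for illustration, and the comparison is case-sensitive, unlike Windows itself.

```shell
# Return success if DIR appears as a whole entry in LIST, whose entries are
# separated by SEP (";" for a Windows Path value, ":" for a POSIX PATH).
path_has_dir() {
    list=$1 sep=$2 dir=$3
    # Wrap the list in separators so that the first and last entries
    # can be matched the same way as the ones in the middle.
    case "$sep$list$sep" in
        *"$sep$dir$sep"*) return 0 ;;
        *)                return 1 ;;
    esac
}
```

For example, path_has_dir 'C:\Windows;D:\cygwin64\bin' ';' 'D:\cygwin64\bin' succeeds. Inside a Cygwin shell, where the search path uses colons, path_has_dir "$PATH" ':' /usr/bin does the same check.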

Installing the sshd service

Double-click the Cygwin icon on the desktop to start Cygwin and run the command ssh-host-config -y. You will then be prompted for a password.


Type the password, confirm it, and press Enter. When the message "Host configuration finished. Have fun!" appears, the installation was successful.


Run net start sshd to start the service, or find the CYGWIN sshd service in the Windows Services console and start it there.
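
The two commands above can be chained so that the service is only started when the configuration step succeeds. This is a small sketch, not a required part of the setup; the wrapper name setup_sshd is an assumption, while ssh-host-config and net start sshd are the commands already used in this section.

```shell
# Configure the Cygwin sshd service and start it, stopping at the first failure.
setup_sshd() {
    # -y answers "yes" to the yes/no questions the script asks;
    # the password prompt described above still appears.
    ssh-host-config -y || return 1
    # Start the Windows service that the configuration step created.
    net start sshd
}
```

Call setup_sshd from the Cygwin window. Its exit status is non-zero if either step fails.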



Installing Hadoop
The previous sections were carried out on my work computer and the installation below on my own machine; this does not affect the procedure.


Download Hadoop

Hadoop website: http://hadoop.apache.org/releases.html.


Unzip the Hadoop package to the /home/user directory and rename the folder to hadoop. Renaming is optional, but without it the commands that follow are more cumbersome to type.
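
The unpack-and-rename step can be done in one go from the Cygwin shell. The sketch below assumes the download is a gzip-compressed tarball with a single top-level directory; the function name unpack_hadoop and the archive name in the usage line are illustrative only.

```shell
# Unpack a Hadoop tarball into DEST and rename its top-level folder to "hadoop".
# Assumes the archive holds exactly one top-level directory.
unpack_hadoop() {
    archive=$1 dest=$2
    # The first path component of the first archive entry is the folder name.
    top=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1) || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # Rename only if the folder is not already called "hadoop".
    [ "$top" = hadoop ] || mv "$dest/$top" "$dest/hadoop"
}
```

For example, unpack_hadoop hadoop-x.y.z.tar.gz ~ leaves the files in ~/hadoop, where hadoop-x.y.z.tar.gz stands for whichever release you downloaded. Make sure ~/hadoop does not exist beforehand, because mv would otherwise move the new folder inside it.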


(1) Standalone mode

Standalone mode requires no configuration. In this mode, Hadoop runs as a single Java process, which is often used for debugging.

(2) Pseudo-distributed mode

Pseudo-distributed mode can be regarded as a cluster with only one node. This node is both master and slave: it acts as both NameNode and DataNode, and as both JobTracker and TaskTracker.


Pseudo-distributed mode only requires changes to a few configuration files.

Configure hadoop-env.sh: open the file in Notepad and set JAVA_HOME to your JDK installation path, for example:

export JAVA_HOME="D:\javatools\jdk1.6.0"
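
If you prefer to make this change from the Cygwin shell instead of Notepad, the line can be appended with printf. This is only a sketch: the function name set_java_home is an assumption, and the conf/hadoop-env.sh location in the usage line assumes the Hadoop 1.x directory layout.

```shell
# Append an "export JAVA_HOME=..." line to the given hadoop-env.sh file.
# printf with %s writes the path verbatim, so Windows backslashes are kept as typed.
set_java_home() {
    env_file=$1 jdk_path=$2
    printf 'export JAVA_HOME="%s"\n' "$jdk_path" >> "$env_file"
}
```

For example: set_java_home ~/hadoop/conf/hadoop-env.sh 'D:\javatools\jdk1.6.0'. Because the line is appended at the end of the file, it takes effect over any earlier JAVA_HOME setting in the same file.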


Configure core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.child.tmp</name>
    <value>/home/u/hadoop/tmp</value>
  </property>
</configuration>

Configure hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Configure mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.child.tmp</name>
    <value>/home/u/hadoop/tmp</value>
  </property>
</configuration>
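
All three site files share the same shape: a configuration element holding name/value properties. As an illustration of that structure, the sketch below prints such a file from pairs of arguments. The function name site_xml is an assumption, and values are written as-is, so characters such as & or < would need XML escaping first.

```shell
# Print a Hadoop site file to stdout from name/value argument pairs.
site_xml() {
    echo '<?xml version="1.0"?>'
    echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
    echo '<configuration>'
    # Consume the arguments two at a time: property name, then property value.
    while [ $# -ge 2 ]; do
        printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
        shift 2
    done
    echo '</configuration>'
}
```

For example, site_xml dfs.replication 1 > hdfs-site.xml produces the hdfs-site.xml content shown above, without the comment line.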

Start Hadoop


Open a Cygwin window and run cd ~/hadoop to enter the Hadoop folder. Before starting Hadoop, you need to format Hadoop's file system, HDFS, by running the command bin/hadoop namenode -format. (Note: namenode must be written in lowercase. If you type Namenode, an error is reported saying that the main class Namenode cannot be found or loaded.)





Run the command bin/start-all.sh to start all processes.
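
One way to see whether all processes came up is to compare the output of jps, the process lister that ships with the JDK, against the expected daemon names. The sketch below assumes a Hadoop 1.x pseudo-distributed setup, where start-all.sh launches NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The function name missing_daemons is an illustrative assumption.

```shell
# Read `jps` output ("<pid> <class name>" per line) on stdin and print
# each expected daemon that is missing. Succeeds only if none is missing.
missing_daemons() {
    seen=$(awk '{print $2}')   # keep only the class-name column
    missing=0
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$seen" | grep -qx "$d"; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as jps | missing_daemons. No output and a zero exit status mean all five daemons are running.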




Verify that the installation is successful


Open a browser and go to http://localhost:50030, the JobTracker web interface. If the page loads, the installation succeeded.
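
The same check can be made from the Cygwin shell. The sketch below assumes curl is installed (in Cygwin it is an optional package); the function name ui_up is an assumption made for illustration.

```shell
# Return success if the URL answers with HTTP status 200.
ui_up() {
    # -s: no progress output, -L: follow redirects, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the final HTTP status code.
    code=$(curl -s -L -o /dev/null -w '%{http_code}' "$1")
    [ "$code" = 200 ]
}
```

For example: ui_up http://localhost:50030 && echo "JobTracker page is reachable".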





Reference document: "Hadoop Combat"




