Build a Hadoop development environment on Windows
Objective
There are usually two ways to run Hadoop under Windows. One is to install a Linux operating system in a VM, which lets Hadoop run in a full Linux environment; the other is to emulate a Linux environment with Cygwin. The advantage of the latter is that it is simple to install and easy to use. This article covers the second approach, using Cygwin to emulate a Linux environment.
Preparatory work
(1) Install JDK 1.6 or later. It is best not to install it to a path that contains a space, for example Program Files; otherwise the JDK will not be found when you reference it in the Hadoop configuration files.
(2) Download Hadoop from http://hadoop.apache.org/releases.html.
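The space restriction in item (1) can be checked before any Hadoop file is edited. The following is a minimal sketch; the helper name path_has_no_space and both example paths are illustrative assumptions, not part of Hadoop or Cygwin.

```shell
# Hypothetical helper: succeeds only when the given path contains no space.
path_has_no_space() {
  case "$1" in
    *" "*) return 1 ;;
    *)     return 0 ;;
  esac
}

# Example paths are illustrative; substitute your own JDK directory.
for jdk in 'D:\javatools\jdk1.6.0' 'C:\Program Files\Java\jdk1.6.0'; do
  if path_has_no_space "$jdk"; then
    printf 'ok: %s\n' "$jdk"
  else
    printf 'contains a space: %s\n' "$jdk"
  fi
done
```

If the check reports a space, reinstalling the JDK to a directory such as the first example is the simplest fix.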
Installing Cygwin
Cygwin is a tool that emulates a UNIX environment on the Windows platform, and running Hadoop on Windows requires it. Download the 32-bit or 64-bit installer from http://www.cygwin.com/ according to your operating system.
First, double-click the downloaded installer and click Next to reach the installation type page. It offers three options; select the network installation:
- Network installation: download and install packages over the network
- Download without installing: download packages over the network only
- Local installation: install from packages already on the local disk
Second, choose Install from Internet.
Third, select the installation path.
Fourth, select the Local Package Directory.
Fifth, choose your Internet connection.
Sixth, select a suitable download site and click Next.
Seventh, select the packages. This step matters; make sure the following packages are installed:
In the Select Packages screen, expand the Net category and select the two packages openssh and openssl.
If you want to compile Hadoop in Eclipse, also install sed under the Base category.
If you want to edit the Hadoop configuration files directly in Cygwin, you can install vim under the Editors category.
Click Next and wait for the installation to complete.
Eighth, configure environment variables.
Right-click My Computer, choose Properties, open the Advanced tab in the Properties dialog, and click the Environment Variables button. In the system variables list, double-click the Path variable and append the bin directory of the Cygwin installation to the end of the variable value, separated from the existing entries by a semicolon, for example: D:\cygwin64\bin
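Once the installation has finished, a quick check from a Cygwin terminal can confirm that the selected packages are usable. This is a sketch under assumptions: the helper name missing_tools is hypothetical, and the command names (ssh and ssh-host-config from openssh, openssl, sed) are what those packages are expected to provide.

```shell
# Hypothetical helper: prints each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Prints nothing when every listed command is available.
missing_tools ssh ssh-host-config openssl sed
```

Any name that is printed points to a package that was skipped in the Select Packages step; rerunning the installer and selecting it is enough to add it.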
Installing the sshd service
Double-click the Cygwin icon on the desktop to start Cygwin and run the command ssh-host-config -y. The script then prompts for a password.
Enter the password, confirm it, and press Enter. When the message "Host configuration finished. Have fun!" appears, the installation was successful.
Run net start sshd to start the service, or find the CYGWIN sshd service in the Windows Services panel and start it there.
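The two commands above can be wrapped in one guarded function. This is only a sketch: the function name start_sshd is hypothetical, and it assumes a Cygwin terminal with sufficient rights to register and start a Windows service.

```shell
# Hypothetical wrapper around the two commands from the text.
# Refuses to run when ssh-host-config (from the openssh package) is absent.
start_sshd() {
  if ! command -v ssh-host-config >/dev/null 2>&1; then
    echo "ssh-host-config not found; is the openssh package installed?" >&2
    return 1
  fi
  # -y answers yes to the script's questions; it still prompts for a password.
  ssh-host-config -y && net start sshd
}

# Usage, from a Cygwin terminal:
#   start_sshd
```

The function is only defined here and not called, so pasting the block into a terminal changes nothing until start_sshd is run explicitly.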
Installing Hadoop
The previous steps were carried out on a company computer and the following ones on my own machine; this does not affect the procedure.
Download Hadoop
Hadoop website: http://hadoop.apache.org/releases.html.
Unzip the Hadoop package to the /home/user directory and rename the folder to hadoop. Renaming is optional, but without it the later commands are more cumbersome to type.
(1) Standalone mode
Standalone mode requires no configuration. In this mode Hadoop runs as a single Java process, which is often used for debugging.
(2) Pseudo-distributed mode
Pseudo-distributed mode can be regarded as a cluster with only one node. In this cluster the single node is both master and slave: it is both NameNode and DataNode, and both JobTracker and TaskTracker.
Pseudo-distributed mode only requires changes to a few configuration files.
Configure hadoop-env.sh: open the file in Notepad and set JAVA_HOME to your JDK installation path, for example:
export JAVA_HOME="D:\javatools\jdk1.6.0"
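Before starting Hadoop it can help to confirm that the configured directory really contains a Java launcher. The sketch below assumes it is run in a Cygwin terminal; the helper name has_java is hypothetical, and the path /cygdrive/d/javatools/jdk1.6.0 in the usage comment is simply the Cygwin spelling of the example path above.

```shell
# Hypothetical helper: succeeds when <dir>/bin/java or <dir>/bin/java.exe
# exists and is executable. An empty argument is rejected.
has_java() {
  [ -n "$1" ] && { [ -x "$1/bin/java" ] || [ -x "$1/bin/java.exe" ]; }
}

# Usage, with the example JDK location written as a Cygwin path:
#   has_java /cygdrive/d/javatools/jdk1.6.0 && echo "JDK found"
if has_java "${JAVA_HOME:-}"; then
  echo "JAVA_HOME points to a Java installation"
else
  echo "JAVA_HOME is unset or has no bin/java"
fi
```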
Configure core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.child.tmp</name>
    <value>/home/u/hadoop/tmp</value>
  </property>
</configuration>
Configure hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Configure mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.child.tmp</name>
    <value>/home/u/hadoop/tmp</value>
  </property>
</configuration>
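After editing the three files, the values can be read back to catch typos. The following is a rough sketch, not a real XML parser: the helper name get_property is hypothetical, it assumes each name and value tag sits on its own line, and the conf/ paths in the usage comment assume the usual layout of the unpacked Hadoop folder.

```shell
# Hypothetical helper: prints the <value> that follows <name>$2</name> in file $1.
# Assumes one tag per line; it is a text match, not an XML parser.
get_property() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Usage, from the Hadoop folder:
#   get_property conf/core-site.xml fs.default.name
#   get_property conf/hdfs-site.xml dfs.replication
#   get_property conf/mapred-site.xml mapred.job.tracker
```

An empty result means the property name was not found or its value is not on a single line.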
Start Hadoop
Open a Cygwin window and run cd ~/hadoop to enter the Hadoop folder. Before starting Hadoop, you need to format Hadoop's file system, HDFS, with the command bin/hadoop namenode -format. (Note: namenode must be lowercase. If you type Namenode, you get an error saying that the main class Namenode cannot be found or loaded.)
Enter the command bin/start-all.sh to start all processes.
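After start-all.sh returns, the JDK tool jps lists the running Java processes, which gives a quick way to see whether every daemon came up. The sketch below is an assumption-laden helper: the name missing_daemons is hypothetical, and the expected list adds SecondaryNameNode, which the text above does not mention but which start-all.sh in Hadoop 1.x normally starts.

```shell
# Daemons expected in pseudo-distributed mode (assumed Hadoop 1.x defaults).
EXPECTED="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Hypothetical helper: reads jps output ("<pid> <ClassName>" per line) on
# standard input and prints each expected daemon that is not listed.
missing_daemons() {
  running=$(awk '{ print $2 }')
  for daemon in $EXPECTED; do
    printf '%s\n' "$running" | grep -qx "$daemon" || printf '%s\n' "$daemon"
  done
}

# Usage, from the Hadoop folder:
#   bin/hadoop namenode -format
#   bin/start-all.sh
#   jps | missing_daemons
```

No output means all five processes are running; any name that is printed is a daemon to look for in the files under the logs folder.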
Verify that the installation is successful
Open a browser and go to http://localhost:50030. If the page loads, the installation succeeded.
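The same check can be made from the Cygwin terminal. This is a sketch that assumes curl is installed (it is not among the packages selected earlier); the helper name ui_up is hypothetical, and port 50070 is assumed to be the NameNode web page of a default Hadoop 1.x setup, in addition to the JobTracker page on port 50030 mentioned above.

```shell
# Hypothetical helper: succeeds when the URL answers with a non-error
# HTTP status within five seconds.
ui_up() {
  curl --silent --fail --output /dev/null --max-time 5 "$1"
}

for url in http://localhost:50030/ http://localhost:50070/; do
  if ui_up "$url"; then
    echo "reachable: $url"
  else
    echo "not reachable: $url"
  fi
done
```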
Reference: "Hadoop Combat"