The deployment of Hadoop in a Windows environment

After a whole day of tinkering and consulting a lot of information on the web, Hadoop finally seems to be configured successfully on my machine. The detailed configuration process is shared below, in the hope that it saves everyone some detours.

Note: the configuration environment for this article:

Cygwin, latest version (2.769)
Windows 7, 64-bit
JDK 1.6.0_31 for 64-bit Windows (JRE 6)
Eclipse Indigo
Hadoop 0.20.2 (note: the 0.20.203 version is not usable, as it causes the TaskTracker to fail to start)

-----------------------------------Gorgeous segmentation------------------------------------

Environment installation & configuration:

1. JDK. My installation directory is C:\java.

Install the JDK, then configure the JAVA_HOME environment variable, and append the JDK's bin directory to the PATH environment variable.
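A quick sanity check from a Windows command prompt (assuming the variables were set through the System Properties dialog; the path matches my machine):

    echo %JAVA_HOME%
    java -version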
2. Cygwin. During the installation process, remember to select the packages you need. Here is what is needed:
Net category: openssh, openssl
Base category: sed (sed is required if you want to use Eclipse)
Devel category: subversion (recommended)

After the installation is complete, append Cygwin's bin directory and its usr\sbin directory to the system PATH environment variable.
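To confirm the Cygwin tools are now visible to Windows, a quick check from a new command prompt (assuming the default C:\cygwin install location):

    where ssh
    where sed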
3. Hadoop. Extract the downloaded hadoop-0.20.2.tar.gz to a directory of your choice; I put the unpacked Hadoop program in a hadoop folder under the Cygwin root directory. (Again, do not use the 0.20.203 version of Hadoop.)
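If the archive sits in the Cygwin root, the extraction can be done from a Cygwin terminal like this (the hadoop folder name is my choice, matching the rest of this article):

    $ cd /
    $ tar xzf hadoop-0.20.2.tar.gz
    $ mv hadoop-0.20.2 hadoop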

Now start configuring Hadoop. The files to be configured (all under the hadoop/conf directory) are:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml

The first file: hadoop-env.sh. Uncomment the JAVA_HOME line (note that the # in front of export must be removed) and set it using a Linux-style path. My JDK path is C:\JAVA\jdk1.6.0_31, which in Cygwin corresponds to /cygdrive/c/java/jdk1.6.0_31.
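The resulting line in hadoop-env.sh:

    export JAVA_HOME=/cygdrive/c/java/jdk1.6.0_31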
The second file: core-site.xml. Delete it first, then copy the core-default.xml file from the hadoop/src/core directory into the conf directory and rename it core-site.xml. Then modify the fs.default.name property, as shown below. (Make sure the port number, 9100 in my case, is not occupied.)
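The property would look like this (the host 127.0.0.1 is an assumption, matching the masters/slaves change made later in this article):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://127.0.0.1:9100</value>
    </property>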

The third file: hdfs-site.xml. Delete it first, then copy hdfs-default.xml from the src/hdfs directory into the conf directory and rename it hdfs-site.xml. Then modify the dfs.replication property, as shown below. This property is the number of copies kept of each file block; when running on a single data node, HDFS cannot replicate blocks to three data nodes, so set it to 1.
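A sketch of the property for this single-node setup (the value 1 follows from the explanation above):

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>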

The fourth file: mapred-site.xml. Delete it first, then copy mapred-default.xml from the src/mapred directory into the conf directory and rename it mapred-site.xml. Then modify its mapred.job.tracker property as shown below (again, make sure the port number is not occupied).
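A sketch, assuming port 9101; the article only requires that the port be free:

    <property>
      <name>mapred.job.tracker</name>
      <value>127.0.0.1:9101</value>
    </property>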


----------------------------------Gorgeous Segmentation-------------------------------------

Configuring the SSH service (first confirm that the openssh and openssl packages are installed):
1. Open Cygwin and enter: ssh-host-config
2. When the system asks "Should privilege separation be used?", answer: no
3. When the system asks whether sshd should be installed as a service, answer: yes
4. When asked for the value of the CYGWIN environment variable, enter: ntsec
5. Done. A rough transcript of the session follows.
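The exact prompt wording differs between Cygwin versions, but the session runs roughly like this:

    $ ssh-host-config
    *** Query: Should privilege separation be used? (yes/no) no
    *** Query: Do you want to install sshd as a service? (yes/no) yes
    *** Query: Enter the value of CYGWIN for the daemon: [] ntsec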

Next, go to the Windows Services panel and start the CYGWIN sshd service.
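Alternatively, the service can be started from an administrator command prompt (sshd is the default Cygwin service name; adjust if yours differs):

    net start sshd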
Go back to the Cygwin environment and execute the following commands:
1. ssh-keygen, pressing Enter at every prompt to accept the defaults
2. cd ~/.ssh
3. cp id_rsa.pub authorized_keys
4. exit, to quit Cygwin (if you do not quit and log back in, you may get an error)
5. Run ssh localhost; if prompted, answer yes.
6. Run ps; if you see a /usr/bin/ssh process, it means success.
The full session is sketched below.
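In a fresh Cygwin terminal the sequence looks like this:

    $ ssh-keygen                    # accept the defaults at every prompt
    $ cd ~/.ssh
    $ cp id_rsa.pub authorized_keys
    $ exit

Then reopen Cygwin and verify:

    $ ssh localhost
    $ ps                            # look for a /usr/bin/ssh process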
------------------------------------Gorgeous Segmentation----------------------------------

Start Hadoop. Step 0: to avoid the error "could only be replicated to 0 nodes, instead of 1", it is best to change the masters and slaves files under hadoop/conf to 127.0.0.1 (the original content is localhost), as shown below.
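One way to make the change, from the hadoop directory:

    $ echo 127.0.0.1 > conf/masters
    $ echo 127.0.0.1 > conf/slaves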
Step 1: create a logs directory in the Hadoop directory to hold the log files.
Step 2: format the manager, that is, the NameNode, to create HDFS. Execute the command bin/hadoop namenode -format (see the sketch below). Once it reports success, run ps: you should see the ssh process but not yet any Java virtual machine process.
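Assuming Hadoop was unpacked to /hadoop under the Cygwin root, steps 1 and 2 look like this:

    $ cd /hadoop
    $ mkdir logs
    $ bin/hadoop namenode -format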
Step 3: start Hadoop. Execute the command bin/start-all.sh, then run the jps command, as shown below.
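That is (jps ships with the JDK, so it is available once the JDK's bin directory is on the PATH):

    $ bin/start-all.sh
    $ jps                           # lists the running Java processes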
In my case, jps showed that the DataNode, SecondaryNameNode, and TaskTracker had not started. Some people on the web say this is a problem with jps itself; I am not entirely sure, but the file system operations later in this article do work and the DataNode does store data, so I continued.
At this point, however, executing the ps command does show 5 JVM processes.
I do not know whether this counts as a complete success, but there are no errors in my logs folder.
------------------------------Gorgeous segmentation---------------------------------------

File system operations. To verify that HDFS works properly, we can upload some files: create an in folder in HDFS, then upload all the text files from the local Hadoop directory into it. There are four .txt files in my Hadoop directory. The commands are shown below.
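With an extra -ls step added here to confirm the result:

    $ bin/hadoop fs -mkdir in
    $ bin/hadoop fs -put *.txt in
    $ bin/hadoop fs -ls in          # should list the four .txt files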
OK, uploading such small files is not satisfying, so let's upload a movie next. For example, to upload a video file movie.mpg to HDFS: first create a folder named local in the Hadoop root directory, then copy movie.mpg into it.
The following commands are executed:
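A sketch, assuming the same in directory as above:

    $ bin/hadoop fs -put local/movie.mpg in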
Then check whether the file system now contains the file:
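Again a sketch:

    $ bin/hadoop fs -ls in          # movie.mpg should appear in the listing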
You can see movie.mpg in HDFS.
You can also see it in Eclipse.

OK, I suddenly cannot insert images any more, so I will follow up with another article describing common error handling.
