Issues to be aware of when installing and using Hadoop on Linux in a virtual machine


Many people have written about installing Hadoop on Linux in a virtual machine, but in practice there are quite a few details that those write-ups do not cover yet really need attention to keep things running. This post collects them, partly as a small summary for myself.

I. Linux system settings


1. Set a static IP (it is best to use a static address; otherwise the IP may change after a reboot and break things)

(A static IP lets you write a fixed address into the configuration, which is convenient for remote access; if you do not need remote access it matters less.)

For a fully distributed setup, though, you really should use static IPs.

The systems involved are CentOS and Ubuntu, and the exact configuration differs slightly between them; the specific steps are easy to find with a quick search (Google or Baidu).
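
As a reference, here is a minimal sketch for the two systems; the interface name eth0 and the addresses are placeholders for illustration, and newer releases use different network tools, so verify against your own distribution.

Ubuntu (/etc/network/interfaces):

    auto eth0
    iface eth0 inet static
        address 192.168.56.101
        netmask 255.255.255.0
        gateway 192.168.56.1

CentOS (/etc/sysconfig/network-scripts/ifcfg-eth0):

    DEVICE=eth0
    BOOTPROTO=static
    ONBOOT=yes
    IPADDR=192.168.56.101
    NETMASK=255.255.255.0
    GATEWAY=192.168.56.1

Restart networking (or the machine) after editing so the address takes effect.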

2. Close the firewall

In pseudo-distributed or standalone mode this should have no impact, but when connecting across machines, or between several virtual machines, a running firewall can cause the NameNode to go down after a while.
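
For reference, a minimal sketch of the usual commands (which one applies depends on the distribution and release, so treat these as assumptions to verify):

    # Ubuntu
    sudo ufw disable

    # CentOS 6 (iptables service)
    sudo service iptables stop
    sudo chkconfig iptables off

    # CentOS 7 (firewalld)
    sudo systemctl stop firewalld
    sudo systemctl disable firewalld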


II. Problems that arise during use


1. In different tutorials you may see some people write "hadoop fs" and others write "hadoop dfs" when running Hadoop commands.

In practice the effect is basically the same, but in a pseudo-distributed 2.4.0 setup, hadoop dfs prints a deprecation warning.

An explanation I came across online: fs is the more abstract layer. In a distributed environment fs resolves to dfs (HDFS), while in a purely local environment fs is the native file system and dfs does not apply.
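
A quick illustration (the path / is just an example):

    hadoop fs -ls /    # generic FileSystem shell; resolves to whatever fs.defaultFS points at
    hadoop dfs -ls /   # HDFS-specific; deprecated in Hadoop 2.x
    hdfs dfs -ls /     # the form the 2.x deprecation warning recommends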


2. hadoop fs -ls lists the current account's directory in HDFS, but sometimes it fails with an error like ".": No such file or directory.

This happens because the current account does not yet have a home directory in HDFS.

The solution is to create a directory for the current user in HDFS; the specific command is: hadoop fs -mkdir -p /user/[current login user]
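
For example (a sketch; $(whoami) simply expands to the current login user):

    hadoop fs -mkdir -p /user/$(whoami)   # create the user's home directory in HDFS
    hadoop fs -ls                         # relative paths now resolve under /user/<user>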


3. Developing Hadoop programs with Eclipse

In a pseudo-distributed scenario, the port numbers for both the Map/Reduce (V2) Master and the DFS Master in the Eclipse plugin need to match the port configured in core-site.xml (9000 in the usual tutorial configuration).
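
For reference, the relevant part of core-site.xml typically looks like this (a sketch; fs.defaultFS is the Hadoop 2.x property name, and the host and port are whatever you actually configured):

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>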



III. Accessing the Hadoop environment on Ubuntu remotely from Windows


1. Change the Ubuntu IP to a static IP.

Then replace the localhost configured in core-site.xml with that IP.
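
For example (a sketch; 192.168.56.101 is a placeholder for the static IP set above):

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://192.168.56.101:9000</value>
    </property>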


2. After connecting a DFS Location, you may find you lack permission to operate on files in DFS: Permission denied.

Three solutions are commonly mentioned, but only the first two worked in my tests:

(1) (Not recommended) If this is only a test environment, you can disable the HDFS permission check: open hdfs-site.xml and add


    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>

(The property name in Hadoop 2.x is dfs.permissions.enabled; older versions used dfs.permissions.)

(2) When HDFS is accessed from the Hadoop environment on Windows, the access is made under the local Windows identity (e.g. a user such as DrWho); because that user has no write permission on the Hadoop directories, an exception occurs.

So you can open up the permissions on the Hadoop directory; the specific command is: hadoop fs -chmod 777 /xxx (where /xxx is the specific folder name).

(3) It is said that you can modify the Hadoop Location parameters, changing the hadoop.job.ugi item in Eclipse's Advanced Parameters tab to the Hadoop user name (I never found this item; perhaps it no longer exists in the V2 version).


3. Even after the permissions on the Linux Hadoop side are sorted out, submitting a job may still fail with java.lang.NullPointerException.

You need to copy winutils.exe from hadoop-common-2.2.0-bin-master.zip into the bin folder of the Hadoop installation on Windows, copy hadoop.dll into C:\Windows\System32, and add HADOOP_HOME to the environment variables.
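
A minimal sketch of these steps from a Windows command prompt (the paths C:\hadoop-2.2.0 and the location of the unpacked archive are assumptions for illustration; adjust them to your own layout):

    copy hadoop-common-2.2.0-bin-master\bin\winutils.exe C:\hadoop-2.2.0\bin\
    copy hadoop-common-2.2.0-bin-master\bin\hadoop.dll C:\Windows\System32\
    setx HADOOP_HOME C:\hadoop-2.2.0

After setting HADOOP_HOME, restart Eclipse so it picks up the new environment variable.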
