Many people have written about installing Hadoop on Linux in a virtual machine, but the actual installation involves a number of details that most guides leave out and that really do need attention to get a working setup. This post collects those details, both to help others and as a small summary for myself.
I. Linux system settings
1. Set a static IP (it is best to use a static IP; otherwise the address may change after a reboot)
(A static IP lets you write a fixed address into the configuration files, which is convenient for remote access; if you do not need remote access, it matters less.)
But if you want a fully distributed configuration, you should definitely use static IPs.
The exact settings differ slightly between CentOS and Ubuntu; the specific commands are easy to find on Google (or Baidu).
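As a rough sketch of what the static-IP configuration looks like on each system (the interface name, addresses, and gateway below are placeholders for your own network):

```
# Ubuntu (releases of that era): /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 192.168.1.100      # placeholder address
    netmask 255.255.255.0
    gateway 192.168.1.1        # placeholder gateway

# CentOS: /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.100
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
```

After editing, restart networking (or reboot the VM) for the change to take effect.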
2. Disable the firewall
For a standalone or pseudo-distributed setup this should have no impact, but when multiple machines (or multiple virtual machines) connect to each other, a running firewall can cause the NameNode to shut down after a period of time.
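Disabling the firewall might look like the following (a sketch; the exact service names depend on your distribution and release):

```shell
# CentOS 6: stop the firewall now and keep it off across reboots
service iptables stop
chkconfig iptables off

# CentOS 7: same idea, with firewalld
systemctl stop firewalld
systemctl disable firewalld

# Ubuntu
sudo ufw disable
```

In a real deployment, opening only the specific Hadoop ports is safer than disabling the firewall outright.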
II. Problems that come up during use
1. In different tutorials, you may see some people write "hadoop fs" and others write "hadoop dfs" when using Hadoop commands.
In fact, the effect is basically the same, but in pseudo-distributed mode the 2.4.0 release prints a deprecation warning for "hadoop dfs".
An explanation I stumbled across online: fs is the more abstract layer; in a distributed environment fs resolves to dfs, but in a local environment fs refers to the native file system, where dfs is not available.
2. "hadoop fs -ls" queries the current user's home directory in HDFS, but sometimes it fails with a "`.': No such file or directory" error.
This is because the current user does not yet have a home directory in HDFS.
The solution, of course, is to create a home directory in HDFS for the current user. The specific command is: hadoop fs -mkdir -p /user/[current login user]
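For example, assuming the pseudo-distributed cluster is running and the directory is for the currently logged-in user:

```shell
# Create the current user's home directory in HDFS
hadoop fs -mkdir -p /user/$(whoami)

# A bare -ls now resolves "." to /user/<current user> without the error
hadoop fs -ls
```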
3. Developing Hadoop programs with Eclipse
In a pseudo-distributed setup, the port numbers for both the Map/Reduce (V2) Master and the DFS Master need to match the port configured in core-site.xml (9000 by default).
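For reference, the relevant core-site.xml entry in Hadoop 2.x looks roughly like this; the port after the host is the one the Eclipse plugin must match:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
```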
III. Accessing the Hadoop environment on Ubuntu remotely from Windows
1. Change the Ubuntu IP to a static IP
Then replace the localhost configured in core-site.xml with that IP.
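After the change, the entry might read as follows (192.168.1.100 is a placeholder for the VM's actual static IP):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.100:9000</value>
</property>
```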
2. After connecting a DFS Location, there may be a problem with insufficient permissions on files in DFS: Permission denied
There are said to be three solutions, but only the first two worked in my experiments:
(1) (Not recommended) If this is only a test environment, you can simply disable HDFS permission checking: open hdfs-site.xml and add

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

(Note the property is dfs.permissions, not dfs.permission; in Hadoop 2.x its successor is named dfs.permissions.enabled.)
(2) The Hadoop environment on the Windows side accesses HDFS as the user DrWho, and because DrWho has no write permission on the Hadoop directories, an exception occurs.
So you can open up the permissions on the Hadoop directory to resolve it. The specific command: hadoop fs -chmod 777 xxx (where xxx is the folder in question)
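A minimal sketch, with /user/hadoop/input as a placeholder path; -R applies the change recursively to everything under the directory:

```shell
# Open up the target directory so the Windows-side user can write to it
hadoop fs -chmod -R 777 /user/hadoop/input

# Verify the new permissions
hadoop fs -ls /user/hadoop
```

Note that 777 is as permissive as it gets; for anything beyond a test environment, granting the specific user access is the better choice.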
(3) It is said that you can modify the Hadoop Location parameters: in Eclipse's Advanced Parameters tab, change the hadoop.job.ugi entry to the Hadoop user name (I never found this entry; it may be an issue with the V2 version of the plugin).
3. Even after the Linux-side Hadoop permissions are in place, submitting a job may still fail with java.lang.NullPointerException.
You need to take winutils.exe from hadoop-common-2.2.0-bin-master.zip and put it in the bin folder of the Hadoop directory on Windows, copy hadoop.dll into C:/Windows/System32, and add HADOOP_HOME to the environment variables.
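The Windows-side steps can be sketched as follows in an Administrator command prompt (C:\hadoop is a placeholder for your actual Hadoop directory, and the two files are assumed to be in the current directory):

```
:: Copy winutils.exe into Hadoop's bin directory
copy winutils.exe C:\hadoop\bin\

:: Copy hadoop.dll where the JVM can load it
copy hadoop.dll C:\Windows\System32\

:: Point HADOOP_HOME at the install directory (takes effect in new shells)
setx HADOOP_HOME C:\hadoop
```

After setting HADOOP_HOME, restart Eclipse so the new environment variable is picked up.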