Hadoop Pseudo-Distributed Configuration and Eclipse-Based Development Environment
Contents
1. Development and configuration environment:
2. Hadoop server configuration (Master node)
3. Eclipse-based Hadoop2.x Development Environment Configuration
4. Run the Hadoop program and view the running log
1. Development and configuration environment:
Development environment: Win7 (64-bit) + Eclipse (Kepler Service Release 2)
Server environment: Ubuntu Server 14.04.1 LTS (64-bit)
Auxiliary tools: WinSCP + PuTTY
Hadoop version: 2.5.0
Hadoop Eclipse development plug-in (2.x): http://pan.baidu.com/s/1dDBxUSh
JDK version on the server: OpenJDK 7
Download and install all the above tools.
2. Hadoop server configuration (Master node)
I have been exploring the configuration of Hadoop 2 recently. Hadoop 2 adjusted some of the original framework APIs, but it remains compatible with the old versions (including the configuration), and people like me who enjoy trying new things should give it a taste. There are still few configuration tutorials for the new version online, so I will share my practical experience here; if you spot any errors, please correct me :).
Assume that Ubuntu Server, OpenJDK, and SSH have already been installed successfully. If you have not installed Ubuntu Server yet, install it first (there are plenty of tutorials online). Here I will only cover the SSH password-free login setup. First run
$ ssh localhost
to test whether password-free login is already set up. If not, the system will prompt you for a password; in that case, the following commands set up password-free login (for details, refer to Baidu/Google):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
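After that, a quick way to confirm the setup worked (a minimal check, assuming the default ~/.ssh location) is to tighten the permissions so sshd accepts the key and re-run the login test; it should no longer ask for a password:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost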
Download the Hadoop package and decompress it. I put it under /usr/mywind, so the full path of the Hadoop home directory is /usr/mywind/hadoop; the path is entirely up to your personal preference.
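For reference, a sketch of the download and decompression steps (the mirror URL below is an assumption; use whichever Apache mirror you prefer, and adjust the paths to your own layout):
$ cd /usr/mywind
# download URL is an assumption; any Apache mirror carrying hadoop-2.5.0 will do
$ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.5.0/hadoop-2.5.0.tar.gz
$ tar -xzf hadoop-2.5.0.tar.gz
$ mv hadoop-2.5.0 hadoop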
After decompression, open the etc/hadoop/hadoop-env.sh file under the Hadoop home directory and add the following at the end:
# Set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Assuming your installation directory is /usr/mywind/hadoop
export HADOOP_PREFIX=/usr/mywind/hadoop
For convenience, I added Hadoop's bin and sbin directories to the environment variables by directly modifying Ubuntu's /etc/environment file. Its content is as follows:
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java-7-openjdk-amd64/bin:/usr/mywind/hadoop/bin:/usr/mywind/hadoop/sbin"
JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
CLASSPATH=".:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar"
You can also do this by modifying the profile instead. Once the settings above are in place, you can test the hadoop command on the command line, for example:
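A minimal sanity check (assuming you have logged out and back in so that /etc/environment takes effect) is to print the version and build information:
$ hadoop version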
If you can see the expected output, congratulations, the Hadoop installation is complete. Next we can configure pseudo-distributed mode (in pseudo-distributed mode, Hadoop runs on a single node).
Next we need to configure four files under the /usr/mywind/hadoop/etc/hadoop directory: yarn-site.xml, mapred-site.xml, hdfs-site.xml, and core-site.xml. (Note: in this version mapred-site.xml does not exist by default; there is a mapred-site.xml.template file, which you can copy or rename to mapred-site.xml.) For the new features of YARN, refer to the official website or to this article.
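A quick sketch of preparing that file, using the paths assumed in this article:
$ cd /usr/mywind/hadoop/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml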
First, core-site.xml configures the HDFS address and the temporary directory (the default temporary directory is cleared after a restart):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.8.184:9000</value>
    <description>same as fs.default.name</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/mywind/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
Then hdfs-site.xml configures the replication factor and some other optional settings, such as the NameNode directory, the DataNode directory, and so on:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/mywind/name</value>
    <description>same as dfs.name.dir</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/mywind/data</value>
    <description>same as dfs.data.dir</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>same as in the old framework; recommended to set this to the number of DataNode hosts in the cluster</description>
  </property>
</configuration>
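Since hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir all point to custom local paths, it is worth making sure those directories exist and are writable by the user that will start Hadoop; a small sketch with the paths used above (add sudo and chown if /usr/mywind is not writable by that user):
$ mkdir -p /usr/mywind/tmp /usr/mywind/name /usr/mywind/data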
Then mapred-site.xml is configured to enable the YARN framework:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Finally, yarn-site.xml configures the NodeManager:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Note that older tutorials on the Internet may write this value as mapreduce.shuffle; be careful. With all the file configuration complete, format the HDFS file system as follows:
$ hdfs namenode -format
Start the HDFS daemons (NameNode and DataNode) and the YARN daemons:
$ start-dfs.sh
$ start-yarn.sh
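To confirm the daemons came up, you can list the running Java processes with the JDK's jps tool; on a pseudo-distributed single node you would expect to see processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
$ jps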
Then create the HDFS user directories:
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/a01513
Note: a01513 is my user name on Ubuntu. It is best to keep this consistent with the system user name; otherwise there are reportedly many permission and other issues. I tried changing it to a different name and got errors, so it is really less trouble to keep it the same as the system user name.
Then, put the input file to be tested in the file system:
$ hdfs dfs -put /usr/mywind/psa input
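To double-check the upload, list the contents of the input directory (which resolves to /user/a01513/input for the user directory created above):
$ hdfs dfs -ls input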
The file contains data from Hadoop's typical weather example:
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456+001212345678903456
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456+011212345678903456
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456+021212345678903456
12345679867623119010123456798676231190101234567986762311901012345679867623119010123456+003212345678903456
12345679867623119010123456798676231190201234567986762311901012345679867623119010123456+004212345678903456
12345679867623119010123456798676231190201234567986762311901012345679867623119010123456+010212345678903456
12345679867623119010123456798676231190201234567986762311901012345679867623119010123456+011212345678903456
12345679867623119010123456798676231190501234567986762311901012345679867623119010123456+041212345678903456
12345679867623119010123456798676231190501234567986762311901012345679867623119010123456+008212345678903456
After copying the file to the HDFS directory, you can view the relevant files and their statuses in the browser:
http://192.168.8.184:50070/
The IP address here is based on your actual Hadoop server address.
Well, all the Hadoop background services and the test data are now ready, so we can start writing our M/R program; before that, however, we need to set up the development environment.
For more details, please continue reading on the next page: