I recently noticed that Hadoop 3 was released at the end of last year, so today I set up an environment to try it out.
Installation and configuration
First, download the release package from http://hadoop.apache.org/releases.html. I downloaded the hadoop-3.0.0.tar.gz package and unpacked it:
$ tar zxvf hadoop-3.0.0.tar.gz
$ cd hadoop-3.0.0/
Edit the etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:
export JAVA_HOME=/opt/jdk8
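Hadoop's scripts read JAVA_HOME from hadoop-env.sh rather than from the login shell, so the export has to live in that file. A minimal sketch of building the line to append (assuming, as in this post, a JDK unpacked at /opt/jdk8 — substitute your own install path):

```shell
# Build the export line hadoop-env.sh needs; /opt/jdk8 is this post's
# JDK location -- adjust to wherever your JDK is installed.
jdk_dir=/opt/jdk8
export_line="export JAVA_HOME=${jdk_dir}"
echo "$export_line"
# From the hadoop-3.0.0 directory you would then append it:
#   echo "$export_line" >> etc/hadoop/hadoop-env.sh
```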
Modify the configuration file etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Modify the configuration file etc/hadoop/hdfs-site.xml. Since this is pseudo-distributed mode, set the replication factor to 1:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Run HDFS
Format HDFS
The first time you start HDFS, you need to format it:
$ bin/hdfs namenode -format
Start HDFS
$ sbin/start-dfs.sh
After HDFS starts, you can check its status in a browser at http://localhost:9870/.
Run a MapReduce job
First create the current user's home directory in HDFS:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
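The directory name has to match the Unix account submitting jobs, because relative HDFS paths such as the input directory used below resolve against /user/<username>. A small sketch of deriving the path with the standard whoami utility rather than typing the name by hand:

```shell
# Relative HDFS paths resolve against /user/<current user>,
# so derive the home directory from the current Unix account.
user=$(whoami)
home_dir="/user/${user}"
echo "$home_dir"
# You could then create it in one step with:
#   bin/hdfs dfs -mkdir -p "$home_dir"
```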
Prepare the input data, run the example job, and view the results:
$ bin/hdfs dfs -mkdir input
$ bin/hdfs dfs -put etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat output/*
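The example job extracts every string matching the regular expression dfs[a-z.]+ from the XML files copied into input. The pattern can be sanity-checked locally with plain grep before submitting anything to the cluster:

```shell
# Preview what the MapReduce grep example matches, using one config line;
# the job applies the same regex across every file in the HDFS input dir.
line='<name>dfs.replication</name>'
match=$(printf '%s\n' "$line" | grep -oE 'dfs[a-z.]+')
echo "$match"   # -> dfs.replication
```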
Delete the test results created above:
$ bin/hdfs dfs -rm output/*
$ bin/hdfs dfs -rmdir output
Stop HDFS
$ sbin/stop-dfs.sh
Run YARN
Modify the etc/hadoop/mapred-site.xml file:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/apps/hadoop-3.0.0</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/apps/hadoop-3.0.0</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/apps/hadoop-3.0.0</value>
</property>
</configuration>
Modify the etc/hadoop/yarn-site.xml file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start YARN
$ sbin/start-yarn.sh
After startup, you can view submitted jobs at http://192.168.0.192:8088/cluster.
Run the MapReduce job again:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs-cat output/*
Stop YARN
$ sbin/stop-yarn.sh
Problem
While testing YARN, jobs at first kept failing with errors like the following:
[2018-01-30 22:40:02.211] Container [pid=22658,containerID=container_1517369701504_0003_01_000028] is running beyond virtual memory limits. Current usage: 87.9 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
It turned out my machine does not have enough memory for the default YARN settings. I added the following two configuration items to etc/hadoop/yarn-site.xml and restarted YARN:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
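The numbers in the error line up with YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1: a container granted 1 GB of physical memory may use at most 1 × 2.1 = 2.1 GB of virtual memory, and the JVM's 2.6 GB exceeded that. A sketch of the arithmetic (plain shell, not a Hadoop command):

```shell
# virtual limit = physical allocation (GB) * vmem-pmem ratio
default_limit=$(awk 'BEGIN { printf "%.1f", 1 * 2.1 }')
raised_limit=$(awk 'BEGIN { printf "%.1f", 1 * 4 }')
echo "default ratio 2.1 -> limit ${default_limit} GB (job used 2.6 GB, killed)"
echo "raised  ratio 4   -> limit ${raised_limit} GB (2.6 GB now fits)"
```

Disabling the check entirely (vmem-check-enabled = false) also works, which is why both options are shown above; raising the ratio is the gentler fix.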
Reference: https://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits