1. Installation
Pre-Installation Preparation:
Prepare three Ubuntu 14.04 systems with the OpenSSH server installed (you can also prepare one machine and then clone the virtual machine, or export and import it). The three machines must be on the same network segment.
Start Installation
1) Start the three virtual machines and modify each one's hostname:
sudo vim /etc/hostname
Name them respectively:
Hadoopmaster
HadoopSlave1
HadoopSlave2
PS: The change takes effect after a reboot.
2) Install the JDK (on all three machines)
Here we use the Oracle JDK.
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java7-installer
After installation, configure the environment variables:
sudo vim ~/.bashrc
Add export JAVA_HOME=<your JDK installation path>. The JDK installed in the manner above is located at /usr/lib/jvm/java-7-oracle, so:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source ~/.bashrc (to make the configuration take effect)
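As a quick sanity check, the effect of the PATH line above can be sketched as follows. JAVA_HOME is hard-coded here to the path the installer uses, so the sketch runs on any machine; adjust it if your JDK lives elsewhere.

```shell
# Minimal sketch of the PATH update made in ~/.bashrc above.
# The JAVA_HOME value is an assumption taken from the installer's default.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$JAVA_HOME/bin:$PATH
# The JDK's bin directory should now be first on the PATH:
echo "$PATH" | tr ':' '\n' | head -n 1
```

After sourcing ~/.bashrc for real, `which java` should likewise report a path under $JAVA_HOME/bin.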
3) Modify the hosts file (the same change on all 3 machines)
sudo vim /etc/hosts
Append the following at the end:
10.13.7.10 hadoopmaster
10.13.7.11 HadoopSlave1
10.13.7.12 HadoopSlave2
Note: change the IP addresses to the ones that actually correspond to your own hosts.
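A small sketch of how you might verify the entries after editing. The hostnames and IPs mirror the example above; on a real machine you would point HOSTS_FILE at /etc/hosts and skip the printf.

```shell
# Write the example entries to a scratch file, then check that each
# cluster hostname appears. On the cluster, set HOSTS_FILE=/etc/hosts.
HOSTS_FILE=/tmp/hosts-demo
printf '10.13.7.10 hadoopmaster\n10.13.7.11 HadoopSlave1\n10.13.7.12 HadoopSlave2\n' > "$HOSTS_FILE"
for h in hadoopmaster HadoopSlave1 HadoopSlave2; do
    if grep -qw "$h" "$HOSTS_FILE"; then
        echo "$h: ok"
    else
        echo "$h: missing"
    fi
done
```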
4) Passwordless SSH login (the same operation on all three machines)
The following commands are entered on 10.13.7.10; on the other machines, change the addresses accordingly:
ssh-keygen (press Enter at every prompt to accept the defaults)
ssh-copy-id persistence@10.13.7.10
ssh-copy-id persistence@10.13.7.11
ssh-copy-id persistence@10.13.7.12 (persistence is the user name, followed by the other machine's IP)
All three machines must do the above, so that the three machines can SSH to each other without passwords.
5) Download hadoop 2.6.0 (on all three machines)
wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
6) Extract Hadoop and configure the related environment variables (on all three machines)
sudo tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local (extract to the /usr/local directory)
sudo mv /usr/local/hadoop-2.6.0 /usr/local/hadoop (rename the directory)
sudo chown -R persistence:persistence /usr/local/hadoop (change the files' owner and group; change persistence to your own user here and in what follows)
/usr/local/hadoop/bin/hadoop (check whether Hadoop is installed successfully)
Add the following in ~/.bashrc (on all three machines):
sudo vim ~/.bashrc
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
source ~/.bashrc
Validation: type hdfs; if you see the usage hint, the installation succeeded.
7) Create the directories Hadoop needs (on all three machines)
sudo mkdir /home/hadoop
sudo chown -R persistence:persistence /home/hadoop
mkdir /home/hadoop/hadoop-2.6.0
mkdir /home/hadoop/hadoop-2.6.0/tmp
mkdir /home/hadoop/hadoop-2.6.0/dfs
mkdir /home/hadoop/hadoop-2.6.0/dfs/name
mkdir /home/hadoop/hadoop-2.6.0/dfs/data
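The directory commands above can also be condensed, since mkdir -p creates parent directories as needed. A sketch, using a scratch BASE so it is safe to try anywhere; on the cluster BASE would be /home/hadoop, and the sudo/chown steps above still apply:

```shell
# Create the whole tree Hadoop needs in one command.
# BASE is a scratch location for this sketch; use /home/hadoop on the cluster.
BASE=/tmp/hadoop-dirs-demo
mkdir -p "$BASE/hadoop-2.6.0/tmp" "$BASE/hadoop-2.6.0/dfs/name" "$BASE/hadoop-2.6.0/dfs/data"
ls "$BASE/hadoop-2.6.0"
```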
8) Modify the configuration files (important, do not make mistakes) (on all three machines)
①
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add export JAVA_HOME=/usr/lib/jvm/java-7-oracle
②
vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following content inside <configuration></configuration>:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.6.0/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:9000</value>
</property>
③
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following content inside <configuration></configuration>:
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop-2.6.0/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop-2.6.0/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
④
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
Add the following content inside <configuration></configuration>:
<property>
<name>mapred.job.tracker</name>
<value>hadoopmaster:9001</value>
<description>Host or IP and port of the JobTracker.</description>
</property>
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
⑤
vim /usr/local/hadoop/etc/hadoop/slaves
Delete localhost and add the following content:
HadoopSlave1
HadoopSlave2
⑥
vim /usr/local/hadoop/etc/hadoop/masters
Add the following content:
Hadoopmaster
9) Format the HDFS filesystem NameNode (on all three machines)
cd /usr/local/hadoop && bin/hdfs namenode -format
10) Start the Hadoop cluster (note: this step is done only on Hadoopmaster)
/usr/local/hadoop/sbin/start-dfs.sh (this starts it)
/usr/local/hadoop/sbin/stop-dfs.sh (this stops it)
After startup completes, run jps to view the output.
If the master shows three processes and each slave shows two, startup was successful.
The above is the installation configuration Hadoop content.
You can view Hadoop information through Hadoopmaster's ip:8088
and Hadoopmaster's ip:50070.
Below are a few simple HDFS operations (all executed on Hadoopmaster):
hadoop fs -mkdir /input/ --> create a folder on Hadoop
hadoop fs -rmdir /input/ --> remove a folder on Hadoop
hadoop fs -ls / --> list the files in Hadoop's / directory
hadoop fs -rm /test.txt --> delete a file
hadoop fs -put test.txt / --> upload the file test.txt to Hadoop's / directory
hadoop fs -get /test.txt --> download a file from Hadoop to the current directory
2. Simple Application – Counting Words
1) Make sure the Hadoop cluster is started
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
2) Write the Java code
cd /home/hadoop && mkdir example
cd example && mkdir word_count_class jar
vim WordCount.java, with the following content:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class WordCountMap
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class WordCountReduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("WordCount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
3) Download the jar packages and place them in the /home/hadoop/example/jar directory
Download link: Common package
Download link: MapReduce
Download them locally, then upload them to the /home/hadoop/example/jar directory.
4) Compile and run
javac -classpath /home/hadoop/example/jar/hadoop-common-2.6.0.2.2.9.9-2.jar:/home/hadoop/example/jar/hadoop-mapreduce-client-core-2.6.0.2.2.9.9-2.jar -d word_count_class WordCount.java (compile)
cd word_count_class
jar -cvf wordcount.jar *.class (package)
cd /home/hadoop/example
Create two files of your own, named file1 and file2, and add some words to them.
hadoop fs -mkdir /input/
hadoop fs -put file* /input/
hadoop jar word_count_class/wordcount.jar WordCount /input/ /output
When execution finishes you can see the word statistics.
hadoop fs -ls /output (the output lands in this directory; the result we want is in part-r-00000)
hadoop fs -cat /output/part-r-00000
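To see what the job computes without a cluster, the same word count can be sketched with plain coreutils: split into words, group, then count, which mirrors the map, shuffle, and reduce stages of WordCount above. The two-line sample text is made up.

```shell
# Local word count: tr emits one word per line (map), sort groups
# identical words (shuffle), uniq -c counts each group (reduce).
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}'
```

The result has the same word<TAB>count shape as the lines part-r-00000 holds.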
Over, thanks.