The previous article covered installing Hadoop in pseudo-distributed mode on Ubuntu; this one builds on that setup to create a MapReduce development environment.
1. HDFS pseudo-distributed configuration
When developing with MapReduce, some configuration is needed before you can connect to HDFS and work with the files stored there.
First, change into the Hadoop installation directory:
cd /usr/local/hadoop/hadoop2
Then create a user directory in HDFS:
./bin/hdfs dfs -mkdir -p /user/hadoop
Next, create the input directory and copy the XML files from ./etc/hadoop into the distributed file system:
./bin/hdfs dfs -mkdir input
./bin/hdfs dfs -put ./etc/hadoop/*.xml input
Once the copy is complete, you can list the files with:
./bin/hdfs dfs -ls input
2. Development environment setup
1.1 Increase the virtual machine's memory to 2 GB or more.
1.2 Download the Linux version of Eclipse from: http://www.eclipse.org/downloads/packages/eclipse-ide-java-ee-developers/neon2
The file I downloaded is: eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz
1.3 Give the hadoop user ownership of, and permissions on, the /opt folder:
sudo chown hadoop /opt
sudo chmod 777 /opt
1.4 Copy the downloaded archive to the /opt folder.
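A copy command might look like the following (this assumes the archive was saved to ~/Downloads; adjust the source path to wherever your browser stored it):
cp ~/Downloads/eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz /opt/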
1.5 Extract the archive in /opt (the extracted folder is named eclipse):
tar -zxf eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz
1.6 Download the Hadoop plugin for Eclipse (hadoop-eclipse-plugin-2.6.0.jar).
1.7 Give the hadoop user ownership of, and permissions on, the eclipse folder:
sudo chown hadoop /opt/eclipse
sudo chmod 777 /opt/eclipse
Then copy hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder, as shown below.
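Assuming the plugin jar was also downloaded to ~/Downloads, the copy could be done with:
cp ~/Downloads/hadoop-eclipse-plugin-2.6.0.jar /opt/eclipse/plugins/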
1.8 Make Eclipse startable from the command line:
cd /usr/local/bin
sudo ln -s /opt/eclipse/eclipse
Once the link is in place, you can launch Eclipse at any time simply by typing:
eclipse
Note that the workspace you choose must be a directory the hadoop user has write permission to; mine is /home/hadoop/workspace.
1.9 After Eclipse starts, go to Window -> Show View -> Other; a MapReduce Tools entry will be listed there.
1.10 Open the Map/Reduce view and configure the connection to the file system; the settings must be consistent with core-site.xml under /usr/local/hadoop/hadoop2/etc/hadoop/ (see the sketch below).
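For reference, a minimal pseudo-distributed core-site.xml (as configured in the previous article; your values may differ) sets fs.defaultFS to hdfs://localhost:9000, and that host and port are what the connection dialog should point to:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>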
After the configuration is complete, you can expand the DFS tree in the panel on the left to browse the file system.
1.11 Make sure all the Hadoop daemons are running (see the check below).
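If you are unsure whether they are running, a quick way to start and verify them (assuming the standard scripts from the pseudo-distributed setup) is:
cd /usr/local/hadoop/hadoop2
./sbin/start-dfs.sh   # starts the HDFS daemons if they are not already running
jps                   # should list at least NameNode, DataNode and SecondaryNameNode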
1.12 Under Window -> Preferences -> Hadoop Map/Reduce, select the Hadoop installation directory (mine is /usr/local/hadoop/hadoop2). Once this is configured, new Hadoop projects will automatically import the required jar packages.
1.13 Via File -> New -> Project, select Map/Reduce Project to create a new project, then create a package under src and add a WordCount test class:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Very important: point the job at the local job tracker
        conf.set("mapred.job.tracker", "localhost:9001");
        // Hard-coded HDFS input and output paths
        args = new String[]{"hdfs://localhost:9000/user/hadoop/input/count_in",
                            "hdfs://localhost:9000/user/hadoop/output/count_out"};
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
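Note that the HDFS input and output paths are hard-coded in main(), so any command-line arguments are ignored. Also keep in mind that Hadoop refuses to run a job whose output directory already exists, so delete /user/hadoop/output/count_out (or change the path) before re-running the job.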
Then copy the log4j.properties file from /usr/local/hadoop/hadoop2/etc/hadoop into the src directory (otherwise no logs will be printed to the console).
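If you only need basic console logging, a minimal log4j.properties along these lines should also work (this is a hand-written sketch, not the stock Hadoop file):
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n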
1.14 Right-click the input folder in the DFS tree and create a folder named count_in. Then create two files on the desktop, word1.txt and word2.txt, and write a few strings into them, for example:
aaaa
bbbb
cccc
aaaa
Then right-click the count_in folder, choose Upload files to DFS, select word1.txt and word2.txt, and import them into the DFS file system.
1.15 Right-click in the code editor and choose Run As -> Run on Hadoop to run the program. When the run finishes, right-click the DFS folder and choose Refresh; an output folder with the result files will appear.
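Assuming word1.txt and word2.txt together contain only the four sample lines above, the result file should look roughly like this (it can also be inspected from the command line):
./bin/hdfs dfs -cat /user/hadoop/output/count_out/part-r-00000
aaaa	2
bbbb	1
cccc	1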
Once the output is correct, the MapReduce development environment is ready.