The previous article covered installing Hadoop in pseudo-distributed mode on Ubuntu; this one builds on that setup to create a MapReduce development environment.
1. HDFS pseudo-distributed configuration
When developing with MapReduce, some configuration is needed before you can connect to HDFS and work with the files stored there.
First, change into the Hadoop installation directory:
cd /usr/local/hadoop/hadoop2
Then create a user directory in HDFS:
./bin/hdfs dfs -mkdir -p /user/hadoop
Next, create the input directory and copy the XML files from ./etc/hadoop into the distributed file system:
./bin/hdfs dfs -mkdir input
./bin/hdfs dfs -put ./etc/hadoop/*.xml input
Once the copy is complete, you can list the files with:
./bin/hdfs dfs -ls input
2. Development environment setup
1.1 Increase the virtual machine's memory to 2 GB or more.
1.2 Download the Linux version of Eclipse from: http://www.eclipse.org/downloads/packages/eclipse-ide-java-ee-developers/neon2
The file I downloaded is: eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz
1.3 Give the hadoop user ownership of, and permissions on, the /opt folder:
sudo chown hadoop /opt
sudo chmod 777 /opt
1.4 Copy the downloaded archive to the /opt folder.
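A copy command might look like the following (this assumes the archive was saved to ~/Downloads; adjust the source path to wherever your browser stored it):
cp ~/Downloads/eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz /opt/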
1.5 Extract the archive in /opt (the extracted folder is named eclipse):
tar -zxf eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz
1.6 Download the Hadoop plugin for Eclipse (hadoop-eclipse-plugin-2.6.0.jar).
1.7 Give the hadoop user ownership of, and permissions on, the eclipse folder:
sudo chown hadoop /opt/eclipse
sudo chmod 777 /opt/eclipse
Then copy hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder, as shown below.
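Assuming the plugin jar was also downloaded to ~/Downloads, the copy could be done with:
cp ~/Downloads/hadoop-eclipse-plugin-2.6.0.jar /opt/eclipse/plugins/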
1.8 Make Eclipse startable from the command line:
cd /usr/local/bin
sudo ln -s /opt/eclipse/eclipse
Once the link is in place, you can launch Eclipse at any time simply by typing:
eclipse
Note that the workspace you choose must be a directory the hadoop user has write permission to; mine is /home/hadoop/workspace.
1.9 After Eclipse starts, go to Window -> Show View -> Other; a MapReduce Tools entry will be listed there.
1.10 Open the Map/Reduce view and configure the connection to the file system; the settings must be consistent with core-site.xml under /usr/local/hadoop/hadoop2/etc/hadoop/ (see the sketch below).
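For reference, a minimal pseudo-distributed core-site.xml (as configured in the previous article; your values may differ) sets fs.defaultFS to hdfs://localhost:9000, and that host and port are what the connection dialog should point to:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>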
After the configuration is complete, you can expand the DFS tree in the panel on the left to browse the file system.
1.11 Make sure all the Hadoop daemons are running (see the check below).
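If you are unsure whether they are running, a quick way to start and verify them (assuming the standard scripts from the pseudo-distributed setup) is:
cd /usr/local/hadoop/hadoop2
./sbin/start-dfs.sh   # starts the HDFS daemons if they are not already running
jps                   # should list at least NameNode, DataNode and SecondaryNameNode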
1.12 Under Window -> Preferences -> Hadoop Map/Reduce, select the Hadoop installation directory (mine is /usr/local/hadoop/hadoop2). Once this is configured, new Hadoop projects will automatically import the required jar packages.
1.13 Via File -> New -> Project, select Map/Reduce Project to create a new project, then create a package under src and add a WordCount test class:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Very important: point the job at the local job tracker
        conf.set("mapred.job.tracker", "localhost:9001");
        // Hard-coded HDFS input and output paths
        args = new String[]{"hdfs://localhost:9000/user/hadoop/input/count_in",
                            "hdfs://localhost:9000/user/hadoop/output/count_out"};
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
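Note that the HDFS input and output paths are hard-coded in main(), so any command-line arguments are ignored. Also keep in mind that Hadoop refuses to run a job whose output directory already exists, so delete /user/hadoop/output/count_out (or change the path) before re-running the job.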
Then copy the log4j.properties file from /usr/local/hadoop/hadoop2/etc/hadoop into the src directory (otherwise no logs will be printed to the console).
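If you only need basic console logging, a minimal log4j.properties along these lines should also work (this is a hand-written sketch, not the stock Hadoop file):
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n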
1.14 Right-click the input folder in the DFS tree and create a folder named count_in. Then create two files on the desktop, word1.txt and word2.txt, and write a few strings into them, for example:
aaaa
bbbb
cccc
aaaa
Then right-click the count_in folder, choose Upload files to DFS, select word1.txt and word2.txt, and import them into the DFS file system.
1.15 Right-click in the code editor and choose Run As -> Run on Hadoop to run the program. When the run finishes, right-click the DFS folder and choose Refresh; an output folder with the result files will appear.
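Assuming word1.txt and word2.txt together contain only the four sample lines above, the result file should look roughly like this (it can also be inspected from the command line):
./bin/hdfs dfs -cat /user/hadoop/output/count_out/part-r-00000
aaaa	2
bbbb	1
cccc	1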
Once the output is correct, the MapReduce development environment is ready.