Hadoop (II): Building the MapReduce Development Environment


The previous article covered installing Hadoop in pseudo-distributed mode on Ubuntu; this article builds the MapReduce development environment on top of that setup.

1. HDFS Pseudo-Distributed Configuration

When developing with MapReduce, some configuration is required if the program needs to connect to HDFS and use files stored in HDFS.
First, enter the Hadoop installation directory:

cd /usr/local/hadoop/hadoop2

Create a user directory in HDFS:

./bin/hdfs dfs -mkdir -p /user/hadoop

Create the input directory and copy the XML files from ./etc/hadoop into the distributed file system:

./bin/hdfs dfs -mkdir input
./bin/hdfs dfs -put ./etc/hadoop/*.xml input

After the copy is complete, you can use the following command to view the file list

./bin/hdfs dfs -ls input
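
If you want to do the same check from Java rather than the shell (useful later for verifying the HDFS connection from the development environment), a minimal sketch using the Hadoop FileSystem API looks like this; the hdfs://localhost:9000 address is an assumption and must match fs.defaultFS in your core-site.xml:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; must match fs.defaultFS in core-site.xml
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        // List the files that were just uploaded to /user/hadoop/input
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop/input"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}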

2. Development Environment Setup

1.1 Increase the virtual machine memory to at least 2 GB

1.2 Download the Linux version of Eclipse from http://www.eclipse.org/downloads/packages/eclipse-ide-java-ee-developers/neon2
The file I downloaded is eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz

1.3 Give the hadoop user ownership of and permissions on the /opt folder

sudo chown hadoop /opt
sudo chmod 777 /opt

1.4 Copy the downloaded file to the /opt folder
1.5 Unzip it (the extracted folder is named eclipse)

tar -zxf eclipse-jee-neon-2-linux-gtk-x86_64.tar.gz

1.6 Download the Hadoop plugin for Eclipse (hadoop-eclipse-plugin-2.6.0.jar)
1.7 Give the hadoop user ownership of and permissions on the eclipse folder

sudo chown hadoop /opt/eclipse
sudo chmod 777 /opt/eclipse

Then copy hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder, as in the sketch below.
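
Assuming the jar was saved to the Downloads folder (a hypothetical location), something like:

cp ~/Downloads/hadoop-eclipse-plugin-2.6.0.jar /opt/eclipse/plugins/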
1.8 Start Eclipse from the command line

cd /usr/local/bin
sudo ln -s /opt/eclipse/eclipse

After this setup is complete, you can start Eclipse later simply by typing on the command line:

eclipse

Note that the workspace location must be a directory the hadoop user has permission to write to, for example /home/hadoop/workspace.

1.9 After Eclipse starts, Window -> Show View -> Other will list the Map/Reduce tools view

1.10 Open the Map/Reduce Locations view and configure the file system connection; it must be consistent with core-site.xml under /usr/local/hadoop/hadoop2/etc/hadoop/ (a typical example is shown below)
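
For reference, a pseudo-distributed core-site.xml usually looks roughly like the following; the address and temporary directory here are assumptions and should be taken from your own file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/hadoop2/tmp</value>
  </property>
</configuration>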

After the configuration is complete, the DFS Locations tree on the left should show the HDFS directories.

1.11 Ensure that all Hadoop daemons are running
1.12 In Window -> Preferences -> Hadoop Map/Reduce, select the Hadoop installation directory (here /usr/local/hadoop/hadoop2). Once this is configured, new Hadoop projects will automatically import the required jar packages.
1.13 File -> New -> Project, select Map/Reduce Project to create a new Map/Reduce project, then create a new package under src and add a WordCount test class:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Split each input line into tokens and emit (word, 1) for each token
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum the counts for each word
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Very important: point the job at the local job tracker
    conf.set("mapred.job.tracker", "localhost:9001");
    // Hard-code the HDFS input and output paths used in this tutorial
    args = new String[]{
        "hdfs://localhost:9000/user/hadoop/input/count_in",
        "hdfs://localhost:9000/user/hadoop/output/count_out"};
    Job job = Job.getInstance(conf, "Word Count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Then copy the log4j.properties file from the /usr/local/hadoop/hadoop2/etc/hadoop directory into the src directory (otherwise the logs will not be printed to the console).
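
If you prefer not to copy Hadoop's shipped file, a minimal log4j.properties along the following lines (a sketch, not the file Hadoop ships) is also enough to see the job's console output:

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n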

1.14 In the DFS view, right-click on the input folder and create a folder named count_in. On the desktop, create two files, word1.txt and word2.txt, and write some strings into them, such as:
Aaaa
bbbb
Cccc
Aaaa
Then right-click on the count_in folder, select Upload file to DFS, choose word1.txt and word2.txt, and import them into the DFS file system.
1.15 Right-click the code and choose Run As -> Run on Hadoop to run the program. When the run finishes, right-click the DFS folder and Refresh; the output files will appear and can be inspected as shown below.
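
To check the result from the shell, you can cat the reducer output file (part-r-00000 is Hadoop's default output file name; adjust the path if your configuration differs). Assuming both word1.txt and word2.txt contain the four lines shown above, the expected counts are Aaaa 4, Cccc 2 and bbbb 2:

./bin/hdfs dfs -cat /user/hadoop/output/count_out/part-r-00000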

Once the output is correct, the MapReduce development environment is complete.
