Hadoop Sample Program WordCount, Explained with Examples


1. Diagram of MapReduce (figure omitted from this copy)

2. Process walkthrough:

Input:

Hello World Bye World
Hello Hadoop Bye Hadoop
Bye Hadoop Hello Hadoop

Map:

<Hello,1> <World,1> <Bye,1> <World,1>
<Hello,1> <Hadoop,1> <Bye,1> <Hadoop,1>
<Bye,1> <Hadoop,1> <Hello,1> <Hadoop,1>

Sort:

<Bye,1> <Bye,1> <Bye,1>
<Hadoop,1> <Hadoop,1> <Hadoop,1> <Hadoop,1>
<Hello,1> <Hello,1> <Hello,1>
<World,1> <World,1>

Combine (values grouped per key):

<Bye, [1,1,1]>
<Hadoop, [1,1,1,1]>
<Hello, [1,1,1]>
<World, [1,1]>
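Note that in an actual Hadoop run the combiner (set to the Reduce class in the code below) typically runs inside each map task before the shuffle, pre-aggregating that task's local, sorted output. For example, the second input line's map output <Hello,1> <Hadoop,1> <Bye,1> <Hadoop,1> would leave its mapper as:

<Bye,1>
<Hadoop,2>
<Hello,1>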

Reduce:

<Bye,3>
<Hadoop,4>
<Hello,3>
<World,2>
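Before looking at the real Hadoop code in section 3, here is a minimal sketch that simulates the same map → sort/group → reduce flow locally in plain Java (no Hadoop needed; the class name LocalWordCount and the TreeMap-based grouping are illustrative choices, not part of the original example):

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount
{
    public static void main(String[] args)
    {
        String[] input = {
                "Hello World Bye World",
                "Hello Hadoop Bye Hadoop",
                "Bye Hadoop Hello Hadoop"
        };

        // Map phase: emit one <word, 1> pair per token, like the Mapper.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String line : input)
        {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens())
            {
                mapOutput.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
            }
        }

        // Sort + group phase: a TreeMap keeps keys in sorted order and
        // collects all values of one key into a list, like the shuffle.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput)
        {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }

        // Reduce phase: sum each key's value list.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet())
        {
            int sum = 0;
            for (int one : entry.getValue())
            {
                sum += one;
            }
            System.out.println("<" + entry.getKey() + "," + sum + ">");
        }
    }
}

Running it prints exactly the four pairs shown under Reduce above.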

3. Code example (Java, classic org.apache.hadoop.mapred API):

package com.felix;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

/**
 * Description: WordCount explained by Felix
 * @author Hadoop Dev Group
 */
public class WordCount
{
    /**
     * MapReduceBase: base class for Mapper and Reducer implementations;
     * its methods (configure/close) are empty implementations of the interface.
     * Mapper interface: maps input key/value pairs to intermediate pairs.
     * WritableComparable interface: classes implementing it can be compared
     * with each other; every class used as a key should implement it.
     * Reporter can be used to report the progress of the whole application;
     * it is not used in this example.
     */
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable>
    {
        /**
         * LongWritable, IntWritable and Text are Hadoop classes that wrap
         * Java data types and implement WritableComparable, so they can be
         * serialized for data exchange in a distributed environment. Think
         * of them as replacements for long, int and String.
         */
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        /**
         * The map method of the Mapper interface:
         * void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)
         * maps a single input k/v pair to intermediate k/v pairs.
         * The output pairs need not be of the same type as the input pair;
         * an input pair may map to zero or more output pairs.
         * OutputCollector interface: collects the <k,v> pairs emitted by
         * mappers and reducers; its collect(k, v) method adds one (k, v)
         * pair to the output.
         */
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens())
            {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            int sum = 0;
            while (values.hasNext())
            {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception
    {
        /**
         * JobConf: the map/reduce job configuration class; it describes
         * the work to be performed to the Hadoop framework.
         * Constructors: JobConf(), JobConf(Class exampleClass),
         * JobConf(Configuration conf), etc.
         */
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");                 // set a user-defined job name

        conf.setOutputKeyClass(Text.class);           // set the key class for the job's output data
        conf.setOutputValueClass(IntWritable.class);  // set the value class for the job's output data

        conf.setMapperClass(Map.class);               // set the Mapper class for the job
        conf.setCombinerClass(Reduce.class);          // set the Combiner class for the job
        conf.setReducerClass(Reduce.class);           // set the Reducer class for the job

        conf.setInputFormat(TextInputFormat.class);   // set the InputFormat implementation for the job
        conf.setOutputFormat(TextOutputFormat.class); // set the OutputFormat implementation for the job

        /**
         * InputFormat describes the input specification of a map-reduce job.
         * FileInputFormat.setInputPaths(): sets the array of paths that
         * makes up the input list of the job.
         * FileOutputFormat.setOutputPath(): sets the path of the job's
         * output directory.
         */
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);                       // run the job
    }
}
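To compile and run the job, commands along these lines are typical (a sketch, not exact: jar and path names depend on your Hadoop installation, "input" and "output" are hypothetical HDFS paths, and the output directory must not already exist):

mkdir classes
javac -classpath "$(hadoop classpath)" -d classes WordCount.java
jar cf wordcount.jar -C classes .
hadoop jar wordcount.jar com.felix.WordCount input output
hadoop fs -cat output/part-00000

The last command prints the reducer output, which for the sample input above is the four counts listed under Reduce, one word and its count per line (TextOutputFormat separates key and value with a tab).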
