One: First, write the mapper script (mapper.py)
import sys

# Read lines from standard input and emit one "word<TAB>1" record per word
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))
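To see what the mapper emits, the same logic can be wrapped in a function and tried on a sample line. This is only a minimal sketch: the helper name map_line and the sample text are illustrative assumptions, not part of the original script.

```python
def map_line(line):
    """Return the "word<TAB>1" records that the mapper would print for one line."""
    # Hypothetical helper: same split-and-emit logic as the mapper loop,
    # but it returns the records instead of printing them.
    return ['%s\t%s' % (word, 1) for word in line.strip().split()]


if __name__ == '__main__':
    # Sample input chosen for illustration only.
    for record in map_line('hello hadoop hello'):
        print(record)
```

Each word produces its own record, so repeated words appear several times in the mapper output; adding them up is left to the reducer.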
Two: Write the reducer script (reducer.py)
import sys

current_word = None
current_count = 0
word = None

# Input arrives sorted by word, so equal words are adjacent
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip records whose count is not a number
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Emit the last word
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
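The reducer only compares each word with the previous one, which works because Hadoop Streaming sorts the mapper output by key before passing it to the reducer. The sketch below simulates the map, sort, and reduce steps locally. Note that it uses itertools.groupby in place of the manual current_word bookkeeping, and the function names (map_lines, reduce_pairs, word_count) are assumptions made for illustration.

```python
from itertools import groupby


def map_lines(lines):
    """Yield (word, 1) pairs, as the mapper does for each input line."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_pairs(sorted_pairs):
    """Sum the counts of each run of identical words (input must be sorted by word)."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    # sorted() stands in for the sort phase that runs between map and reduce.
    return list(reduce_pairs(sorted(map_lines(lines))))


if __name__ == '__main__':
    # Sample input chosen for illustration only.
    for word, total in word_count(['a b a', 'b a']):
        print('%s\t%s' % (word, total))
```

If the pairs were not sorted first, the same word would show up in several separate runs and be reported more than once, which is why the sort step cannot be skipped when testing the scripts by hand.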
Three: Use Hadoop Streaming to run the Python scripts
hadoop jar /home/hadoop/hadoop-2.6.0-cdh5.5.2/share/hadoop/tools/lib/hadoop-streaming-2.6.0-cdh5.5.2.jar \
    -input /user/hadoop/aa.txt \
    -output /user/hadoop/python_output \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -file mapper.py \
    -file reducer.py
Notes:
The input and output paths are resolved on HDFS by default, so they do not need to be explicitly marked as HDFS paths.
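If the part highlighted in yellow in the original post (the -mapper and -reducer commands, "python mapper.py" and "python reducer.py") is not enclosed in quotation marks, the following error is reported:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2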
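One way to see why the quotation marks matter is to look at how a shell splits the command line into arguments. The sketch below uses Python's shlex.split, which follows POSIX shell quoting rules. The jar name streaming.jar is a shortened placeholder, and the example only illustrates word splitting; it does not run Hadoop or reproduce the error itself.

```python
import shlex

# Shortened, hypothetical command lines; only the quoting differs.
quoted = 'hadoop jar streaming.jar -mapper "python mapper.py" -file mapper.py'
unquoted = 'hadoop jar streaming.jar -mapper python mapper.py -file mapper.py'

quoted_args = shlex.split(quoted)
unquoted_args = shlex.split(unquoted)

# With quotes, the whole command is the single value that follows -mapper.
mapper_with_quotes = quoted_args[quoted_args.index('-mapper') + 1]

# Without quotes, -mapper is followed only by "python";
# "mapper.py" becomes a separate, stray argument.
mapper_without_quotes = unquoted_args[unquoted_args.index('-mapper') + 1]

if __name__ == '__main__':
    print(repr(mapper_with_quotes))
    print(repr(mapper_without_quotes))
```

In the unquoted form, the value that follows -mapper no longer names the script at all, so the job cannot run the intended command.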
If the part highlighted in pink in the original post (most likely the -file mapper.py -file reducer.py options) is omitted, the following error is reported:
Error: java.lang.RuntimeException: Error in configuring object
This article is from the "Vernacular" blog; please keep this source attribution: http://feature09.blog.51cto.com/12614993/1970964
Running a Python version of WordCount with Hadoop Streaming