Writing a WordCount program in Python
Program | WordCount
Input | A text file that contains a large number of words
Output | Each word in the file and its number of occurrences (frequency), sorted alphabetically by word, one word per line, with the word and its frequency separated by whitespace
- Write the map function and the reduce function
- Make the scripts executable by changing their permissions
- Test the code locally
- Put it on HDFS to run
- Upload files to and download files from HDFS
- Submit the job with the Hadoop Streaming command
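#!/usr/bin/env python2
# mapper.py: read text from stdin and emit "word<TAB>1" for every word
import sys

for line in sys.stdin:
    # Remove surrounding whitespace and split the line into words
    line = line.strip()
    words = line.split()
    for word in words:
        # The tab separates the key (word) from the value (count)
        print('%s\t%s' % (word, 1))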
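#!/usr/bin/env python2
# reducer.py: read "word<TAB>count" lines sorted by word and sum the counts
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    # Split on the first tab only, matching the mapper's output format
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not a number
        continue
    if current_word == word:
        current_count += count
    else:
        # A new word begins, so output the total for the previous one
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Output the total for the last word, if any input was read
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))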
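Before involving Hadoop, the map, sort, and reduce steps can be checked in a single Python process. The following is a minimal sketch, not part of the original scripts: the function names map_words, reduce_counts, and word_count and the sample sentences are made up for illustration, and the call to sorted() stands in for the sorting that Hadoop performs between the mapper and the reducer.

```python
import itertools


def map_words(lines):
    """Mapper step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer step: sum the counts of adjacent pairs that share a word.

    Like the streaming reducer, this relies on the pairs arriving sorted by word.
    """
    for word, group in itertools.groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort (standing in for Hadoop's shuffle), and reduce in one process."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    # Hypothetical sample input; any iterable of text lines works.
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    for word, total in word_count(sample):
        print("%s\t%s" % (word, total))
```

The same check is commonly done with the real scripts by piping a small text file through the mapper, the sort command, and the reducer on the local machine.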
Set permissions
chmod a+x mapper.py
chmod a+x reducer.py
Writing scripts
Uploading to HDFS
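The upload and job submission steps could look roughly like the sketch below, which only builds the command lines and prints them. Everything specific in it is an assumption: the helper names hdfs_put_command and streaming_command are invented here, and the input file name, the HDFS directories, and the location of the Hadoop Streaming jar are placeholders that depend on the installation.

```python
def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list that copies a local file into an HDFS directory."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def streaming_command(streaming_jar, input_path, output_path,
                      mapper="mapper.py", reducer="reducer.py"):
    """Build the argument list for a Hadoop Streaming job.

    -files ships the two scripts to the cluster nodes; it is a generic option
    and therefore has to come before the streaming-specific options.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-files", "%s,%s" % (mapper, reducer),
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]


if __name__ == "__main__":
    # All paths below are placeholders, not values from this article.
    put = hdfs_put_command("input.txt", "/user/hadoop/wordcount/input")
    job = streaming_command(
        "/path/to/hadoop-streaming.jar",
        "/user/hadoop/wordcount/input",
        "/user/hadoop/wordcount/output",
    )
    print(" ".join(put))
    print(" ".join(job))
    # On a machine with Hadoop installed, the lists could be passed to
    # subprocess.check_call(put) and subprocess.check_call(job) to run them.
```

Building the commands as lists keeps each argument separate, so paths containing spaces do not need shell quoting if the lists are later handed to subprocess.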