1.1. Foreword
Here we use mrjob, a Python MapReduce (M/R) framework, to count page views (PV) per hour from a web server access log.
1.2. M/R Steps
Mapper: Parse each log line and emit it in the form key=hh (the hour), value=1.
Shuffle: The shuffle groups the values by key and sorts the keys, so each key reaches the reducer with an iterator over its values.
Result such as: 09 [1, 1, 1 ... 1, 1]
Reduce: Sum the values for each key. For key 09, this gives the number of requests made during hour 09.
Output such as: sum([1, 1, 1 ... 1, 1])
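The three steps above can be sketched in plain Python, without mrjob, to show what each stage produces. This is only an illustration: the function names (map_line, shuffle, reduce_group) and the sample timestamps are made up for this example, and a real framework runs the shuffle for you.

```python
from collections import defaultdict

def map_line(access_time):
    """Mapper: emit (hour, 1) for one record; access_time is 'YYYY-MM-DD HH:MM:SS'."""
    hour = access_time.split()[1].split(':')[0]
    yield hour, 1

def shuffle(pairs):
    """Shuffle: group values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_group(key, values):
    """Reducer: sum the 1s to get the number of requests for that hour."""
    return key, sum(values)

# Made-up sample timestamps, for illustration only.
sample = [
    '2016-09-24 09:00:01',
    '2016-09-24 09:15:30',
    '2016-09-24 10:02:11',
    '2016-09-24 09:59:59',
]

mapped = [pair for ts in sample for pair in map_line(ts)]
grouped = shuffle(mapped)
result = dict(reduce_group(k, v) for k, v in grouped)
print(grouped)  # [('09', [1, 1, 1]), ('10', [1])]
print(result)   # {'09': 3, '10': 1}
```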
1.3. Code
cat mr_pv_hour.py
# -*- coding: utf-8 -*-
from mrjob.job import MRJob
from ng_line_parser import NgLineParser

class MRPVHour(MRJob):
    # One parser instance, reused for every input line
    ng_line_parser = NgLineParser()

    def mapper(self, _, line):
        self.ng_line_parser.parse(line)
        # str(access_time) looks like "<date> <hh:mm:ss>"
        dy, tm = str(self.ng_line_parser.access_time).split()
        h, m, s = tm.split(':')
        yield h, 1  # per hour
        yield 'total', 1  # all requests

    def reducer(self, key, values):
        yield key, sum(values)

def main():
    MRPVHour.run()

if __name__ == '__main__':
    main()
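The ng_line_parser module is not shown in this section. As a rough idea of what the mapper relies on, here is a minimal stand-in that only fills in access_time. Everything in it is an assumption: the class name SimpleLineParser is hypothetical, the log is assumed to use the Nginx "combined" timestamp format, and the sample line is made up. The real parser may work quite differently.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp of an Nginx "combined" log line,
# e.g. [24/Sep/2016:09:05:42 +0800]; the time zone offset is matched but ignored.
_TIME_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]')

class SimpleLineParser(object):
    """Hypothetical stand-in for the parser class; it only sets access_time."""

    def __init__(self):
        self.access_time = None

    def parse(self, line):
        match = _TIME_RE.search(line)
        if match is None:
            raise ValueError('no timestamp found in line')
        # Naive datetime, so str() gives "YYYY-MM-DD HH:MM:SS"
        self.access_time = datetime.strptime(match.group(1), '%d/%b/%Y:%H:%M:%S')

# Made-up log line, for illustration only.
line = '127.0.0.1 - - [24/Sep/2016:09:05:42 +0800] "GET / HTTP/1.1" 200 612 "-" "-"'
parser = SimpleLineParser()
parser.parse(line)

# The same splitting the mapper performs
dy, tm = str(parser.access_time).split()
h, m, s = tm.split(':')
print(dy, h)
```

The time zone offset is dropped on purpose: a timezone-aware datetime prints as "09:05:42+08:00", which would split into four parts on ':' and break the three-way unpacking in the mapper.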
Run the job and view the output
python mr_pv_hour.py < www.ttmark.com.access.log
No configs found; falling back on auto-configuration
Creating temp directory /tmp/mr_pv_hour.root.20160924.130542.359063
Running step 1 of 1...
reading from STDIN
Streaming final output from /tmp/mr_pv_hour.root.20160924.130542.359063/output...
"00" 31539
"01" 34824
"02" 27895
"03" 29669
"04" 27742
"05" 26797
"06" 29384
"07" 31102
"08" 38257
"09" 43060
"10" 48064
"11" 57923
"12" 56413
"13" 57971
"14" 47260
"15" 46364
"16" 45721
"17" 48884
"18" 49318
"19" 49162
"20" 43641
"21" 42525
"22" 40371
"23" 34953
"total" 988839
Removing temp directory /tmp/mr_pv_hour.root.20160924.130542.359063...
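As a quick sanity check on the output above, the following sketch parses the result lines, finds the busiest hour, and confirms that the hourly counts add up to the total. The helper name parse_output is made up for this example, and the lines are assumed to be a JSON-quoted key followed by whitespace and a count, as displayed above.

```python
import json

# The hourly lines and the total line from the job output above.
output = '''"00" 31539
"01" 34824
"02" 27895
"03" 29669
"04" 27742
"05" 26797
"06" 29384
"07" 31102
"08" 38257
"09" 43060
"10" 48064
"11" 57923
"12" 56413
"13" 57971
"14" 47260
"15" 46364
"16" 45721
"17" 48884
"18" 49318
"19" 49162
"20" 43641
"21" 42525
"22" 40371
"23" 34953
"total" 988839'''

def parse_output(text):
    """Parse lines of the form '"key" count' into a dict of key -> int."""
    counts = {}
    for raw in text.splitlines():
        key, value = raw.split(None, 1)
        counts[json.loads(key)] = int(value)
    return counts

counts = parse_output(output)
# Hour keys are the two-digit ones; the remaining key is the overall total.
hourly = {k: v for k, v in counts.items() if k.isdigit()}
total = next(v for k, v in counts.items() if not k.isdigit())

peak_hour = max(hourly, key=hourly.get)
print(peak_hour, hourly[peak_hour])   # 13 57971
print(sum(hourly.values()) == total)  # True
```

The hourly counts sum to 988839, which matches the total line, so every request was counted once per hour and once overall.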