V. Analysis of the Nginx Access Log with Hadoop: User Agents and Spiders

Source: Internet
Author: User

UserAgent:

Code (spiders excluded):

# cat top_10_useragent.py
#!/usr/bin/env python
# coding=utf-8

from mrjob.job import MRJob
from mrjob.step import MRStep
from nginx_accesslog_parser import NginxLineParser  # the author's own parser module (not shown)
import heapq


class UserAgent(MRJob):

    nginx_line_parser = NginxLineParser()

    def mapper(self, _, line):
        # Parse one raw log line and emit (browser name, 1).
        self.nginx_line_parser.parse(line)
        field_item = self.nginx_line_parser.http_user_agent

        if field_item is not None:
            yield field_item, 1

    def reducer_sum(self, key, values):
        # Total per browser; the None key sends every pair to a single reducer.
        yield None, (sum(values), key)

    def reducer_top100(self, _, values):
        # Despite the name, this keeps the 10 largest (count, browser) pairs.
        for count, path in heapq.nlargest(10, values):
            yield count, path
        # for count, path in sorted(values, reverse=True)[:]:
        #     yield count, path

    def steps(self):
        return (
            MRStep(mapper=self.mapper, reducer=self.reducer_sum),
            MRStep(reducer=self.reducer_top100),
        )


def main():
    UserAgent.run()


if __name__ == '__main__':
    main()
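The next sketch shows what the two MRStep stages compute without needing a Hadoop cluster. It replays the same logic in plain Python. The mapper emits (user_agent, 1). The first reducer sums the counts per key and re-keys every pair under None. The second reducer picks the ten largest (count, key) tuples with heapq.nlargest. The sketch assumes the user-agent names have already been extracted from the log. The function name top_n_user_agents and the sample list are illustrative only and are not part of the original script.

```python
import heapq
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def top_n_user_agents(
    user_agents: Iterable[Optional[str]], n: int = 10
) -> List[Tuple[int, str]]:
    """Hypothetical helper: replay mapper -> reducer_sum -> reducer_top100 locally."""
    # Step 1, mapper: emit (field_item, 1), skipping lines the parser could not handle.
    mapped = ((ua, 1) for ua in user_agents if ua is not None)

    # Shuffle + reducer_sum: group by key and add up the 1s.
    grouped = defaultdict(list)
    for key, one in mapped:
        grouped[key].append(one)
    # reducer_sum yields None as the key so that every pair reaches one reducer.
    step1_output = [(None, (sum(values), key)) for key, values in grouped.items()]

    # Step 2, reducer_top100: all values share the None key; keep the n largest tuples.
    values = [value for _, value in step1_output]
    return heapq.nlargest(n, values)


sample = ["Chrome", "IE", "Chrome", None, "Firefox", "Chrome", "IE"]
print(top_n_user_agents(sample, n=2))  # [(3, 'Chrome'), (2, 'IE')]
```

Tuples compare by count first and by name only on ties. That is why the reducer emits (count, key) rather than (key, count). The single None key is what makes the top 10 global, but it also funnels every distinct user agent through one reducer.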

Results:

# python3 top_10_useragent.py access_all.log-20161227
No configs found; falling back on auto-configuration
Creating temp directory /tmp/top_10_useragent.root.20161228.090725.308144
Running step 1 of 2...
Running step 2 of 2...
Streaming final output from /tmp/top_10_useragent.root.20161228.090725.308144/output...
85262	"IE"
79611	"Chrome"
48560	"Other"
10662	"Firefox"
7927	"Mobile Safari UI/WKWebView"
7182	"Sogou Explorer"
6681	"QQ Browser"
1988	"Mobile Safari"
1781	"Maxthon"
1404	"Edge"
Removing temp directory /tmp/top_10_useragent.root.20161228.090725.308144...
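mrjob writes its final output with the default JSONProtocol. Each line is therefore a JSON-encoded key and a JSON-encoded value separated by a tab. Here the key is the count and the value is the quoted browser name.

The sketch below loads such lines back into Python, for example to compute each entry's share of the rows loaded. That share is relative to the loaded rows only, not to all requests in the log. parse_mrjob_output is a hypothetical helper name. The sample lines are the first three rows of the result above.

```python
import json
from typing import Iterable, List, Tuple


def parse_mrjob_output(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Hypothetical helper: decode 'count<TAB>"name"' lines written by mrjob's JSONProtocol."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        raw_key, raw_value = line.split("\t", 1)
        rows.append((json.loads(raw_key), json.loads(raw_value)))
    return rows


output = ['85262\t"IE"', '79611\t"Chrome"', '48560\t"Other"']
rows = parse_mrjob_output(output)
total = sum(count for count, _ in rows)
for count, name in rows:
    # Share of the rows loaded here, not of every request in the log.
    print(f"{name}: {count} ({count / total:.1%} of these rows)")
```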
Spider:
#!/usr/bin/env python
# coding=utf-8

from mrjob.job import MRJob
from mrjob.step import MRStep
from nginx_accesslog_parser import NginxLineParser  # the author's own parser module (not shown)
import heapq


class Spider(MRJob):

    nginx_line_parser = NginxLineParser()

    def mapper(self, _, line):
        # Parse one raw log line and emit (crawler type, 1).
        self.nginx_line_parser.parse(line)
        field_item = self.nginx_line_parser.user_agent_type

        if field_item is not None:
            yield field_item, 1

    def reducer_sum(self, key, values):
        # Total per crawler type; the None key sends every pair to a single reducer.
        yield None, (sum(values), key)

    def reducer_top100(self, _, values):
        # Despite the name, this keeps the 10 largest (count, crawler) pairs.
        for count, path in heapq.nlargest(10, values):
            yield count, path
        # for count, path in sorted(values, reverse=True)[:]:
        #     yield count, path

    def steps(self):
        return (
            MRStep(mapper=self.mapper, reducer=self.reducer_sum),
            MRStep(reducer=self.reducer_top100),
        )


def main():
    Spider.run()


if __name__ == '__main__':
    main()
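The nginx_accesslog_parser module behind NginxLineParser is not shown, so it is unknown how user_agent_type is derived. The following sketch is an illustration only. It pulls the User-Agent field out of a line in Nginx's default "combined" log format with a regular expression. It then labels the line with the first matching crawler token.

Several parts of this sketch are assumptions rather than the author's rules:
- The token list is taken from the spider names reported in this section's results.
- The names COMBINED_RE, SPIDER_TOKENS, user_agent_of and spider_name are hypothetical.
- The sample log line is invented.

A production job would more likely rely on a maintained user-agent parsing library.

```python
import re
from typing import Optional

# Nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'^(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\S+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# Hypothetical (lower-cased token, label) pairs; more specific tokens come first.
SPIDER_TOKENS = [
    ("baiduspider-image", "Baiduspider-image"),
    ("baiduspider", "Baiduspider"),
    ("sogou web spider", "Sogou web Spider"),
    ("bingbot", "Bingbot"),
    ("googlebot", "Googlebot"),
    ("yisouspider", "Yisouspider"),
    ("magpie-crawler", "Magpie-crawler"),
]


def user_agent_of(line: str) -> Optional[str]:
    """Return the $http_user_agent field, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    return match.group("http_user_agent") if match else None


def spider_name(user_agent: str) -> Optional[str]:
    """Return the first matching crawler label, or None for ordinary browsers."""
    ua = user_agent.lower()
    for token, label in SPIDER_TOKENS:
        if token in ua:
            return label
    return None


line = ('203.0.113.7 - - [27/Dec/2016:10:00:00 +0800] "GET / HTTP/1.1" 200 612 "-" '
        '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"')
print(spider_name(user_agent_of(line)))  # prints Baiduspider
```

Ordering matters with plain substring matching. "baiduspider-image" has to be checked before "baiduspider", or image-crawler hits would be counted under the generic Baidu label.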

Results:

# python3 top_10_spider.py access_all.log-20161227
No configs found; falling back on auto-configuration
Creating temp directory /tmp/top_10_spider.root.20161228.091326.295972
Running step 1 of 2...
Running step 2 of 2...
Streaming final output from /tmp/top_10_spider.root.20161228.091326.295972/output...
33542	"Magpie-crawler"
25880	"Other"
16578	"Sogou web Spider"
6383	"Bingbot"
3688	"Baiduspider"
1487	"Yahoo! slurp"
1096	"Jikespider"
731	"Yisouspider"
648	"Baiduspider-image"
470	"Googlebot"
Removing temp directory /tmp/top_10_spider.root.20161228.091326.295972...
