Spark: analyzing Apache access logs again

Source: Internet
Author: User
Tags: apache log

The log-parsing package

Compile it yourself:

packageApacheLogParser.jar

Simple analysis of access logs with grep is fine, but more complex queries call for Spark.

Code:
import com.alvinalexander.accesslogparser._

val p = new AccessLogParser
val log = sc.textFile("log.small")
// log.count

// Count how many 404 responses are in the Apache log
def getStatusCode(line: Option[AccessLogRecord]) = {
  line match {
    case Some(l) => l.httpStatusCode
    case None => "0"
  }
}

log.filter(line => getStatusCode(p.parseRecord(line)) == "404").count

/*
 * To find out which URLs are problematic (for example, a space in the URL
 * that results in a 404 error), the following steps are required:
 *   - filter out all 404 records
 *   - get the request field from each 404 record (the requested URL string
 *     may contain spaces, etc.)
 *   - do not return duplicate records
 */

// Get the `request` field from an access log record
def getRequest(rawAccessLogString: String): Option[String] = {
  val accessLogRecordOption = p.parseRecord(rawAccessLogString)
  accessLogRecordOption match {
    case Some(rec) => Some(rec.request)
    case None => None
  }
}

// Number of 404 records
log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_)).count

// Request field of every 404 record
val recs = log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_))

// The same requests with duplicates removed
val distinctRecs = log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_)).distinct
distinctRecs.foreach(println)
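The filter, map and distinct steps above can also be tried without a Spark cluster or the parser jar. The following is only a minimal sketch on a plain Scala collection: `LocalLogSketch`, `Record`, `LinePattern`, `distinct404Requests` and the sample lines are all made up for illustration, and the simplified regex is an assumption standing in for the real `AccessLogParser`, not its actual implementation. It parses each line once and uses `flatMap` so that the results are plain strings rather than `Option` values.

```scala
// Plain Scala sketch (no Spark, no parser jar) of the filter -> map -> distinct steps.
object LocalLogSketch {
  // Hypothetical, simplified stand-in for AccessLogRecord: only the two fields used here.
  final case class Record(request: String, httpStatusCode: String)

  // Assumed minimal pattern: captures the quoted request and the 3-digit status after it.
  private val LinePattern =
    """^\S+ \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) .*$""".r

  // Mirrors the Option-returning shape of p.parseRecord in the article.
  def parseRecord(line: String): Option[Record] = line match {
    case LinePattern(request, status) => Some(Record(request, status))
    case _                            => None
  }

  // Parse each line once; flatMap drops unparseable lines and unwraps the Option.
  def distinct404Requests(lines: Seq[String]): Seq[String] =
    lines
      .flatMap(parseRecord)
      .filter(_.httpStatusCode == "404")
      .map(_.request)
      .distinct

  // Made-up sample lines, including a URL with a space and one unparseable line.
  val sample: Seq[String] = Seq(
    """127.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326""",
    """127.0.0.1 - - [10/Oct/2015:13:55:40 +0000] "GET /foo bar.html HTTP/1.1" 404 512""",
    """127.0.0.1 - - [10/Oct/2015:13:55:41 +0000] "GET /foo bar.html HTTP/1.1" 404 512""",
    "not a log line"
  )

  def main(args: Array[String]): Unit =
    distinct404Requests(sample).foreach(println)
}
```

With the sample above, the two identical 404 lines collapse into a single request, and the unparseable line is silently skipped. The same `flatMap`, `filter`, `map`, `distinct` chain is available on a Spark RDD, so the shape should carry over to `log` in the Spark shell.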

That's it: a simple example, mainly to show how to use the log-parsing package. It is available at: https://github.com/jinhang/ScalaApacheAccessLogParser
Next time: how to analyze logs with a Lambda architecture, using Kafka and Spark Streaming for real-time analysis, Hadoop and Spark SQL for offline analysis, MySQL to persist the analysis results, and a Flask web UI for visualization. Time to sleep!
