The log is parsed with a third-party package, ApacheLogParser.jar, which you compile yourself.
Simple access-log analysis is fine with grep, but more complex queries call for Spark.
Code:
import com.alvinalexander.accesslogparser._

val p = new AccessLogParser
val log = sc.textFile("log.small")
// log.count

// how many 404s are in the Apache log?
def getStatusCode(line: Option[AccessLogRecord]) = {
  line match {
    case Some(l) => l.httpStatusCode
    case None    => "0"
  }
}

log.filter(line => getStatusCode(p.parseRecord(line)) == "404").count

/*
 * To find out which URLs are problematic (for example, a space in the URL
 * producing a 404 error), the steps are:
 *   - filter out all 404 records
 *   - get the request field from each 404 record (the URL string the client requested)
 *   - do not return duplicate records
 */

// get the `request` field from an access log record
def getRequest(rawAccessLogString: String): Option[String] = {
  val accessLogRecordOption = p.parseRecord(rawAccessLogString)
  accessLogRecordOption match {
    case Some(rec) => Some(rec.request)
    case None      => None
  }
}

log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_)).count
val recs = log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_))
val distinctRecs = log.filter(line => getStatusCode(p.parseRecord(line)) == "404").map(getRequest(_)).distinct
distinctRecs.foreach(println)
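The parser jar does the heavy lifting above. To see roughly what it extracts, here is a minimal, self-contained sketch that needs neither Spark nor the external jar: it pulls the request and status-code fields out of a Common Log Format line with a regex, then applies the same filter-then-map idea on a local Seq. The `ClfSketch` object, `clf` pattern, and `parse` helper are my own illustration, not part of the package, and they assume your logs are in plain Common Log Format.

```scala
// Minimal sketch (assumption: logs are in Common Log Format).
// ClfSketch, clf, and parse are hypothetical helpers, not the package's API.
object ClfSketch {
  // host ident user [timestamp] "request" status bytes
  private val clf = """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+).*""".r

  // returns (request, statusCode) when the line parses, None otherwise
  def parse(line: String): Option[(String, String)] = line match {
    case clf(_, _, _, _, request, status, _) => Some((request, status))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      // note the space in the URL: exactly the kind of request that 404s
      """66.249.70.10 - - [10/Oct/2014:13:55:36 -0700] "GET /blog/post 1 HTTP/1.1" 404 510""",
      """66.249.70.10 - - [10/Oct/2014:13:55:40 -0700] "GET /index.html HTTP/1.1" 200 2326"""
    )
    // same filter/map chain as the RDD version, run on a local Seq
    val bad404Requests = lines.flatMap(parse).collect { case (req, "404") => req }
    bad404Requests.foreach(println)
  }
}
```

On a real cluster you would keep the RDD version, since an RDD gives you the same `filter`/`map`/`distinct` operators but distributed over the whole log.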
That's it: a simple example, mainly showing how to use the log-parsing package. The source is at: https://github.com/jinhang/ScalaApacheAccessLogParser
Next time: analyzing logs with the Lambda architecture, using Kafka and Spark Streaming for real-time analysis, Hadoop and Spark SQL for offline analysis, MySQL to persist the analysis results, and a Flask web UI for visualization. Time for bed!
Spark: analyzing Apache access logs, again