Spark Learning Four: Website Log Analysis Case
tags (space delimited): Spark
- Spark Learning Four: Website Log Analysis Case
- One, Create a Maven Project
- Two, Create a Template
- Three, Log Analysis Case
- Four, Spark Launch on YARN
One, Create a Maven Project
1. Execute the Maven command to create the project
mvn archetype:generate -DarchetypeGroupId=org.scala-tools.archetypes -DarchetypeArtifactId=scala-archetype-simple -DremoteRepositories=http://scala-tools.org/repo-releases -DgroupId=com.ibeifeng.bigdata.spark.app -DartifactId=analyzer-logs -Dversion=1.0
2. Import the Maven project into IDEA
Two, Create a Template
```scala
#if ((${PACKAGE_NAME} && ${PACKAGE_NAME} != ""))package ${PACKAGE_NAME} #end

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

#parse("File Header.java")
object ${NAME} {
  def main(args: Array[String]) {
    // create SparkConf
    val sparkConf = new SparkConf()
      .setAppName("Test")
      .setMaster("local[2]")

    // create SparkContext
    val sc = new SparkContext(sparkConf)

    sc.stop()
  }
}
```
Three, Log Analysis Case
1. Prepare the data
2. Complete the code
ApacheAccessLog.scala
```scala
package com.ibeifeng.bigdata.spark.app

/**
 * Created by hadoop001 on 4/27/16.
 */
case class ApacheAccessLog(
    ipAddress: String,
    clientIdentd: String,
    userId: String,
    dateTime: String,
    method: String,
    endpoint: String,
    protocol: String,
    responseCode: Int,
    contentSize: Long)

object ApacheAccessLog {
  // Regex for lines such as:
  // 64.242.88.10 - - [07/Mar/2004:16:05:49 -0800]
  //   "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1"
  //   401 12846
  val PATTERN =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

  def isValidateLogLine(log: String): Boolean = {
    val res = PATTERN.findFirstMatchIn(log)
    if (res.isEmpty) false else true
  }

  def parseLogLine(log: String): ApacheAccessLog = {
    val res = PATTERN.findFirstMatchIn(log)
    if (res.isEmpty) {
      throw new RuntimeException("Cannot parse log line: " + log)
    }
    val m = res.get
    ApacheAccessLog(m.group(1), m.group(2), m.group(3), m.group(4),
      m.group(5), m.group(6), m.group(7), m.group(8).toInt, m.group(9).toLong)
  }
}
```
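To sanity-check the regex, it can be run against the sample line quoted in the comments above. This is a minimal standalone sketch: the pattern is inlined so the snippet runs on its own, whereas in the project the same work goes through `ApacheAccessLog.parseLogLine`.

```scala
// Standalone check of the access-log regex, using the sample line from
// the comments above. The pattern is inlined here so the snippet runs
// on its own; in the project you would call ApacheAccessLog.parseLogLine.
object RegexCheck {
  val PATTERN =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

  def main(args: Array[String]): Unit = {
    val line = "64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] " +
      "\"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1\" " +
      "401 12846"
    val m = PATTERN.findFirstMatchIn(line).get
    println(m.group(1))       // IP address: 64.242.88.10
    println(m.group(5))       // HTTP method: GET
    println(m.group(8).toInt) // response code: 401
  }
}
```

Each capture group maps to one field of the `ApacheAccessLog` case class, in order.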
OrderingUtils.scala
```scala
package com.ibeifeng.bigdata.spark.app

/**
 * Created by hadoop001 on 4/28/16.
 */
object OrderingUtils {

  object SecondValueOrdering extends Ordering[(String, Int)] {
    /**
     * Returns an integer whose sign communicates how x compares to y:
     *   - negative if x < y
     *   - positive if x > y
     *   - zero otherwise (if x == y)
     */
    def compare(x: (String, Int), y: (String, Int)): Int = {
      x._2.compare(y._2)
    }
  }
}
```
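As a usage sketch (the counts below are made up for illustration), an `Ordering` like `SecondValueOrdering` lets comparisons on `(key, count)` pairs use the count. In the Spark job it would typically be passed to `RDD.top`, e.g. `rdd.top(10)(OrderingUtils.SecondValueOrdering)`; the same ordering works on plain collections, which is easy to try locally. The ordering is redefined here so the snippet is self-contained.

```scala
// Usage sketch with made-up data: compare (key, count) pairs by count.
// In the Spark job this ordering would be passed to RDD.top, e.g.
// rdd.top(10)(OrderingUtils.SecondValueOrdering).
object OrderingDemo {
  object SecondValueOrdering extends Ordering[(String, Int)] {
    def compare(x: (String, Int), y: (String, Int)): Int = x._2.compare(y._2)
  }

  def main(args: Array[String]): Unit = {
    val counts = Seq(("GET", 120), ("POST", 45), ("HEAD", 7))
    // max picks the pair with the largest second element
    println(counts.max(SecondValueOrdering))    // (GET,120)
    // sorted orders ascending by count
    println(counts.sorted(SecondValueOrdering)) // List((HEAD,7), (POST,45), (GET,120))
  }
}
```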
3. Package with Maven
```
mvn package
```
4. Submit the application
```
bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer analyzer-logs-1.0.jar

bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer --master spark://spark.com.cn:7077 --deploy-mode cluster analyzer-logs-1.0.jar
```
Four, Spark Launch on YARN
1) When compiling Spark, specify the `-Pyarn` profile
2) When submitting the application, set `HADOOP_CONF_DIR` so Spark can read the Hadoop configuration, and specify:
```
--master yarn
```
First run mode: yarn-client
```
/opt/modules/spark-1.3.0-bin-2.5.0/bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer --master yarn-client analyzer-logs-1.0.jar
```
Second run mode: yarn-cluster
```
/opt/modules/spark-1.3.0-bin-2.5.0/bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer --master yarn-cluster analyzer-logs-1.0.jar
```