Spark Learning 4: Website Log Analysis Case

Source: Internet
Author: User

tags (space delimited): Spark

    • Spark Learning 4: website log analysis case
      • One, create a Maven project
      • Two, create a template
      • Three, log analysis case
      • Four, running Spark on YARN

One, create a Maven project

1. Run the Maven command to create the project

mvn archetype:generate -DarchetypeGroupId=org.scala-tools.archetypes -DarchetypeArtifactId=scala-archetype-simple -DremoteRepositories=http://scala-tools.org/repo-releases -DgroupId=com.ibeifeng.bigdata.spark.app -DartifactId=analyzer-logs -Dversion=1.0

2. Import the Maven project into IntelliJ IDEA

Two, create a template

    #if ((${PACKAGE_NAME} && ${PACKAGE_NAME} != "")) package ${PACKAGE_NAME} #end

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkConf
    #parse("File Header.java")
    object ${NAME} {
      def main(args: Array[String]): Unit = {
        // create the SparkConf; local[2] runs Spark locally with two threads
        val sparkConf = new SparkConf().setAppName("Test").setMaster("local[2]")
        // create the SparkContext
        val sc = new SparkContext(sparkConf)

        sc.stop()
      }
    }

Three, log analysis case

1. Prepare the data
2. Write the code

ApacheAccessLog.scala

    package com.ibeifeng.bigdata.spark.app

    /**
     * Created by hadoop001 on 4/27/16.
     */
    case class ApacheAccessLog(
        ipAddress: String,
        clientIdentd: String,
        userId: String,
        dataTime: String,
        method: String,
        endpoint: String,
        protocol: String,
        responseCode: Int,
        contentSize: Long)

    object ApacheAccessLog {
      // Regex for one access-log line, for example:
      // 64.242.88.10 - - [07/Mar/2004:16:05:49 -0800]
      // "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1"
      // 401 12846
      val PARTTERN =
        """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+|-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

      // Returns true when the line matches the access-log pattern.
      def isValidateLogLine(log: String): Boolean = {
        val res = PARTTERN.findFirstMatchIn(log)
        if (res.isEmpty) false else true
      }

      // Builds an ApacheAccessLog from the nine captured groups;
      // throws if the line does not match.
      def parseLogLine(log: String): ApacheAccessLog = {
        val res = PARTTERN.findFirstMatchIn(log)
        if (res.isEmpty) {
          throw new RuntimeException("Cannot parse log line: " + log)
        }
        val m = res.get
        ApacheAccessLog(
          m.group(1), m.group(2), m.group(3), m.group(4), m.group(5),
          m.group(6), m.group(7), m.group(8).toInt, m.group(9).toLong)
      }
    }
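To see what the regular expression captures, the following is a minimal standalone sketch that applies the same pattern to the sample line from the comment above. The names AccessLogParseSketch, Hit, LogPattern and Sample are hypothetical and are not part of the original project; the sketch also returns an Option rather than throwing, which is a design choice of this example and not what parseLogLine does.

```scala
import scala.util.matching.Regex

object AccessLogParseSketch {
  // Hypothetical record type that keeps only three of the nine captured groups.
  final case class Hit(ip: String, endpoint: String, responseCode: Int)

  // Same shape as the pattern in ApacheAccessLog.scala.
  val LogPattern: Regex =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+|-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

  // The sample line quoted in the comment of ApacheAccessLog.scala.
  val Sample: String =
    "64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] " +
      "\"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1\" " +
      "401 12846"

  // Returns None instead of throwing when the line does not match.
  def parse(line: String): Option[Hit] =
    LogPattern.findFirstMatchIn(line).map { m =>
      // group 1 = client IP, group 6 = requested endpoint, group 8 = status code
      Hit(m.group(1), m.group(6), m.group(8).toInt)
    }

  def main(args: Array[String]): Unit = {
    println(parse(Sample))           // a matching line yields Some(Hit(...))
    println(parse("not a log line")) // a non-matching line yields None
  }
}
```

Returning an Option lets a caller drop malformed lines with flatMap instead of validating each line first and parsing it a second time.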

OrderingUtils.scala

    package com.ibeifeng.bigdata.spark.app

    /**
     * Created by hadoop001 on 4/28/16.
     */
    object OrderingUtils {

      // Orders (String, Int) pairs by their second element only.
      object SecondValueOrdering extends Ordering[(String, Int)] {
        /**
         * Returns an integer whose sign communicates how x compares to y.
         *
         * The result sign has the following meaning:
         *  - negative if x < y
         *  - positive if x > y
         *  - zero otherwise (if x == y)
         */
        def compare(x: (String, Int), y: (String, Int)): Int = {
          x._2.compare(y._2)
        }
      }
    }
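The article does not show where SecondValueOrdering is used. An Ordering like this is typically passed explicitly to methods such as max, min or sorted (Spark's RDD max, min and top also accept an Ordering). The sketch below illustrates the idea on a plain Scala collection; the object name, the BySecond copy of the ordering, and the (endpoint, hit count) pairs are all made up for illustration.

```scala
object SecondValueOrderingSketch {
  // Same comparison rule as OrderingUtils.SecondValueOrdering, redefined here
  // so that the snippet compiles on its own.
  object BySecond extends Ordering[(String, Int)] {
    def compare(x: (String, Int), y: (String, Int)): Int = x._2.compare(y._2)
  }

  // Made-up (endpoint, hit count) pairs, for illustration only.
  val hits: Seq[(String, Int)] =
    Seq(("/index.html", 12), ("/about.html", 3), ("/login", 7))

  def main(args: Array[String]): Unit = {
    // Pair with the largest count.
    println(hits.max(BySecond))
    // Pair with the smallest count.
    println(hits.min(BySecond))
    // Two most-requested endpoints, highest count first.
    println(hits.sorted(BySecond.reverse).take(2))
  }
}
```

Without an explicit ordering, tuples are compared by their first element first, so the endpoint string rather than the count would decide the result.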

3. Package with Maven

package

4. Submit the application

com.ibeifeng.bigdata.spark.app.LogAnalyzer analyzer-logs-1.0.jar

com.ibeifeng.bigdata.spark.app.LogAnalyzer --deploy-mode cluster analyzer-logs-1.0.jar spark://spark.com.cn:7077

Four, running Spark on YARN

1) When compiling Spark, enable the YARN profile with -Pyarn.
2) When submitting the application, set HADOOP_CONF_DIR so that Spark can read the Hadoop configuration, and pass --master yarn.

First mode of operation: yarn-client

/opt/modules/spark-1.3.0-bin-2.5.0 com.ibeifeng.bigdata.spark.app.LogAnalyzer analyzer-logs-1.0.jar yarn-client

Second mode of operation: yarn-cluster

/opt/modules/spark-1.3.0-bin-2.5.0 com.ibeifeng.bigdata.spark.app.LogAnalyzer analyzer-logs-1.0.jar
