Spark Learning Four: Website Log Analysis Case
tags (space delimited): Spark
- Spark Learning Four: Website Log Analysis Case
- One, Create a Maven Project
- Two, Create a Template
- Three, Log Analysis Case
- Four, Spark Launch on YARN
One, Create a Maven Project
1. Execute the Maven command to create the project
mvn archetype:generate -DarchetypeGroupId=org.scala-tools.archetypes -DarchetypeArtifactId=scala-archetype-simple -DremoteRepositories=http://scala-tools.org/repo-releases -DgroupId=com.ibeifeng.bigdata.spark.app -DartifactId=analyzer-logs -Dversion=1.0
2. Import the Maven project into IDEA
Two, Create a Template
```scala
#if ((${PACKAGE_NAME} && ${PACKAGE_NAME} != ""))package ${PACKAGE_NAME} #end

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

#parse("File Header.java")
object ${NAME} {
  def main(args: Array[String]) {
    // create SparkConf
    val sparkConf = new SparkConf()
      .setAppName("Test")
      .setMaster("local[2]")

    // create SparkContext
    val sc = new SparkContext(sparkConf)

    sc.stop()
  }
}
```
Three, Log Analysis Case
1. Prepare the data
2. Complete the code
ApacheAccessLog.scala
```scala
package com.ibeifeng.bigdata.spark.app

/**
 * Created by hadoop001 on 4/27/16.
 */
case class ApacheAccessLog(
    ipAddress: String,
    clientIdentd: String,
    userId: String,
    dateTime: String,
    method: String,
    endpoint: String,
    protocol: String,
    responseCode: Int,
    contentSize: Long)

object ApacheAccessLog {
  // Regex for lines such as:
  // 64.242.88.10 - - [07/Mar/2004:16:05:49 -0800]
  //   "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1"
  //   401 12846
  val PATTERN =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

  def isValidateLogLine(log: String): Boolean = {
    val res = PATTERN.findFirstMatchIn(log)
    if (res.isEmpty) false else true
  }

  def parseLogLine(log: String): ApacheAccessLog = {
    val res = PATTERN.findFirstMatchIn(log)
    if (res.isEmpty) {
      throw new RuntimeException("Cannot parse log line: " + log)
    }
    val m = res.get
    ApacheAccessLog(m.group(1), m.group(2), m.group(3), m.group(4),
      m.group(5), m.group(6), m.group(7), m.group(8).toInt, m.group(9).toLong)
  }
}
```
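To sanity-check the regex, it can be run against the sample line quoted in the comments above. This is a minimal standalone sketch: the pattern is inlined so the snippet runs on its own, whereas in the project the same work goes through `ApacheAccessLog.parseLogLine`.

```scala
// Standalone check of the access-log regex, using the sample line from
// the comments above. The pattern is inlined here so the snippet runs
// on its own; in the project you would call ApacheAccessLog.parseLogLine.
object RegexCheck {
  val PATTERN =
    """^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+)""".r

  def main(args: Array[String]): Unit = {
    val line = "64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] " +
      "\"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1\" " +
      "401 12846"
    val m = PATTERN.findFirstMatchIn(line).get
    println(m.group(1))       // IP address: 64.242.88.10
    println(m.group(5))       // HTTP method: GET
    println(m.group(8).toInt) // response code: 401
  }
}
```

Each capture group maps to one field of the `ApacheAccessLog` case class, in order.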
OrderingUtils.scala
```scala
package com.ibeifeng.bigdata.spark.app

/**
 * Created by hadoop001 on 4/28/16.
 */
object OrderingUtils {

  object SecondValueOrdering extends Ordering[(String, Int)] {
    /**
     * Returns an integer whose sign communicates how x compares to y:
     *   - negative if x < y
     *   - positive if x > y
     *   - zero otherwise (if x == y)
     */
    def compare(x: (String, Int), y: (String, Int)): Int = {
      x._2.compare(y._2)
    }
  }
}
```
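As a usage sketch (the counts below are made up for illustration), an `Ordering` like `SecondValueOrdering` lets comparisons on `(key, count)` pairs use the count. In the Spark job it would typically be passed to `RDD.top`, e.g. `rdd.top(10)(OrderingUtils.SecondValueOrdering)`; the same ordering works on plain collections, which is easy to try locally. The ordering is redefined here so the snippet is self-contained.

```scala
// Usage sketch with made-up data: compare (key, count) pairs by count.
// In the Spark job this ordering would be passed to RDD.top, e.g.
// rdd.top(10)(OrderingUtils.SecondValueOrdering).
object OrderingDemo {
  object SecondValueOrdering extends Ordering[(String, Int)] {
    def compare(x: (String, Int), y: (String, Int)): Int = x._2.compare(y._2)
  }

  def main(args: Array[String]): Unit = {
    val counts = Seq(("GET", 120), ("POST", 45), ("HEAD", 7))
    // max picks the pair with the largest second element
    println(counts.max(SecondValueOrdering))    // (GET,120)
    // sorted orders ascending by count
    println(counts.sorted(SecondValueOrdering)) // List((HEAD,7), (POST,45), (GET,120))
  }
}
```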
3. Package with Maven
```
mvn package
```
4. Submit the application
```
bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer analyzer-logs-1.0.jar

bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer --master spark://spark.com.cn:7077 --deploy-mode cluster analyzer-logs-1.0.jar
```
Four, Spark Launch on YARN
1) When compiling Spark, specify the `-Pyarn` profile
2) When submitting the application, set `HADOOP_CONF_DIR` so Spark can read the Hadoop configuration, and specify:
```
--master yarn
```
First run mode: yarn-client
```
/opt/modules/spark-1.3.0-bin-2.5.0/bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer --master yarn-client analyzer-logs-1.0.jar
```
Second run mode: yarn-cluster
```
/opt/modules/spark-1.3.0-bin-2.5.0/bin/spark-submit --class com.ibeifeng.bigdata.spark.app.LogAnalyzer --master yarn-cluster analyzer-logs-1.0.jar
```