Log data:
```
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /clouddoclib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
0:0:0:0:0:0:0:1 - - [11/Nov/2016:… +0800] "GET /clouddoclib/xng/xngaction!listdeamons.action?page=0&count=10&sort=symbol&order=asc&query=stype%3aeqa%3bcindustry.style%3a009%3bcindustry.stylecode%3azc7&joblisttype=1&host=unknown HTTP/1.1" 200 332
0:0:0:0:0:0:0:1 - - [11/Nov/2016:… +0800] "POST /clouddoclib/xng/xngaction!startdeamon.action HTTP/1.1" 200 …
```

**Requirement: count the number of GET requests per hour.**
The first approach is to use SQL:
Scala code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by Xiaopengpeng on 2016/12/15.
 */
object CountGet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CountGet").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._ // needed for toDF on an RDD of tuples

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /clouddoclib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logDf = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.…\\logs\\localhost_access_log.…-11-11.txt")
      .map(line => line.split(" "))
      // list(3) is "[11/Nov/2016:14:41:31"; the 7 characters after its last "/" are "2016:14" (year:hour).
      // list(5) is the method token, which keeps its leading quote: "GET
      .map(list => (list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8), list(5)))
      .toDF("time", "method")

    logDf.show()
    logDf.createOrReplaceTempView("log")
    spark.sql("select time, count(method) from log where method = '\"GET' group by time").show()
  }
}
```
The second approach is implemented in plain Scala code, without SQL:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by Root on 2016/12/15.
 */
object CountGetByScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CountGet").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /clouddoclib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logLine = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.…\\logs\\localhost_access_log.…-11-11.txt")
      .map(line => line.split(" "))
      // Pair each line's year:hour key (e.g. "2016:14") with its method token (e.g. "GET with a leading quote).
      .map(list => (list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8), list(5)))

    // Keep only GET requests; the token still carries the opening quote of the request field.
    val filter = logLine.filter(y => y._2.equals("\"GET"))
    // Group by the time key and count the entries in each group.
    val group = filter.groupBy(line => line._1)
    val result = group.map(g => (g._1, g._2.size))
    result.foreach(x => println(x))
  }
}
```
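Both versions depend on the same field indices after splitting a line on single spaces, so it can help to check that step in isolation without starting Spark. The following is a minimal sketch in plain Scala; the object name `SplitCheck` and the helper `extract` are hypothetical names introduced only for this illustration, and the sample is the first line of the log data.

```scala
object SplitCheck {
  // First sample line from the log data.
  val sample: String =
    "0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] " +
      "\"GET /clouddoclib/portal/deamon/manage.jsp HTTP/1.1\" 200 13821"

  // Hypothetical helper: returns (time key, method token) for one line,
  // using the same indices and substring offsets as the Spark code.
  def extract(line: String): (String, String) = {
    val fields = line.split(" ")
    val ts = fields(3)                  // "[11/Nov/2016:14:41:31"
    val start = ts.lastIndexOf("/") + 1 // first character of the year
    (ts.substring(start, start + 7), fields(5))
  }

  def main(args: Array[String]): Unit = {
    val (time, method) = extract(sample)
    println(s"time=$time method=$method") // prints: time=2016:14 method="GET
  }
}
```

The key holds the year and the hour but not the day, so it only separates hours correctly when the input covers a single day, as appears to be the case for the daily access log file read here. The method token keeps the opening quote of the request field, which is why both versions compare against `"GET` rather than `GET`.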
Count the number of GET requests within a time period in the Web log