Log data:

```
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:32 +0800] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1" 200 332
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:40 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 132
```

**Requirement: count the number of GET requests in each hour.**
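Both approaches below split each line on spaces and pick fields by position. As an alternative for the parsing step, a regular expression can pull out the day, hour and method in one go. The sketch below is not from the original post: `LogLineParser` and `parse` are hypothetical names, and the pattern assumes the lines follow Tomcat's default `common` access-log layout (`%h %l %u %t "%r" %s %b`) exactly as in the sample above.

```scala
// Hypothetical helper (not part of the original post): parse one access-log line
// with a regex instead of splitting on spaces.
object LogLineParser {
  // Assumes Tomcat's "common" layout: host ident user [time] "request" status bytes
  private val LogPattern =
    """^(\S+) (\S+) (\S+) \[([^:]+):(\d{2}):\d{2}:\d{2} [^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$""".r

  /** Returns (day, hour, method), or None when the line does not match the pattern. */
  def parse(line: String): Option[(String, String, String)] = line match {
    // Scala's Regex extractor requires the whole line to match
    case LogPattern(_, _, _, day, hour, method, _, _, _) => Some((day, hour, method))
    case _ => None
  }
}
```

On the first sample line, `LogLineParser.parse(line)` returns `Some(("11/Nov/2016", "14", "GET"))`. Keying on day and hour, rather than on a year:hour substring, keeps counts separate if the input ever spans several days, and malformed lines come back as `None` instead of throwing an index error. Inside a Spark job it could stand in for the split step, for example `textFile(...).flatMap(line => LogLineParser.parse(line).toList)`.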
The first approach uses SQL.

Scala code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by xiaopengpeng on 2016/12/15.
 */
object countget {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logDF = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.72\\logs\\localhost_access_log.2016-11-11.txt")
      .map(line => line.split(" "))
      // field 3 is "[11/Nov/2016:14:41:31"; the 7 characters after its last '/' are "2016:14" (year:hour)
      // field 5 is the method with the request line's opening quote, e.g. "\"GET"
      .map(list => (
        list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8),
        list(5)))
      .toDF("time", "method")
    logDF.show()

    logDF.createOrReplaceTempView("log")
    spark.sql("SELECT time, COUNT(method) AS get_count FROM log WHERE method = '\"GET' GROUP BY time ORDER BY time").show()

    spark.stop()
  }
}
```
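The `substring` call and the `'"GET'` literal in the query are easier to follow with a concrete line. The small program below (the name `SplitKeyDemo` is only for illustration) applies the same split-and-substring expressions to the first sample line without Spark, showing that the key is year:hour and that the method field keeps the leading double quote. That is why the SQL filter compares against `"GET` rather than `GET`. Because the year:hour key carries no day, this works for a single day's log file such as `localhost_access_log.2016-11-11.txt`.

```scala
// Illustration only: the field extraction from the Spark job, run on one line in plain Scala
object SplitKeyDemo {
  def timeAndMethod(line: String): (String, String) = {
    val list = line.split(" ")
    // list(3) is "[11/Nov/2016:14:41:31"; the 7 characters after the last '/' are "2016:14"
    val time = list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8)
    // list(5) still carries the opening quote of the request line: "\"GET"
    (time, list(5))
  }

  def main(args: Array[String]): Unit = {
    val sample = """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821"""
    println(timeAndMethod(sample)) // prints (2016:14,"GET)
  }
}
```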
The second approach is implemented purely in Scala code (RDD operations, no SQL):
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by root on 2016/12/15.
 */
object CountGetByScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    // Each record becomes (year:hour, method), e.g. ("2016:14", "\"GET")
    val logLine = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.72\\logs\\localhost_access_log.2016-11-11.txt")
      .map(line => line.split(" "))
      .map(list => (
        list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8),
        list(5)))

    // Keep only GET requests; the method field still carries the opening quote
    val filter = logLine.filter(y => y._2.equals("\"GET"))
    // Group by hour and count the records in each group
    val group = filter.groupBy(line => line._1)
    val result = group.map(g => (g._1, g._2.size))

    // collect() brings the results back to the driver so println output appears there
    result.collect().foreach(println)

    spark.stop()
  }
}
```
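The filter → groupBy → count pipeline can be checked without a Spark installation, because Scala's own collections offer the same `filter` and `groupBy` operations. The sketch below (object and method names are hypothetical) runs that logic on the three sample lines from the top of the post. Within Spark itself, mapping each record to `(hour, 1)` and calling `reduceByKey(_ + _)` is the usual alternative to `groupBy`, since it combines partial counts before the shuffle instead of building every group in memory.

```scala
// Hypothetical local check of the RDD pipeline, using plain Scala collections instead of Spark
object CountGetLocal {
  // Same split-based extraction as the Spark version: (year:hour, method with leading quote)
  def key(line: String): (String, String) = {
    val list = line.split(" ")
    val slash = list(3).lastIndexOf("/")
    (list(3).substring(slash + 1, slash + 8), list(5))
  }

  // filter -> groupBy -> size, on a Seq instead of an RDD
  def countGetPerHour(lines: Seq[String]): Map[String, Int] =
    lines.map(key)
      .filter(_._2 == "\"GET")
      .groupBy(_._1)
      .map { case (hour, records) => hour -> records.size }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821""",
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:32 +0800] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1" 200 332""",
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:40 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 132"""
    )
    println(countGetPerHour(sample)) // prints Map(2016:14 -> 2)
  }
}
```

The two GET lines fall in hour 14 and the POST line is filtered out, so the single bucket holds 2.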
Counting the number of GET requests per time period in a web log