Counting the GET requests in a web log for a given time period


Log data:

```
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:32 +0800] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1" 200 332
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:40 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 132
```

**Requirement: count the number of GET requests in each hour.**

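Before bringing in Spark, it can help to see what the requirement means for the three sample lines. The following is a minimal sketch in plain Scala collections, not the original author's code. The names `SampleLogCount`, `parse`, and `countGetPerHour` are made up for illustration. The field positions (index 3 for the timestamp, index 5 for the method) assume the line is split on single spaces, as in the sample data.

```scala
object SampleLogCount {
  // The three sample lines from the log data above.
  val sampleLines: Seq[String] = Seq(
    """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821""",
    """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:32 +0800] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1" 200 332""",
    """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:40 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 132"""
  )

  /** Splits one log line and returns (hourKey, method), e.g. ("2016:14", "GET"). */
  def parse(line: String): (String, String) = {
    val fields = line.split(" ")
    val timestamp = fields(3)                            // "[11/Nov/2016:14:41:31"
    val start = timestamp.lastIndexOf("/") + 1           // index of the year
    val hourKey = timestamp.substring(start, start + 7)  // "2016:14" (year:hour)
    val method = fields(5).stripPrefix("\"")             // "\"GET" becomes "GET"
    (hourKey, method)
  }

  /** Keeps only GET requests and counts them per hour key. */
  def countGetPerHour(lines: Seq[String]): Map[String, Int] =
    lines.map(parse)
      .filter { case (_, method) => method == "GET" }
      .groupBy { case (hour, _) => hour }
      .map { case (hour, hits) => hour -> hits.size }

  def main(args: Array[String]): Unit =
    countGetPerHour(sampleLines).foreach(println) // prints (2016:14,2)
}
```

Two of the three sample lines are GET requests and all fall in hour 14, so the expected count for the sample is 2.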
The first approach uses Spark SQL.
Scala code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by xiaopengpeng on 2016/12/15.
 */
class countget {}

object countget {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._ // needed for toDF

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logDF = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.72\\logs\\localhost_access_log.2016-11-11.txt")
      .map(line => line.split(" "))
      // list(3) is "[11/Nov/2016:14:41:31"; the 7 characters after the last "/" are "2016:14" (year:hour).
      // list(5) is the method with its leading quote, e.g. "GET preceded by a double quote.
      .map(list => (list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8), list(5)))
      .toDF("time", "method")

    logDF.show()
    logDF.createOrReplaceTempView("log")

    // The method column still carries the opening quote, so the filter matches '"GET'.
    spark.sql("SELECT time, COUNT(method) FROM log WHERE method = '\"GET' GROUP BY time").show()
  }
}
```
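The substring used above yields a key such as `2016:14`, that is, the year and the hour. That is enough for a single-day log file, but the key does not include the day, so hours from different days would fall into the same bucket. As a hedged alternative, not part of the original post, the timestamp can be parsed with `java.time` and formatted as date plus hour. The object name `HourBucket` and the output format `yyyy-MM-dd HH` are assumptions chosen for illustration.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object HourBucket {
  // Matches the bracketed timestamp of the access log, e.g. 11/Nov/2016:14:41:31 +0800.
  // Locale.ENGLISH is set explicitly so that "Nov" parses regardless of the JVM default locale.
  private val logFormat =
    DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH)

  // Output key: date and hour only (an illustrative choice).
  private val keyFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH")

  /** Extracts the text between '[' and ']' and returns a date-plus-hour key. */
  def hourKey(line: String): String = {
    val raw = line.substring(line.indexOf('[') + 1, line.indexOf(']'))
    OffsetDateTime.parse(raw, logFormat).format(keyFormat)
  }

  def main(args: Array[String]): Unit = {
    val line =
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821"""
    println(hourKey(line)) // prints 2016-11-11 14
  }
}
```

A function like this could stand in for the `substring` expression in the `map` step. The grouping logic itself would not need to change.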
The second approach uses only Scala code on the RDD, without SQL:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by root on 2016/12/15.
 */
class CountGetByScala {}

object CountGetByScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logLine = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.72\\logs\\localhost_access_log.2016-11-11.txt")
      .map(line => line.split(" "))
      // Produces (time, method) pairs such as ("2016:14", "\"GET").
      .map(list => (list(3).substring(list(3).lastIndexOf("/") + 1, list(3).lastIndexOf("/") + 8), list(5)))

    // Keep only GET requests; the method field still carries the opening quote.
    val filter = logLine.filter(y => y._2.equals("\"GET"))
    // Group the pairs by the time key and count the entries in each group.
    val group = filter.groupBy(line => line._1)
    val result = group.map(g => (g._1, g._2.toList.size))
    result.foreach(x => println(x))
  }
}
```
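The RDD version above groups all matching records by key and then measures the size of each group. A common variation, shown here as a sketch and not as the original author's method, maps each record to `(key, 1)` and sums the ones with `reduceByKey`, which avoids materializing every group as a list. The object name `CountGetReduceByKey`, the helper `countGets`, and the in-memory sample pairs are assumptions for illustration. Running the sketch requires Spark on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CountGetReduceByKey {
  /** Counts GET requests per time key from (time, method) pairs. */
  def countGets(pairs: RDD[(String, String)]): RDD[(String, Int)] =
    pairs
      .filter { case (_, method) => method == "\"GET" } // method keeps its opening quote
      .map { case (time, _) => (time, 1) }
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // In-memory stand-in for the parsed log: the three sample lines as (time, method) pairs.
    val pairs = spark.sparkContext.parallelize(Seq(
      ("2016:14", "\"GET"),
      ("2016:14", "\"GET"),
      ("2016:14", "\"POST")
    ))

    countGets(pairs).collect().foreach(println) // prints (2016:14,2)
    spark.stop()
  }
}
```

In the program above, `countGets` could be applied directly to `logLine`, since that value is already an RDD of `(time, method)` pairs.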

 

 
