The code is as follows:
package com.dt.spark.streaming

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.streaming.{StreamingContext, Duration}

/**
 * Analyzes logs using Spark Streaming combined with Spark SQL.
 * Assume the e-commerce site's click log format (simplified) is:
 *   userid,itemid,clicktime
 * Requirement: compute the Top 10 most-clicked items within a 10-minute window
 * and display the item names. The mapping from itemid to item name is stored
 * in a MySQL database.
 * Created by dinglq on 2016/5/4.
 */
object LogAnalyzerStreamingSQL {
  val WINDOW_LENGTH = new Duration(600 * 1000)
  val SLIDE_INTERVAL = new Duration(10 * 1000)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf()
      .setAppName("LogAnalyzerStreamingSQL")
      .setMaster("local[4]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Load the iteminfo table from the MySQL database
    val itemInfoDF = sqlContext.read.format("jdbc").options(Map(
      "url" -> "jdbc:mysql://spark-master:3306/spark",
      "driver" -> "com.mysql.jdbc.Driver",
      "dbtable" -> "iteminfo",
      "user" -> "root",
      "password" -> "vincent"
    )).load()
    itemInfoDF.registerTempTable("iteminfo")

    val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)
    val logLinesDStream = streamingContext.textFileStream("D:/logs_incoming")

    val accessLogsDStream = logLinesDStream.map(AccessLog.parseLogLine).cache()
    val windowDStream = accessLogsDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL)

    windowDStream.foreachRDD(accessLogs => {
      if (accessLogs.isEmpty()) {
        println("No logs received in this time interval")
      } else {
        accessLogs.toDF().registerTempTable("accessLogs")
        val sqlStr = "SELECT a.itemid, a.itemname, b.cnt FROM iteminfo a JOIN " +
          "(SELECT itemid, count(*) cnt FROM accessLogs GROUP BY itemid) b " +
          "ON (a.itemid = b.itemid) ORDER BY cnt DESC LIMIT 10"
        val topTenClickItemLast10Minutes = sqlContext.sql(sqlStr)
        // Persist the Top 10 table for this window to HDFS as a Parquet file
        topTenClickItemLast10Minutes.show()
      }
    })

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}

case class AccessLog(userid: String, itemid: String, clicktime: String)

object AccessLog {
  def parseLogLine(log: String): AccessLog = {
    val logInfo = log.split(",")
    if (logInfo.length == 3) {
      AccessLog(logInfo(0), logInfo(1), logInfo(2))
    } else {
      // Fall back to a placeholder record for malformed lines
      AccessLog("0", "0", "0")
    }
  }
}
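To compile the program you need the Spark core, streaming, and SQL modules plus the MySQL JDBC driver on the classpath. Below is a minimal build.sbt sketch; the Spark and connector versions are assumptions (any Spark 1.x release with the DataFrame API, i.e. 1.4 or later, should work):

name := "LogAnalyzerStreamingSQL"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.1",
  "org.apache.spark" %% "spark-streaming" % "1.6.1",
  "org.apache.spark" %% "spark-sql"       % "1.6.1",
  "mysql"            %  "mysql-connector-java" % "5.1.38"  // JDBC driver used by the program
)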
The contents of the table in MySQL are as follows:
mysql> SELECT * FROM spark.iteminfo;
+--------+----------+
| itemid | itemname |
+--------+----------+
| 001    | phone    |
| 002    | computer |
| 003    | tv       |
+--------+----------+
3 rows in set (0.00 sec)
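If you need to create and populate this table first, here is a minimal sketch over plain JDBC; the host, credentials, and column types are assumptions chosen to match the connection options used in the program above:

import java.sql.DriverManager

object CreateItemInfo {
  def main(args: Array[String]) {
    Class.forName("com.mysql.jdbc.Driver")
    val conn = DriverManager.getConnection(
      "jdbc:mysql://spark-master:3306/spark", "root", "vincent")
    try {
      // Create the lookup table and insert the three sample rows shown above
      conn.createStatement().executeUpdate(
        "CREATE TABLE IF NOT EXISTS iteminfo (itemid VARCHAR(10), itemname VARCHAR(50))")
      val ps = conn.prepareStatement("INSERT INTO iteminfo VALUES (?, ?)")
      for ((id, name) <- Seq(("001", "phone"), ("002", "computer"), ("003", "tv"))) {
        ps.setString(1, id)
        ps.setString(2, name)
        ps.executeUpdate()
      }
    } finally {
      conn.close()
    }
  }
}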
Create a directory logs_incoming on the D: drive.
Run the Spark Streaming program.
Create a new file with the following contents:
0001,001,2016-05-04 22:10:20
0002,001,2016-05-04 22:10:21
0003,001,2016-05-04 22:10:22
0004,002,2016-05-04 22:10:23
0005,002,2016-05-04 22:10:24
0006,001,2016-05-04 22:10:25
0007,002,2016-05-04 22:10:26
0008,001,2016-05-04 22:10:27
0009,003,2016-05-04 22:10:28
0010,003,2016-05-04 22:10:29
0011,001,2016-05-04 22:10:30
0012,003,2016-05-04 22:10:31
0013,003,2016-05-04 22:10:32
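If you prefer to generate the file programmatically, here is a minimal sketch (paths are assumptions). Note that textFileStream only picks up files that appear in the watched directory after the stream has started, so writing to a staging location and then moving the finished file avoids the stream reading a half-written file:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardCopyOption}

object WriteSampleLog {
  def main(args: Array[String]) {
    val content = Seq(
      "0001,001,2016-05-04 22:10:20", "0002,001,2016-05-04 22:10:21",
      "0003,001,2016-05-04 22:10:22", "0004,002,2016-05-04 22:10:23",
      "0005,002,2016-05-04 22:10:24", "0006,001,2016-05-04 22:10:25",
      "0007,002,2016-05-04 22:10:26", "0008,001,2016-05-04 22:10:27",
      "0009,003,2016-05-04 22:10:28", "0010,003,2016-05-04 22:10:29",
      "0011,001,2016-05-04 22:10:30", "0012,003,2016-05-04 22:10:31",
      "0013,003,2016-05-04 22:10:32"
    ).mkString("\n")
    // Write to a staging file outside the watched directory
    val tmp = Paths.get("D:/clicks.log")
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    // Atomically move the finished file into the directory textFileStream watches
    Files.move(tmp, Paths.get("D:/logs_incoming/clicks.log"),
      StandardCopyOption.ATOMIC_MOVE)
  }
}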
Save the file in the directory logs_incoming and observe the output of the Spark program:
+------+--------+---+
|itemid|itemname|cnt|
+------+--------+---+
|   001|   phone|  6|
|   003|      tv|  4|
|   002|computer|  3|
+------+--------+---+
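The counts match the sample file: itemid 001 appears six times, 003 four times, and 002 three times. The comment in the program also mentions persisting each window's Top 10 to HDFS as a Parquet file; a minimal sketch of that step, which would go inside the foreachRDD else-branch right after the DataFrame is computed (the HDFS path is an assumption):

import org.apache.spark.sql.SaveMode

// Inside foreachRDD, after computing topTenClickItemLast10Minutes:
topTenClickItemLast10Minutes.write
  .mode(SaveMode.Append)  // append one result set per window
  .parquet("hdfs://spark-master:9000/user/spark/topten_clicks")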
This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1770198
Lesson 97: Spark Streaming combined with Spark SQL case study