Scenario: use Spark Streaming to receive data sent by Kafka and join it against a table in a relational database.
The data format sent by Kafka is: id, name, cityid, delimited by tabs. Sample data:
1	zhangsan	1
2	lisi	1
3	wangwu	2
4		3
The MySQL table city has the structure: id int, name varchar. Sample data:
1	bj
2	sz
3	sh
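For reference, a minimal sketch (not part of the original post) that creates and populates the city table with the rows above. The database URL and root/root credentials are taken from the example code further down; the varchar length and the presence of a MySQL JDBC driver on the classpath are assumptions:

import java.sql.DriverManager

object CitySetup {
  def main(args: Array[String]): Unit = {
    // Assumed connection details, matching the jdbcTable call in the streaming code below
    val conn = DriverManager.getConnection("jdbc:mysql://hadoop000:3306/test", "root", "root")
    try {
      val stmt = conn.createStatement()
      stmt.executeUpdate("create table if not exists city (id int, name varchar(20))") // length 20 is an assumption
      stmt.executeUpdate("insert into city values (1, 'bj'), (2, 'sz'), (3, 'sh')")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}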
The result of this case is equivalent to: select s.id, s.name, s.cityid, c.name from student s join city c on s.cityid = c.id;
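Given the sample student and city rows above, the join should print one row per student with a matching city, roughly:

[1,zhangsan,1,bj]
[2,lisi,1,bj]
[3,wangwu,2,sz]

(The fourth student is omitted here because its name does not appear in the sample data above.)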
For Kafka installation, see: Kafka standalone environment setup.
Start Kafka:
zkServer.sh start
kafka-server-start.sh $KAFKA_HOME/config/server.properties &
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic luogankun_topic
kafka-console-producer.sh --broker-list hadoop000:9092 --topic luogankun_topic
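Once the console producer is running, tab-separated records in the id, name, cityid format described above can be typed in for testing, for example:

1	zhangsan	1
2	lisi	1
3	wangwu	2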
Example code:
package com.asiainfo.ocdc

case class Student(id: Int, name: String, cityid: Int)
package com.asiainfo.ocdc

import org.apache.spark.streaming._
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka._

/**
 * Spark Streaming processes Kafka data and joins it with a MySQL table
 * through the Spark JDBC external data source.
 *
 * @author luogankun
 */
object KafkaStreaming {
  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: KafkaStreaming <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }

    val Array(zkQuorum, group, topics, numThreads) = args
    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, Seconds(5))
    val sqlContext = new HiveContext(sc)
    import sqlContext._
    import com.luogankun.spark.jdbc._

    // Use the external data source to read the city table from MySQL
    val cities = sqlContext.jdbcTable("jdbc:mysql://hadoop000:3306/test", "root", "root",
      "select id, name from city")
    // Register the cities RDD as the temporary table "city"
    cities.registerTempTable("city")

    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val inputs = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap,
      StorageLevel.MEMORY_AND_DISK_SER).map(_._2)

    inputs.foreachRDD(rdd => {
      if (rdd.partitions.length > 0) {
        // Register the data received from the stream as the temporary table "student"
        rdd.map(_.split("\t")).map(x => Student(x(0).toInt, x(1), x(2).toInt))
          .registerTempTable("student")

        // Join the streaming table with the MySQL table
        sqlContext.sql("select s.id, s.name, s.cityid, c.name from student s join city c on s.cityid = c.id")
          .collect().foreach(println)
      }
    })

    ssc.start()
    ssc.awaitTermination()
  }
}
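The jdbcTable call above comes from the author's custom com.luogankun.spark.jdbc external data source package. If that package is not available, a rough alternative (a sketch under assumptions, not part of the original post) is to load the city table with Spark's built-in JdbcRDD from Spark 1.x and register it the same way; the id bounds and partition count below are assumptions:

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

// Define at the top level, like Student, so the case-class reflection works
case class City(id: Int, name: String)

// Replaces the two "cities" lines above; assumes sc and "import sqlContext._" are already in scope
val cityRdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:mysql://hadoop000:3306/test", "root", "root"),
  "select id, name from city where id >= ? and id <= ?", // JdbcRDD requires two bind parameters
  1, 1000, 1,                                            // lower bound, upper bound, partitions (assumed)
  rs => City(rs.getInt(1), rs.getString(2)))
cityRdd.registerTempTable("city")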
Script to submit the job to the cluster: sparkstreaming_kafka_jdbc.sh
#!/bin/sh
. /etc/profile
set -x

cd $SPARK_HOME/bin

# application arguments: <zkQuorum> <group> <topics> <numThreads>
spark-submit \
  --class com.asiainfo.ocdc.KafkaStreaming \
  --master spark://hadoop000:7077 \
  --executor-memory 1G \
  /home/spark/software/source/streaming-app/target/streaming-app-V00B01C00-SNAPSHOT-jar-with-dependencies.jar \
  hadoop000:2181 <group> luogankun_topic 1
Spark Streaming and Kafka combined with the Spark JDBC external data source: a processing case.