Spark Cultivation (Advanced) - Spark for Beginners: Section 13, Spark Streaming - Spark SQL, DataFrame, and Spark Streaming

Source: Internet
Author: User

Main content: Spark SQL, DataFrame, and Spark Streaming

1. Spark SQL, DataFrame, and Spark Streaming

The source code is taken directly from: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/SqlNetworkWordCount.scala

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
    import org.apache.spark.util.IntParam
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.storage.StorageLevel

    object SqlNetworkWordCount {
      def main(args: Array[String]) {
        if (args.length < 2) {
          System.err.println("Usage: NetworkWordCount <hostname> <port>")
          System.exit(1)
        }

        // StreamingExamples is a helper from the Spark examples package
        StreamingExamples.setStreamingLogLevels()

        // Create the context with a 2 second batch size
        val sparkConf = new SparkConf().setAppName("SqlNetworkWordCount").setMaster("local[4]")
        val ssc = new StreamingContext(sparkConf, Seconds(2))

        // Create a socket stream on target ip:port and count the
        // words in input stream of \n delimited text (eg. generated by 'nc')
        // Note that no duplication in storage level only for running locally.
        // Replication necessary in distributed scenario for fault tolerance.
        // Use a socket as the data source
        val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
        // words DStream
        val words = lines.flatMap(_.split(" "))

        // Convert RDDs of the words DStream to DataFrame and run SQL query
        // Call foreachRDD to process each RDD in the DStream
        words.foreachRDD((rdd: RDD[String], time: Time) => {
          // Get the singleton instance of SQLContext
          val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
          import sqlContext.implicits._

          // Convert RDD[String] to RDD[case class] to DataFrame
          val wordsDataFrame = rdd.map(w => Record(w)).toDF()

          // Register as table
          wordsDataFrame.registerTempTable("words")

          // Do word count on table using SQL and print it
          val wordCountsDataFrame =
            sqlContext.sql("select word, count(*) as total from words group by word")
          println(s"========= $time =========")
          wordCountsDataFrame.show()
        })

        ssc.start()
        ssc.awaitTermination()
      }
    }

    /** Case class for converting RDD to DataFrame */
    case class Record(word: String)

    /** Lazily instantiated singleton instance of SQLContext */
    object SQLContextSingleton {
      @transient private var instance: SQLContext = _

      def getInstance(sparkContext: SparkContext): SQLContext = {
        // Compare with ==; a single = would be an assignment and does not compile here
        if (instance == null) {
          instance = new SQLContext(sparkContext)
        }
        instance
      }
    }
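The program obtains its SQLContext through a lazily instantiated singleton, so the context is created on the first batch and reused on every later one. The following is a minimal sketch of that pattern without any Spark dependency. ExpensiveResource, ResourceSingleton, and the constructions counter are hypothetical names introduced only for illustration, and the synchronized guard is an addition that the program above does not have.

```scala
// Hypothetical stand-in for an object that is costly to build, such as SQLContext.
final class ExpensiveResource(val name: String)

object ResourceSingleton {
  // Illustration only: counts how many times the resource was constructed.
  var constructions: Int = 0

  private var instance: ExpensiveResource = null

  // Build the resource on first use, then return the same object afterwards.
  // synchronized is an assumption added here to keep concurrent callers safe.
  def getInstance(name: String): ExpensiveResource = synchronized {
    if (instance == null) {
      instance = new ExpensiveResource(name)
      constructions += 1
    }
    instance
  }
}

object SingletonDemo {
  def main(args: Array[String]): Unit = {
    val first = ResourceSingleton.getInstance("batch-1")
    val second = ResourceSingleton.getInstance("batch-2")
    // Both calls return the same object; the second name is ignored.
    println(first eq second)
    println(ResourceSingleton.constructions)
  }
}
```

Calling getInstance inside foreachRDD, rather than building a new context for each batch, avoids repeating the construction work every 2 seconds.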

After starting the program, run the following command and enter some lines of text:

    root@sparkmaster:~# nc -lk 9999
    Spark is a fast and general cluster computing system for Big Data
    Spark is a fast and general cluster computing system for Big Data
    Spark is a fast and general cluster computing system for Big Data
    Spark is a fast and general cluster computing system for Big Data
    Spark is a fast and general cluster computing system for Big Data
    Spark is a fast and general cluster computing system for Big Data
    Spark is a fast and general cluster computing system for Big Data

Processing result:

    ========= 1448783840000 ms =========
    +---------+-----+
    |     word|total|
    +---------+-----+
    |    Spark|   12|
    |   system|   12|
    |  general|   12|
    |     fast|   12|
    |      and|   12|
    |computing|   12|
    |        a|   12|
    |       is|   12|
    |      for|   12|
    |      Big|   12|
    |  cluster|   12|
    |     Data|   12|
    +---------+-----+
    ========= 1448783842000 ms =========
    +----+-----+
    |word|total|
    +----+-----+
    +----+-----+
    ========= 1448783844000 ms =========
    +----+-----+
    |word|total|
    +----+-----+
    +----+-----+
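Each table in the result is the output of the group-by query applied to the words received in one 2 second batch, which is why batches that receive no input print an empty table. As a rough sketch, the same aggregation can be written with plain Scala collections. WordCountSketch and countWords are hypothetical names used only for illustration, and this version runs on a local Seq rather than on an RDD or DataFrame.

```scala
object WordCountSketch {
  // Mirrors "select word, count(*) as total from words group by word"
  // on an in-memory batch of lines (an assumption; Spark runs this on a DataFrame).
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // same split as the words DStream
      .groupBy(identity)       // group by word
      .map { case (word, occurrences) => word -> occurrences.size } // count(*)

  def main(args: Array[String]): Unit = {
    val line = "Spark is a fast and general cluster computing system for Big Data"
    val batch = Seq.fill(3)(line)
    // Print one "word<TAB>total" row per distinct word, sorted for readability.
    countWords(batch).toSeq.sortBy(_._1).foreach { case (word, total) =>
      println(s"$word\t$total")
    }
  }
}
```

An empty batch gives an empty map here, which corresponds to the header-only tables printed for the later batches.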
