Spark Cultivation (Advanced), Spark for Beginners: Section 13. Spark Streaming: Spark SQL, DataFrame, and Spark Streaming
Main content: Spark SQL, DataFrame, and Spark Streaming

1. Spark SQL, DataFrame, and Spark Streaming
The source code can be viewed directly in the Spark repository: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/SqlNetworkWordCount.scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
import org.apache.spark.util.IntParam
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

object SqlNetworkWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    // Logging helper from the Spark examples package
    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 2 second batch size
    val sparkConf = new SparkConf().setAppName("SqlNetworkWordCount").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create a socket stream on the target ip:port and count the words in the
    // input stream of \n delimited text (e.g. generated by 'nc').
    // Note that a storage level without replication is acceptable only when
    // running locally; replication is necessary in a distributed scenario
    // for fault tolerance.
    // Use a socket as the data source
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    // Split each line into a DStream of words
    val words = lines.flatMap(_.split(" "))

    // Convert the RDDs of the words DStream to DataFrames and run a SQL query.
    // foreachRDD is called on every RDD produced by the DStream.
    words.foreachRDD((rdd: RDD[String], time: Time) => {
      // Get the singleton instance of SQLContext
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._

      // Convert RDD[String] to RDD[case class] to DataFrame
      val wordsDataFrame = rdd.map(w => Record(w)).toDF()

      // Register as a table
      wordsDataFrame.registerTempTable("words")

      // Do word count on the table using SQL and print it
      val wordCountsDataFrame =
        sqlContext.sql("select word, count(*) as total from words group by word")
      println(s"========= $time =========")
      wordCountsDataFrame.show()
    })

    ssc.start()
    ssc.awaitTermination()
  }
}

/** Case class for converting RDD to DataFrame */
case class Record(word: String)

/** Lazily instantiated singleton instance of SQLContext */
object SQLContextSingleton {
  @transient private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}
With the program running, send it input from another terminal using netcat:
root@sparkmaster:~# nc -lk 9999
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Spark is a fast and general cluster computing system for Big Data
Processing result (batches during which no new lines arrive print empty tables):
========= 1448783840000 ms =========
+---------+-----+
|     word|total|
+---------+-----+
|    Spark|   12|
|   system|   12|
|  general|   12|
|     fast|   12|
|      and|   12|
|computing|   12|
|        a|   12|
|       is|   12|
|      for|   12|
|      Big|   12|
|  cluster|   12|
|     Data|   12|
+---------+-----+

========= 1448783842000 ms =========
+----+-----+
|word|total|
+----+-----+
+----+-----+

========= 1448783844000 ms =========
+----+-----+
|word|total|
+----+-----+
+----+-----+
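As a side note, on Spark 2.x and later SQLContext and registerTempTable are deprecated; SparkSession and createOrReplaceTempView take their place, and SparkSession.builder.getOrCreate() returns any existing session, so the hand-rolled SQLContextSingleton object is no longer needed. Below is a minimal sketch of the same foreachRDD body under that assumption, reusing words and Record from the listing above:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.Time

words.foreachRDD { (rdd: RDD[String], time: Time) =>
  // getOrCreate() reuses the session tied to this SparkContext,
  // replacing the SQLContextSingleton pattern
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._

  // Same conversion as before: RDD[String] -> RDD[Record] -> DataFrame
  val wordsDataFrame = rdd.map(w => Record(w)).toDF()

  // createOrReplaceTempView supersedes registerTempTable
  wordsDataFrame.createOrReplaceTempView("words")

  val wordCountsDataFrame =
    spark.sql("select word, count(*) as total from words group by word")
  println(s"========= $time =========")
  wordCountsDataFrame.show()
}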