Part 1: Developing in Java
1. Pre-development preparation: this assumes you have already set up a Spark cluster.
2. The development environment is an Eclipse Maven project; you need to add the Spark Streaming dependency.
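The original screenshot of the dependency is not available; the Maven coordinates look roughly like the snippet below. The version and Scala suffix are assumptions and must match your cluster:

```xml
<!-- Assumed coordinates; adjust the version and Scala suffix to your cluster. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
```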
3. Spark Streaming computes on top of Spark Core, so note the following:
When the master is set to local, you must configure at least two threads (either directly or through SparkConf). If you give the local run only one thread, that thread is permanently occupied by the receiver that continuously collects data, and no thread is left to process the received data; the unprocessed data keeps accumulating, and memory and disk will be overwhelmed over time.
Tip:
For a cluster, each executor generally has more than one core, so how many cores per executor are appropriate for a Spark Streaming application? In our past experience, around 5 cores works best (an odd number of cores, for example 3, 5, or 7, tends to give the best performance).
Next, let's start writing Java code!
Step 1: Create a SparkConf object
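Since the original code screenshot is not available, here is a minimal sketch of this step; the application name is an assumption. The Java snippets in the later steps continue from this one and assume the usual imports (org.apache.spark.SparkConf, org.apache.spark.streaming.Durations, org.apache.spark.streaming.api.java.*, org.apache.spark.api.java.function.*, java.util.Arrays, scala.Tuple2).

```java
// Local test configuration: at least two threads, one to receive the data
// and one to process it (see the note above). On a real cluster you would
// drop setMaster and pass the master via spark-submit instead.
SparkConf conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("WordCountOnline");
```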
Step 2: Create the StreamingContext (JavaStreamingContext in the Java API)
Here we create the JavaStreamingContext object in a configuration-based manner, that is, directly from the SparkConf:
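A sketch of the configuration-based construction; the 5-second batch interval is an assumption:

```java
// The batch interval controls how often a new batch (and its RDD) is produced.
JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(5));
```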
Step 3: Create the Spark Streaming input data source:
We configure the data source as local port 9999 (make sure the port is not already in use). If the program is developed on Windows, you can test it with a TCP/UDP socket-sending tool; if it is developed on Linux, you can simply run the nc -lk 9999 command and type content for testing.
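A sketch of the socket source; the host name is an assumption and must match the machine where nc is running:

```java
// Receive a text stream from the socket; each element of the DStream is one line.
JavaReceiverInputDStream<String> lines = jsc.socketTextStream("localhost", 9999);
```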
Step 4: We program against the DStream just as we would against an RDD, because a DStream is essentially a template for generating RDDs: before Spark Streaming performs the actual computation, the operations on each batch's DStream are translated into operations on RDDs.
1. flatMap operation:
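A sketch of the flatMap step, assuming the Spark 1.x Java API in which the function returns an Iterable (in Spark 2.x it returns an Iterator instead):

```java
// Split every line into words.
JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterable<String> call(String line) {
        return Arrays.asList(line.split(" "));
    }
});
```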
2. mapToPair operation:
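A sketch of the mapToPair step:

```java
// Map each word to a (word, 1) pair.
JavaPairDStream<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String word) {
        return new Tuple2<String, Integer>(word, 1);
    }
});
```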
3. reduceByKey operation:
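A sketch of the reduceByKey step:

```java
// Sum the counts for each word within the current batch.
JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer v1, Integer v2) {
        return v1 + v2;
    }
});
```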
4. print and other operations:
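A sketch of the output step; the explicit start/awaitTermination/close calls are assumed here so that the Java example forms a complete program:

```java
// print() is an output operation: it triggers the computation and prints
// the first elements of each batch to the console.
wordCounts.print();

// Start the streaming job and block until it is terminated.
jsc.start();
jsc.awaitTermination();
jsc.close();
```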
Tip:
Besides print(), which outputs the processed data, other output methods are also important to know during development, such as saveAsTextFiles and saveAsHadoopFiles. The most important is the foreachRDD method, which can write the data to Redis, a database, a dashboard, and so on; you can even decide arbitrarily where the data goes, so it is extremely powerful.
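A hedged sketch of what a foreachRDD output could look like (not the author's original code). It assumes the Spark 1.6 VoidFunction overload and additionally needs org.apache.spark.api.java.JavaPairRDD; it would be registered before jsc.start(), and the println is only a stand-in for real Redis/DB/dashboard code:

```java
wordCounts.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
    @Override
    public void call(JavaPairRDD<String, Integer> rdd) {
        // For a small demo we collect the batch to the driver; in production
        // you would typically use rdd.foreachPartition and write to an external sink.
        for (Tuple2<String, Integer> tuple : rdd.collect()) {
            System.out.println(tuple._1() + " : " + tuple._2());
        }
    }
});
```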
Part 2: Developing in Scala
Step 1: Receive the data source:
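Since the original screenshots are missing, the Scala snippets below are a minimal sketch; the app name, batch interval, host and port are assumptions, and the later steps continue from this one:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Local test configuration: at least two threads (one receiver + one for processing).
val conf = new SparkConf().setMaster("local[2]").setAppName("WordCountOnline")
val ssc = new StreamingContext(conf, Seconds(5))

// Each element of this DStream is one line received on the socket.
val lines = ssc.socketTextStream("localhost", 9999)
```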
Step 2: flatMap operation:
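A one-line sketch of the flatMap step:

```scala
// Split every line of the stream into words.
val words = lines.flatMap(_.split(" "))
```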
Step 3: map operation:
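A one-line sketch of the map step:

```scala
// Turn each word into a (word, 1) pair.
val pairs = words.map(word => (word, 1))
```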
Step 4: reduce operation (reduceByKey):
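A one-line sketch of the reduceByKey step:

```scala
// Sum the counts per word within each batch.
val wordCounts = pairs.reduceByKey(_ + _)
```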
Step 5: print() and other operations:
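A sketch of the output step:

```scala
// print() is an output operation: it triggers the computation and prints
// the first elements of each batch to the console.
wordCounts.print()
```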
Step 6: awaitTermination operation
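A sketch of the final step; starting the context before awaiting termination is assumed:

```scala
// Start the streaming computation and block until it is stopped.
ssc.start()
ssc.awaitTermination()
```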
Summary:
With Spark Streaming you can handle many kinds of data sources, such as databases, HDFS, server logs, and network streams. It is more powerful than you might imagine, yet it is often underused; the real reason is usually an insufficient understanding of Spark itself and of Spark Streaming.
Note:
Source: DT_ Big Data DreamWorks (IMF Legendary Action classified course)
For more exclusive content, please follow the public account: DT_Spark
If you are interested in big data and Spark, you can listen, free of charge, to teacher Liaoliang's permanently free Spark public class, held every night at 20:00; YY room number: 68917580
Life is short, you need Spark!
This article is from the "DT_Spark Big Data DreamWorks" blog; please be sure to keep this source: http://18610086859.blog.51cto.com/11484530/1768477
Lesson 83: Hands-on Spark Streaming development in Scala and Java