Day 83: A Thorough Explanation of Hands-On Spark Streaming Development in Java

The complete example, an online word count over a socket stream, follows; the comments walk through each step.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

/**
 * @ClassName: OnlineWordCount
 * @Description: TODO
 * @author ZGL
 * @date September 10, 2016, 11:59:22 PM
 */
public class OnlineWordCount {

    public static void main(String[] args) {
        /*
         * Step 1: Configure the SparkConf.
         * 1. Use at least 2 threads: a Spark Streaming application needs at least one thread to
         *    continuously receive data and at least one to process the received data (otherwise
         *    nothing processes the data, and memory and disk are overwhelmed over time).
         * 2. On a cluster, each executor generally gets more than one core; for a Spark Streaming
         *    program, an odd number of cores per executor, such as 5 or 7, is usually best.
         */
        SparkConf conf = new SparkConf().setMaster(args[0]).setAppName("online count");

        /*
         * Step 2: Create the JavaStreamingContext: the starting point of all Spark Streaming
         * functionality and the core of program scheduling.
         * 1. It can be built from SparkConf parameters, or recovered from a persisted context;
         *    the typical scenario is a restart after a driver crash. Spark Streaming runs 24/7
         *    without interruption, so after the driver restarts the state must be restored from
         *    the recorded checkpoint (see the getOrCreate sketch after this listing).
         * 2. Several JavaStreamingContext objects can be created in one application, but the
         *    running one must be stopped before the next is used. A big insight follows: the
         *    Spark Streaming framework is just an application on top of Spark Core; it merely
         *    runs the business-logic code that the Spark engineer writes.
         */
        JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(6));

        /*
         * Step 3: Create the input DStream (the data source).
         * 1. The input can be a file, HDFS, Flume, Kafka, a socket, and so on.
         * 2. Here the data comes from a network socket: at run time Spark Streaming connects to
         *    and listens on the port (the service on that port must already exist) and keeps
         *    receiving the data it produces for the subsequent business processing.
         * 3. If a batch interval frequently passes with no data, constantly starting empty jobs
         *    wastes scheduling resources, because there is nothing to compute; enterprise
         *    production code therefore checks whether there is any data before submitting a job,
         *    and submits none if there is none (see the foreachRDD sketch after this listing).
         */
        JavaReceiverInputDStream<String> lines = jsc.socketTextStream("Master", 9999);

        /*
         * Step 4: Program against the DStream just as you would against an RDD. A DStream is a
         * template for generating RDDs: before the computation runs, Spark Streaming essentially
         * translates the operations on each batch into RDD operations. Apply transformation-level
         * processing such as map and filter to the initial DStream.
         */
        JavaPairDStream<String, Integer> pairs = lines
                // Step 4.1: Split each line of text into individual words.
                .flatMap(new FlatMapFunction<String, String>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Iterable<String> call(String line) throws Exception {
                        return Arrays.asList(line.split(" "));
                    }
                })
                // Step 4.2: On the basis of the split, count each word instance as 1,
                // i.e. word => (word, 1).
                .mapToPair(new PairFunction<String, String, Integer>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Tuple2<String, Integer> call(String word) throws Exception {
                        return new Tuple2<String, Integer>(word, 1);
                    }
                });

        // Step 4.3: Add up the counts for each word.
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Integer call(Integer i1, Integer i2) throws Exception {
                        return i1 + i2;
                    }
                });

        /*
         * print() does not directly trigger job execution; everything is under the control of
         * the Spark Streaming framework, and the actual trigger for a real job run is the
         * configured batch duration. Note that to execute a job at all, the DStream must have an
         * output operation: print, saveAsTextFiles, saveAsHadoopFiles, and so on. The most
         * important is foreachRDD, because the results of Spark Streaming processing are
         * generally placed in Redis, a DB, a dashboard, and so on, and foreachRDD lets you
         * customize arbitrarily where the data ends up.
         */
        wordCounts.print();

        jsc.start();
        jsc.awaitTermination();
        jsc.stop();
    }
}
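The Step 2 comment mentions recovering the context from a checkpoint after a driver crash. Below is a minimal sketch of how that is usually wired up with JavaStreamingContext.getOrCreate, under stated assumptions: the checkpoint path, the class name, and the local[2] master are placeholders, and the Function0-based signature is the one in recent Spark releases (older 1.x releases used a JavaStreamingContextFactory instead).

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RecoverableOnlineWordCount {

    public static void main(String[] args) throws Exception {
        // Hypothetical checkpoint location; any reliable path (e.g. on HDFS) works.
        final String checkpointDir = "hdfs://Master:9000/spark/checkpoint";

        // The factory runs only when no checkpoint exists yet; it must build the
        // context AND register the whole DStream graph before returning.
        Function0<JavaStreamingContext> factory = new Function0<JavaStreamingContext>() {
            @Override
            public JavaStreamingContext call() {
                SparkConf conf = new SparkConf()
                        .setMaster("local[2]") // at least 2 threads: one receives, one processes
                        .setAppName("online count");
                JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(6));
                jsc.checkpoint(checkpointDir); // record state for recovery
                // ... define lines/pairs/wordCounts and the output operation here ...
                return jsc;
            }
        };

        // First start: calls the factory. Restart after a driver crash: rebuilds
        // the context and the DStream graph from the checkpoint instead.
        JavaStreamingContext jsc = JavaStreamingContext.getOrCreate(checkpointDir, factory);
        jsc.start();
        jsc.awaitTermination();
    }
}

The key design point is that the whole DStream graph must be defined inside the factory, so that on restart it can be reconstructed from the checkpoint rather than re-registered by hand.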

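The closing comment singles out foreachRDD as the most important output operation, and the Step 3 comment notes that production code avoids doing work for empty batches. A sketch combining the two ideas follows; it assumes the wordCounts DStream from the listing, a Spark version (1.6 or later) where foreachRDD accepts a VoidFunction, and a hypothetical storeInRedis helper standing in for whatever Redis/DB/dashboard sink is actually used.

// Additional imports needed beyond the main listing:
//   import org.apache.spark.api.java.JavaPairRDD;
//   import org.apache.spark.api.java.function.VoidFunction;

// Replace wordCounts.print() with an output operation that skips empty
// batches and pushes each non-empty result to an external store.
wordCounts.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
    private static final long serialVersionUID = 1L;
    @Override
    public void call(JavaPairRDD<String, Integer> rdd) throws Exception {
        // Guard against empty batches so that no work is done when the
        // socket produced no data during this interval.
        if (rdd.isEmpty()) {
            return;
        }
        for (Tuple2<String, Integer> wc : rdd.collect()) {
            storeInRedis(wc._1(), wc._2()); // storeInRedis is a hypothetical sink helper
        }
    }
});

For brevity the sketch collects results to the driver; a real sink would normally use rdd.foreachPartition instead, so that connections are opened on the executors, one per partition.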
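Finally, a usage note (an addition, not part of the original article): the socket service must exist before the application starts, as the Step 3 comment says. A simple way to provide one for testing is netcat: run nc -lk 9999 on the host named Master, type lines of text into it, and launch the application with args[0] set to a master URL such as local[2], so that at least two threads are available, one to receive data and one to process it.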