Day 83: A Thorough Explanation of Hands-On Spark Streaming Development in Java

The complete example, an online word count over a socket stream, follows; the comments walk through each step.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

/**
 * @ClassName: OnlineWordCount
 * @Description: TODO
 * @author ZGL
 * @date September 10, 2016, 11:59:22 PM
 */
public class OnlineWordCount {

    public static void main(String[] args) {
        /*
         * Step 1: Configure the SparkConf.
         * 1. Use at least 2 threads: a Spark Streaming application needs at least one thread to
         *    continuously receive data and at least one to process the received data (otherwise
         *    nothing processes the data, and memory and disk are overwhelmed over time).
         * 2. On a cluster, each executor generally gets more than one core; for a Spark Streaming
         *    program, an odd number of cores per executor, such as 5 or 7, is usually best.
         */
        SparkConf conf = new SparkConf().setMaster(args[0]).setAppName("online count");

        /*
         * Step 2: Create the JavaStreamingContext: the starting point of all Spark Streaming
         * functionality and the core of program scheduling.
         * 1. It can be built from SparkConf parameters, or recovered from a persisted context;
         *    the typical scenario is a restart after a driver crash. Spark Streaming runs 24/7
         *    without interruption, so after the driver restarts the state must be restored from
         *    the recorded checkpoint (see the getOrCreate sketch after this listing).
         * 2. Several JavaStreamingContext objects can be created in one application, but the
         *    running one must be stopped before the next is used. A big insight follows: the
         *    Spark Streaming framework is just an application on top of Spark Core; it merely
         *    runs the business-logic code that the Spark engineer writes.
         */
        JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(6));

        /*
         * Step 3: Create the input DStream (the data source).
         * 1. The input can be a file, HDFS, Flume, Kafka, a socket, and so on.
         * 2. Here the data comes from a network socket: at run time Spark Streaming connects to
         *    and listens on the port (the service on that port must already exist) and keeps
         *    receiving the data it produces for the subsequent business processing.
         * 3. If a batch interval frequently passes with no data, constantly starting empty jobs
         *    wastes scheduling resources, because there is nothing to compute; enterprise
         *    production code therefore checks whether there is any data before submitting a job,
         *    and submits none if there is none (see the foreachRDD sketch after this listing).
         */
        JavaReceiverInputDStream<String> lines = jsc.socketTextStream("Master", 9999);

        /*
         * Step 4: Program against the DStream just as you would against an RDD. A DStream is a
         * template for generating RDDs: before the computation runs, Spark Streaming essentially
         * translates the operations on each batch into RDD operations. Apply transformation-level
         * processing such as map and filter to the initial DStream.
         */
        JavaPairDStream<String, Integer> pairs = lines
                // Step 4.1: Split each line of text into individual words.
                .flatMap(new FlatMapFunction<String, String>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Iterable<String> call(String line) throws Exception {
                        return Arrays.asList(line.split(" "));
                    }
                })
                // Step 4.2: On the basis of the split, count each word instance as 1,
                // i.e. word => (word, 1).
                .mapToPair(new PairFunction<String, String, Integer>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Tuple2<String, Integer> call(String word) throws Exception {
                        return new Tuple2<String, Integer>(word, 1);
                    }
                });

        // Step 4.3: Add up the counts for each word.
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    private static final long serialVersionUID = 1L;
                    @Override
                    public Integer call(Integer i1, Integer i2) throws Exception {
                        return i1 + i2;
                    }
                });

        /*
         * print() does not directly trigger job execution; everything is under the control of
         * the Spark Streaming framework, and the actual trigger for a real job run is the
         * configured batch duration. Note that to execute a job at all, the DStream must have an
         * output operation: print, saveAsTextFiles, saveAsHadoopFiles, and so on. The most
         * important is foreachRDD, because the results of Spark Streaming processing are
         * generally placed in Redis, a DB, a dashboard, and so on, and foreachRDD lets you
         * customize arbitrarily where the data ends up.
         */
        wordCounts.print();

        jsc.start();
        jsc.awaitTermination();
        jsc.stop();
    }
}
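The Step 2 comment mentions recovering the context from a checkpoint after a driver crash. Below is a minimal sketch of how that is usually wired up with JavaStreamingContext.getOrCreate, under stated assumptions: the checkpoint path, the class name, and the local[2] master are placeholders, and the Function0-based signature is the one in recent Spark releases (older 1.x releases used a JavaStreamingContextFactory instead).

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RecoverableOnlineWordCount {

    public static void main(String[] args) throws Exception {
        // Hypothetical checkpoint location; any reliable path (e.g. on HDFS) works.
        final String checkpointDir = "hdfs://Master:9000/spark/checkpoint";

        // The factory runs only when no checkpoint exists yet; it must build the
        // context AND register the whole DStream graph before returning.
        Function0<JavaStreamingContext> factory = new Function0<JavaStreamingContext>() {
            @Override
            public JavaStreamingContext call() {
                SparkConf conf = new SparkConf()
                        .setMaster("local[2]") // at least 2 threads: one receives, one processes
                        .setAppName("online count");
                JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(6));
                jsc.checkpoint(checkpointDir); // record state for recovery
                // ... define lines/pairs/wordCounts and the output operation here ...
                return jsc;
            }
        };

        // First start: calls the factory. Restart after a driver crash: rebuilds
        // the context and the DStream graph from the checkpoint instead.
        JavaStreamingContext jsc = JavaStreamingContext.getOrCreate(checkpointDir, factory);
        jsc.start();
        jsc.awaitTermination();
    }
}

The key design point is that the whole DStream graph must be defined inside the factory, so that on restart it can be reconstructed from the checkpoint rather than re-registered by hand.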

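The closing comment singles out foreachRDD as the most important output operation, and the Step 3 comment notes that production code avoids doing work for empty batches. A sketch combining the two ideas follows; it assumes the wordCounts DStream from the listing, a Spark version (1.6 or later) where foreachRDD accepts a VoidFunction, and a hypothetical storeInRedis helper standing in for whatever Redis/DB/dashboard sink is actually used.

// Additional imports needed beyond the main listing:
//   import org.apache.spark.api.java.JavaPairRDD;
//   import org.apache.spark.api.java.function.VoidFunction;

// Replace wordCounts.print() with an output operation that skips empty
// batches and pushes each non-empty result to an external store.
wordCounts.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
    private static final long serialVersionUID = 1L;
    @Override
    public void call(JavaPairRDD<String, Integer> rdd) throws Exception {
        // Guard against empty batches so that no work is done when the
        // socket produced no data during this interval.
        if (rdd.isEmpty()) {
            return;
        }
        for (Tuple2<String, Integer> wc : rdd.collect()) {
            storeInRedis(wc._1(), wc._2()); // storeInRedis is a hypothetical sink helper
        }
    }
});

For brevity the sketch collects results to the driver; a real sink would normally use rdd.foreachPartition instead, so that connections are opened on the executors, one per partition.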
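Finally, a usage note (an addition, not part of the original article): the socket service must exist before the application starts, as the Step 3 comment says. A simple way to provide one for testing is netcat: run nc -lk 9999 on the host named Master, type lines of text into it, and launch the application with args[0] set to a master URL such as local[2], so that at least two threads are available, one to receive data and one to process it.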