Lesson 83: Hands-On Spark Streaming Development in Both Scala and Java

Part 1: Developing in Java

1. Preparation: this lesson assumes you already have a Spark cluster set up.

2. The development environment is an Eclipse Maven project; add the Spark Streaming dependency to it.

3. Spark Streaming computes on top of Spark Core, so keep the following in mind:

When you run locally, the master must be configured with at least two threads (for example local[2], or the equivalent setting made through SparkConf). A running Spark Streaming application needs at least one thread that receives data in a continuous loop, and at least one more thread that processes the received data. With only one thread, the received data is never processed, and memory and disk fill up over time.
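The reasoning above can be illustrated without Spark at all. The following is a plain-JDK sketch, not Spark code: a fixed thread pool stands in for the task slots of local[n], and a task that blocks until it is stopped stands in for the receiver. The class and method names (LocalSlotsSketch, batchRunsWhileReceiving) and the 500 ms wait are hypothetical choices made only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Plain-JDK model, not Spark code: a pool of `slots` threads stands in for local[slots].
final class LocalSlotsSketch {

    // Returns true if a batch-processing task gets to run while the receiver is still active.
    static boolean batchRunsWhileReceiving(int slots) {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        CountDownLatch stopReceiver = new CountDownLatch(1);
        try {
            // The receiver is a long-running task: it holds one slot until it is stopped.
            pool.submit(() -> { stopReceiver.await(); return null; });
            // A batch job needs a second, free slot to make any progress.
            Future<String> batch = pool.submit(() -> "batch processed");
            try {
                batch.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException starved) {
                return false; // the only slot is still occupied by the receiver
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            stopReceiver.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("local[1]: " + batchRunsWhileReceiving(1));
        System.out.println("local[2]: " + batchRunsWhileReceiving(2));
    }
}
```

Running main prints "local[1]: false" and then "local[2]: true": with a single slot the batch task waits behind the receiver and never runs, which is the same starvation a local master with one thread would cause.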

Tip:

On a cluster, each executor generally has more than one thread. So how many cores per executor suit a Spark Streaming application? In our experience, about 5 cores works best (note: in our experience an odd number of cores, such as 3, 5, or 7, tends to perform best).

Next, let's start writing Java code!

Step 1: Create a SparkConf object.

Step 2: Create the streaming context (JavaStreamingContext in the Java API).

We create the streaming context object from the SparkConf configuration.

Step 3: Create the Spark Streaming input data source.

We configure the data source as local port 9999 (the port must not already be in use).

Step 4: Program against the DStream just as you would against an RDD. A DStream is the template from which RDDs are generated: before a Spark Streaming computation actually runs, the operations on the DStream for each batch are translated into operations on RDDs.

1. flatMap operation

2. mapToPair operation

3. reduceByKey operation

4. print and other output operations
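The idea in step four, that a DStream is a template whose operations are applied batch by batch, can be modeled without Spark. The following is a plain-JDK sketch, not Spark code: a List<String> stands in for the RDD of one batch, and the comments name the DStream operator each stage stands in for. The class and method names (BatchTemplateSketch, wordCount, run) and the sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-JDK model, not Spark code: a List<String> stands in for the RDD of one batch.
final class BatchTemplateSketch {

    // The "template": one description of the computation, reused for every batch.
    static Map<String, Integer> wordCount(List<String> batch) {
        return batch.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))  // flatMap: line -> words
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toMap(
                word -> word,                                 // mapToPair: word -> (word, 1)
                word -> 1,
                Integer::sum,                                 // reduceByKey: sum the 1s per word
                TreeMap::new));                               // sorted only for stable printing
    }

    // Each batch interval, the same template is applied to that interval's data only.
    static List<Map<String, Integer>> run(List<List<String>> batches) {
        return batches.stream()
            .map(BatchTemplateSketch::wordCount)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("spark streaming", "spark core"),
            Arrays.asList("hello spark"));
        run(batches).forEach(System.out::println);            // print: one result per batch
    }
}
```

Running main prints {core=1, spark=2, streaming=1} and then {hello=1, spark=1}. The second batch reports spark=1 rather than 3 because, as in this model, a plain reduceByKey on a DStream counts within each batch and does not carry totals across batches.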

Tip:

Besides print(), which outputs the processed data, several other output methods matter in real development and deserve attention, such as saveAsTextFiles and saveAsHadoopFiles. The most important is foreachRDD, which can write data to Redis, a database, a dashboard, and so on. It lets you decide arbitrarily where the data goes, which makes it very powerful.
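Putting the Java steps together, a minimal sketch of the whole program might look like the following. It assumes the Spark 2.x or later Java API (where a flatMap function returns an Iterator) and the spark-streaming dependency on the classpath. The class name, application name, 5-second batch interval, and the println used as a stand-in for a real sink are illustrative choices, not taken from the original lesson.

```java
import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

// Sketch only: assumes the Spark 2.x+ Java API with spark-streaming on the classpath.
final class WordCountOnlineSketch {

    // Pure helper, so the splitting rule can be checked without starting Spark.
    static Iterator<String> splitWords(String line) {
        return Arrays.asList(line.split(" ")).iterator();
    }

    public static void main(String[] args) throws InterruptedException {
        // Step 1: SparkConf. local[2] leaves one thread to receive and one to process.
        SparkConf conf = new SparkConf()
            .setMaster("local[2]")
            .setAppName("WordCountOnlineSketch");

        // Step 2: streaming context built from the configuration (assumed 5 s batches).
        JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Step 3: socket input source on local port 9999.
        JavaReceiverInputDStream<String> lines = jsc.socketTextStream("localhost", 9999);

        // Step 4: DStream operations, written much like RDD operations.
        JavaDStream<String> words = lines.flatMap(WordCountOnlineSketch::splitWords);
        JavaPairDStream<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairDStream<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

        // Output operation: print the first elements of each batch on the driver.
        counts.print();

        // Output operation: foreachRDD hands over each batch's RDD, so any sink can be used.
        counts.foreachRDD(rdd ->
            rdd.foreachPartition(records -> {
                while (records.hasNext()) {
                    Tuple2<String, Integer> record = records.next();
                    // Stand-in for a real write to Redis, a database, or a dashboard.
                    System.out.println(record._1() + " -> " + record._2());
                }
            }));

        // Nothing runs until start(); awaitTermination() blocks until the job is stopped.
        jsc.start();
        jsc.awaitTermination();
        jsc.close();
    }
}
```

Writing inside foreachPartition rather than once per record is a common design choice here: a real sink would typically open one connection per partition and reuse it for every record in that partition.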

Part 2: Developing in Scala

Step 1: Receive data from the source.

Step 2: flatMap operation.

Step 3: map operation.

Step 4: reduce operation.

Step 5: print() and other output operations.

Step 6: awaitTermination operation.

Summary:

Spark Streaming can handle many kinds of data sources, such as databases, HDFS, server logs, and network streams. It is more powerful than you might imagine, yet it is often underused, and the real reason is that people do not understand Spark and Spark Streaming themselves.

Written by: IMF-Spark Streaming Enterprise Development Combat Team (Li Lingui, Jiang Wei, etc.)

Main editor: Liaoliang

Note:

Material from: DT_Big Data DreamWorks (IMF Legendary Action secret course)

For more private content, please follow the public account: Dt_spark

If you are interested in big data and Spark, you can attend teacher Liaoliang's permanently free Spark public class every night at 20:00, YY room number: 68917580
Life is short, you need Spark!
