Spark Source Customization, Lesson One: Thoroughly Understanding Spark Streaming Through Case Studies


Lesson One: Thoroughly understanding Spark Streaming through case studies: demystifying an alternative Spark Streaming experiment and analyzing the essence of Spark Streaming

In this lesson:

    • 1 Why Spark source customization starts from Spark Streaming;
    • 2 An alternative online Spark Streaming experiment;
    • 3 Grasping the essence of Spark Streaming at a glance.

1. Starting Spark source customization from Spark Streaming

1.1 Reasons for starting the Spark source customization path from Spark Streaming

Today we set out on a new Spark learning journey. To find the dragon, first find its vein, and Spark Streaming is the dragon vein of big data. Why start our Spark source customization path from Spark Streaming? There are several reasons:

1) The broader Spark background

At first, Spark had none of the sub-frameworks we see today, such as Spark Streaming, GraphX, machine learning, Spark SQL, and SparkR; there was only the original Spark Core. We want to customize the Spark source and produce our own release, and we take Spark Streaming as the starting point. Spark Streaming is a sub-framework built on Spark Core, so by studying one sub-framework thoroughly we aim to master the source of Spark's power and the way to solve any problem with it.

2) Why not choose Spark SQL?

Spark has many sub-frameworks, and today, apart from Spark Core programming, the most widely used is Spark SQL. But Spark SQL involves a great deal of detail about parsing and optimizing SQL syntax. That parsing and optimization matters, but it is not the most important thing when our goal is to understand Spark itself. Because so much of it is SQL-specific, it is not a suitable sub-framework for our study.

3) Why not choose SparkR?

SparkR is currently very immature and has limited support, so it is also off our candidate list.

4) Why not choose GraphX (graph computation)?

If you follow Spark's evolution, you will notice that across the most recent releases graph computation has seen almost no improvement. Judging from this trend, the Spark project seems to be signaling that graph computation has largely reached the end of its development. We are certainly not going to study something that looks like it is winding down. In addition, graph computation relies on many mathematical algorithms; since our goal is to master Spark itself, the math is important but not the most important thing for us.

5) Why not choose MLlib (machine learning)?

Spark's machine learning builds its many libraries on top of Spark's RDDs plus its encapsulation of vectors (Vector) and matrices. It also involves too much mathematics, so machine learning is not a good choice for our purpose either.

1.2 Spark Streaming Magic

According to a survey conducted by Stack Overflow in the first half of 2016, more than 50% of respondents considered Spark Streaming the most attractive part of Spark. In short, many people are drawn to Spark mainly because of Spark Streaming. So what magic does Spark Streaming have?

1) It is stream-based computing

This is the era of stream processing: any data that is not processed as a stream, or is not tied to stream processing, is effectively worthless. The development of society will keep confirming this claim.

2) Stream processing matches our first impression of big data

On the one hand, data flows in and we get feedback immediately, which batch processing and data mining cannot offer. On the other hand, Spark is very powerful because its stream processing can use the results of machine learning, graph computation, Spark SQL, or SparkR online, thanks to Spark's unified, integrated infrastructure design. In other words, within the Spark technology stack, Spark Streaming can call the APIs of any other sub-framework without any additional setup. No other framework matches Spark in this respect, and this is the root of Spark Streaming's coming dominance. The era of stand-alone stream processing is over; Spark Streaming, together with the other Spark sub-frameworks, will inevitably dominate the big data field.

3) Stream processing combines charm and complexity

If you master Spark Streaming, you understand both Spark Streaming and the sibling frameworks behind it, and you can show the full magic of Spark and big data. However, of all Spark programs, Spark Streaming applications are certainly the most prone to problems. Why? Because data flows in continuously, and the application must dynamically control the data flow, split the work into jobs, and process the data. All of this brings great complexity.
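The job segmentation mentioned above can be pictured with a minimal sketch in plain Python. It does not use the Spark API, and the names `segment_into_batches`, `process_batch`, and `batch_interval` are hypothetical, chosen only for this illustration. It assumes records arrive as (timestamp, value) pairs and that the stream is cut into fixed-length time windows, with one job run per window.

```python
from collections import defaultdict


def segment_into_batches(records, batch_interval):
    """Group (timestamp, value) records into fixed-length time windows.

    Returns a list of (window_start, [values]) pairs in time order.
    Windows that receive no records are simply absent in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in records:
        # A record belongs to the window that starts at the largest
        # multiple of batch_interval not exceeding its timestamp.
        window_start = (timestamp // batch_interval) * batch_interval
        batches[window_start].append(value)
    return sorted(batches.items())


def process_batch(values):
    """Stand-in for the job run on one batch: count each word."""
    counts = defaultdict(int)
    for word in values:
        counts[word] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical input: (arrival time in seconds, word).
    stream = [(0.2, "spark"), (0.9, "streaming"), (1.1, "spark"),
              (2.5, "core"), (2.7, "spark")]
    for window_start, values in segment_into_batches(stream, batch_interval=1):
        print(window_start, process_batch(values))
```

Even in this toy version the sources of complexity are visible: the window length has to be chosen, records keep arriving while earlier windows are still being processed, and each window becomes a separate unit of work that can fail or fall behind.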

4) A huge difference from the other Spark sub-frameworks

If you look closely, you will find that Spark Streaming is really an application built on Spark Core. Other sub-frameworks, such as machine learning, apply mathematical algorithms directly on top of Spark's RDDs; Spark Streaming is more like an ordinary application that perceives incoming data and processes it accordingly. So if you want to do custom Spark development, Spark Streaming is the best reference, and once you have mastered it, developing any other program becomes easy. Of course, you cannot master Spark Streaming without mastering Spark Core. Spark Core and Spark Streaming together are like a pair of swords with boundless power.
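The idea of a streaming layer that behaves like an ordinary application on top of a batch engine can be sketched in plain Python. This is an illustration only, not Spark code: `batch_word_count` stands in for a computation the core engine would run, and `StreamingApp` is a hypothetical class that only notices new batches and delegates each one to that function.

```python
from collections import Counter


def batch_word_count(lines):
    """Ordinary batch computation: count the words in a finite list of lines."""
    return Counter(word for line in lines for word in line.split())


class StreamingApp:
    """Hypothetical streaming layer.

    It holds no computation logic of its own. For each incoming batch it
    calls the batch function and folds the result into running totals.
    """

    def __init__(self, batch_function):
        self.batch_function = batch_function
        self.totals = Counter()

    def on_batch(self, lines):
        result = self.batch_function(lines)
        self.totals.update(result)  # accumulate counts across batches
        return result


if __name__ == "__main__":
    app = StreamingApp(batch_word_count)
    app.on_batch(["spark streaming", "spark core"])
    app.on_batch(["spark"])
    print(app.totals)
```

The design point is that the per-batch work is unchanged batch code; everything specific to streaming lives in the thin layer that receives batches and keeps state between them.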

Choosing Spark Streaming as the starting point means finding the key point. If you want to find the dragon, Spark Streaming is where the dragon's lair lies. Once we find that vital point, we can advance rapidly. In summary, after this screening, Spark Streaming is our only option!
