Spark Streaming Performance Analysis
Given Spark Streaming's unique design, how fast does it run? In practice, Spark Streaming's batching of data, combined with the Spark engine, yields throughput comparable to or higher than that of other streaming systems, while achieving latencies as low as a few hundred milliseconds. Developers sometimes ask whether micro-batching adds extra delay. In practice, batching delay is only a small part of end-to-end pipeline latency. Whether the system is batch-based or a continuous-operator system, many applications compute their results over a sliding window of the data stream, and that window is updated on a timer (for example, with a window length of 20 seconds and a slide interval of 2 seconds, the results over the previous 20 seconds of data are refreshed every 2 seconds). A pipeline must also collect records from multiple sources and wait a short time to handle delayed or out-of-order data, and automatic triggering algorithms often wait a while before firing. Compared to these end-to-end delays, batching adds little overhead, because batch intervals are typically small. In addition, DStream's throughput gains mean that fewer machines can handle the same workload, which is itself a performance improvement.
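To make the windowing semantics concrete, here is a minimal sketch in plain Python (not the Spark API) of how a 20-second window with a 2-second slide groups timestamped records: every 2 seconds, a result is emitted covering the previous 20 seconds of data. The function name and record format are illustrative assumptions.

```python
# Illustrative sketch (plain Python, not the Spark API): a sliding
# window with a 20 s length and a 2 s slide over timestamped events.

def sliding_window_counts(events, window_len=20, slide=2, end_time=20):
    """events: list of (timestamp_seconds, value) pairs.
    Returns {window_end: count of events with
             window_end - window_len < timestamp <= window_end}."""
    results = {}
    t = slide
    while t <= end_time:
        # Each emitted window looks back window_len seconds from time t.
        results[t] = sum(1 for ts, _ in events if t - window_len < ts <= t)
        t += slide
    return results

events = [(1, "a"), (3, "b"), (3.5, "c"), (19, "d")]
counts = sliding_window_counts(events)
# At t=4 the window has seen three events; at t=20 it has seen all four.
```

The point of the sketch is that each emitted result overlaps heavily with the previous one, which is why the update interval, not the window length, dominates perceived latency.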
The Future Direction of Spark Streaming
Spark Streaming is one of the most commonly used components in Spark, and more and more users with stream-processing requirements will adopt it. Some of the highest-priority projects our team is working on are discussed below; you can expect these features in the next few releases of Spark:
Backpressure - Streaming workloads can experience sudden surges in data volume (for example, the flood of microblog posts during the Academy Awards), and the system must handle them gracefully. In Spark 1.5, Spark will add a better backpressure mechanism that lets Spark Streaming dynamically control its ingestion rate under such bursts. This feature is being developed jointly by engineers at Databricks and Typesafe;
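For reference, once this feature landed in Spark 1.5, backpressure is switched on through a single configuration property; a minimal `spark-defaults.conf` fragment would look like the following (the property name is the one shipped in Spark 1.5, everything else about the deployment is left out):

```
# Enable dynamic ingestion-rate control (Spark 1.5+)
spark.streaming.backpressure.enabled  true
```

With this enabled, the receiver rate adapts to how fast the batches are actually being processed, instead of relying on a fixed ingestion rate limit.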
Dynamic scaling - Controlling the ingestion rate alone is not enough to handle data-rate changes over longer periods (for example, a sustained higher rate of microblogging during the day compared to night). Such changes are better handled by dynamically scaling cluster resources. This is easy to implement in the Spark Streaming architecture, because the computation is already divided into a series of small tasks: if the cluster manager (such as YARN, Mesos, or Amazon EC2) provides more nodes, tasks can be dynamically distributed across the larger cluster. To this end, we will add support for automatic dynamic scaling;
Event time and out-of-order data - In practice, users sometimes receive records out of order. Spark Streaming will allow users to process data by event time, ordered according to a user-defined time-extraction function;
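As an illustration of the idea (plain Python, not the Spark API): a user-supplied extraction function pulls the event time out of each record, and a buffered batch of records is then processed in event-time order rather than arrival order. The record format and field names here are assumptions for the example.

```python
# Illustrative sketch (not the Spark API): reorder a buffer of records
# that arrived out of order, using a user-defined event-time extractor.

def extract_event_time(record):
    # User-defined: pull the embedded event timestamp out of the record.
    return record["event_time"]

def process_in_event_time_order(buffered_records):
    """Return the buffered records sorted by their event time."""
    return sorted(buffered_records, key=extract_event_time)

arrived = [
    {"event_time": 12.0, "msg": "late"},
    {"event_time": 10.0, "msg": "early"},
    {"event_time": 11.0, "msg": "middle"},
]
ordered = process_in_event_time_order(arrived)
# ordered messages come out as: "early", "middle", "late"
```

In a real system the buffer would be bounded by a wait threshold (how long to hold records for stragglers), which is the same trade-off the article mentions for handling delayed data.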
UI enhancements - Finally, we want developers to be able to easily debug their streaming applications. To this end, Spark 1.4 added a new visual Spark Streaming UI that lets developers closely monitor the performance of their applications. In Spark 1.5, we further improved this feature by showing more input information (such as Kafka message offsets).