[Essay] Apache Flink: Very reliable, not off by one bit

Source: Internet
Author: User
Tags savepoint apache flink

Apache Flink's background

At a high level of abstraction, this section summarizes the types of datasets commonly encountered in data processing today and the execution models available for processing them. The two are often confused, but they are different concepts.

Types of datasets

Datasets encountered in data processing today fall into two categories: ① unbounded: infinite datasets, which take the form of data that flows in quickly and continuously; ② bounded: finite datasets, which are usually immutable, that is, they are not updated once created.

Traditional data processing frameworks often model real-world data as bounded datasets, that is, as batch data, but real-world data is in practice unbounded. Here are some examples of unbounded datasets:

    1. Data generated by end-users interacting with mobile apps or Web apps

    2. Real-time measurements transmitted by physical sensors

    3. Real-time data in financial markets

    4. Machine logs

Types of data processing models

There are two main categories: ① streaming: data is processed continuously, for as long as it is being produced; ② batch: a batch of data is processed within a finite amount of time, and computing resources are released when processing ends.

Although a mismatched pairing may give unsatisfactory results, any processing model can in fact be used to process either type of dataset. For example, the batch model has long been used to process unbounded datasets, despite the problems this raises with windowing, state management, and out-of-order data.
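As a rough illustration of running a batch model over an unbounded dataset, the sketch below cuts a stream into fixed-size micro-batches and processes each one in isolation. It is plain Python, not Flink code; the names `micro_batches` and `per_batch_sums` are made up for this example. Each result covers only its own batch, so anything that spans a batch boundary (a running total, a window straddling two batches) needs state kept outside the batch job, which is where the difficulties mentioned above come from.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Cut a (possibly infinite) stream into consecutive bounded batches."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # only reached when the stream is actually finite
        yield batch


def per_batch_sums(stream: Iterable[int], batch_size: int) -> List[int]:
    """Run a batch job (here: a sum) on each micro-batch independently.

    Hypothetical helper for illustration: it only terminates on finite input.
    """
    return [sum(batch) for batch in micro_batches(stream, batch_size)]


if __name__ == "__main__":
    readings = [3, 9, 4, 1, 7, 2, 8]
    # Each sum is local to its batch; no batch knows the overall total.
    print(per_batch_sums(readings, 3))
```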

Flink is built on the streaming model and continuously processes data as it is generated. Matching the processing model to the dataset type benefits both the accuracy and the efficiency of processing.

The streaming nature of Apache Flink

Flink is an open-source framework for distributed stream processing: ① it produces correct results even when data arrives out of order or late; ② it is stateful and fault-tolerant, so it can recover seamlessly from failures while maintaining exactly-once application state; ③ it performs well at large scale, delivering high throughput and low latency on thousands of nodes.

The benefits of matching the processing model to the dataset type were noted earlier. The Flink features described below, including state management, handling of out-of-order data, and flexible windowing, are all designed and optimized for computing accurate results on unbounded datasets.

Exactly-once Semantics

Flink provides an exactly-once semantic guarantee for stateful computations. Stateful means that the application can maintain a summary or aggregation of the data it has already processed; Flink's checkpointing mechanism ensures that this application state is restored with exactly-once semantics in the event of a failure.
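The idea behind checkpoint-based exactly-once state can be sketched with a toy, single-process job. This is a simplification under stated assumptions, not Flink's implementation, which is distributed and considerably more involved: the class `CountingJob` and the function `run_with_failure` are hypothetical, the source is assumed to be replayable from an offset, and a checkpoint is assumed to save the source offset and the state together.

```python
import copy
from typing import Dict, List, Tuple


class CountingJob:
    """Toy stateful job (hypothetical, not a Flink class): counts words."""

    def __init__(self, source: List[str]) -> None:
        self.source = source                  # assumed replayable by offset
        self.offset = 0                       # next record to read
        self.counts: Dict[str, int] = {}      # application state
        self._snapshot: Tuple[int, Dict[str, int]] = (0, {})

    def process_next(self) -> bool:
        """Process one record; return False when the source is exhausted."""
        if self.offset >= len(self.source):
            return False
        word = self.source[self.offset]
        self.counts[word] = self.counts.get(word, 0) + 1
        self.offset += 1
        return True

    def checkpoint(self) -> None:
        """Save offset and state together, so they always agree."""
        self._snapshot = (self.offset, copy.deepcopy(self.counts))

    def restore(self) -> None:
        """Roll back to the last checkpoint; later progress is discarded."""
        offset, counts = self._snapshot
        self.offset = offset
        self.counts = copy.deepcopy(counts)


def run_with_failure(words: List[str], checkpoint_every: int, fail_at: int) -> Dict[str, int]:
    """Run the job, simulating one crash when the offset reaches fail_at."""
    job = CountingJob(words)
    failed = False
    while True:
        if not failed and job.offset == fail_at:
            failed = True
            job.restore()   # records after the checkpoint are replayed
            continue
        if not job.process_next():
            break
        if job.offset % checkpoint_every == 0:
            job.checkpoint()
    return job.counts


if __name__ == "__main__":
    print(run_with_failure(["a", "b", "a", "c", "a"], checkpoint_every=2, fail_at=3))
```

Because the state and the read position are rolled back together, the replayed records are counted again only after their earlier effect has been discarded, so each record is reflected in the final state exactly once.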

Event Time Semantics

Flink supports stream processing and windowing with event time semantics. Event time makes it easier to compute accurate results over streams in which events arrive out of order or are delayed.
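To see why event time helps with out-of-order data, consider the minimal sketch below. It is plain Python rather than the Flink API, and `tumbling_event_time_counts` is a made-up name. It assumes each event carries its own integer timestamp and assigns the event to a fixed-size (tumbling) window based on that timestamp alone, so the order of arrival does not change the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_event_time_counts(
    events: Iterable[Tuple[int, str]], window_size: int
) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start time.

    Each event is (event_time, payload). The window is chosen from the
    timestamp embedded in the event, not from when the event arrives.
    """
    counts: Dict[int, int] = defaultdict(int)
    for event_time, _payload in events:
        window_start = event_time - (event_time % window_size)
        counts[window_start] += 1
    return dict(counts)


if __name__ == "__main__":
    # Arrival order differs from event-time order.
    arrived = [(1, "a"), (12, "b"), (3, "c"), (11, "d"), (9, "e")]
    print(tumbling_event_time_counts(arrived, window_size=10))
```

A real stream processor additionally has to decide when a window is complete enough to emit; this sketch sidesteps that question by counting over a finite list.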

Flexible Windowing

In addition to data-driven windows, Flink supports windows based on time, count, or sessions. Window triggers can be customized to support complex streaming patterns, and Flink's windowing makes it possible to model the reality of the environment in which the data was created.
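Two of the window types named above can be illustrated with a short sketch. This is not Flink's windowing API; `session_windows` and `count_windows` are hypothetical helpers, and the rule that a pause longer than `gap` starts a new session is an assumption of this example.

```python
from typing import List, Sequence


def session_windows(timestamps: Sequence[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # inactivity exceeded the gap
    return sessions


def count_windows(items: Sequence[str], size: int) -> List[List[str]]:
    """Group items into windows of exactly `size` elements.

    A trailing group with fewer than `size` elements is not emitted,
    since a count-based window only fires once it is full.
    """
    return [list(items[i:i + size]) for i in range(0, len(items) - size + 1, size)]


if __name__ == "__main__":
    print(session_windows([1, 2, 3, 10, 11, 30], gap=5))
    print(count_windows("abcdefg", size=3))
```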

Lightweight Fault Tolerance

Flink's fault tolerance is lightweight, which allows it to sustain high throughput while providing exactly-once guarantees. Flink recovers from failures with zero data loss, without sacrificing reliability or performance.

High Throughput and Low Latency

Savepoint Mechanism

Flink's savepoint mechanism provides a way to version application state, making it possible to update an application or reprocess historical data without losing state and with only a short downtime.
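The following sketch shows the general idea of upgrading an application across a saved state snapshot. It is a toy model and does not use Flink's savepoint format or tooling: `take_savepoint`, `load_savepoint`, and `run` are hypothetical names, and the snapshot is assumed to be a JSON string holding the source offset and the word counts.

```python
import json
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]


def run(source: List[str], start: int, stop: int, state: State,
        key_fn: Callable[[str], str]) -> State:
    """Process source[start:stop], returning an updated copy of the state."""
    new_state = dict(state)
    for word in source[start:stop]:
        key = key_fn(word)
        new_state[key] = new_state.get(key, 0) + 1
    return new_state


def take_savepoint(offset: int, state: State) -> str:
    """Serialize position and state into a snapshot (assumed JSON format)."""
    return json.dumps({"offset": offset, "state": state}, sort_keys=True)


def load_savepoint(blob: str) -> Tuple[int, State]:
    """Read a snapshot back into (offset, state)."""
    data = json.loads(blob)
    return data["offset"], data["state"]


if __name__ == "__main__":
    source = ["a", "b", "A", "B", "a"]

    # Version 1 of the application counts words case-sensitively.
    state_v1 = run(source, 0, 2, {}, key_fn=lambda w: w)
    savepoint = take_savepoint(2, state_v1)

    # Version 2 lowercases words; it resumes from the snapshot, keeping
    # the counts accumulated by version 1.
    offset, restored = load_savepoint(savepoint)
    print(run(source, offset, len(source), restored, key_fn=str.lower))
```

Reprocessing historical data follows the same pattern: load an older snapshot and run the new logic from its offset.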

Distributed Support

Flink can be deployed and run on clusters of thousands of nodes, and it supports running on Mesos and YARN.

Batch processing with Apache Flink

Flink uses the DataStream API to process unbounded datasets and the DataSet API to process bounded datasets.

Within Flink, a bounded dataset is treated as a special case of an unbounded one: the DataSet API handles a bounded dataset as a finite stream.

Flink works with bounded and unbounded datasets in a similar way.
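The notion of a bounded dataset as a finite stream can be illustrated in plain Python, where one streaming operator serves both cases. This is a conceptual sketch only, unrelated to the DataStream or DataSet API; `running_max` is a hypothetical operator chosen for brevity.

```python
from itertools import count, islice
from typing import Iterable, Iterator, Optional


def running_max(stream: Iterable[int]) -> Iterator[int]:
    """Streaming operator: emit the largest value seen so far after each record."""
    best: Optional[int] = None
    for value in stream:
        if best is None or value > best:
            best = value
        yield best


if __name__ == "__main__":
    # Bounded input: the stream ends, and the last emitted value is the batch answer.
    print(list(running_max([2, 5, 3])))

    # Unbounded input: the same operator keeps emitting; take the first four results.
    print(list(islice(running_max(count(1)), 4)))
```

The operator itself never needs to know whether its input ends; only the caller decides whether to wait for a final result or to consume results as they are produced.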
