Apache Flink is an open source, distributed, high-performance, highly available, and accurate stream processing framework. It supports both real-time stream processing and batch processing.
Flink characteristics
Supports both batch processing and data stream processing
Elegant, fluent APIs in both Java and Scala
Supports high throughput and low latency at the same time
Supports event-time and out-of-order processing in the DataStream API, based on the Dataflow model
Flexible windows (time, count, session, custom triggers) under different time semantics (event time, processing time)
Exactly-once fault-tolerance guarantees
Automatic back pressure
Graph processing (batch), machine learning (batch), and complex event processing (streaming)
Built-in support for iterative programs (BSP) in the DataSet (batch) API
Efficient custom memory management, with robust switching between in-memory and out-of-core processing
Compatible with Hadoop's MapReduce and Storm
Integrates with YARN, HDFS, HBase, and other components of the Hadoop ecosystem
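To make the event-time and window features above more concrete, here is a minimal conceptual sketch in plain Java. It does not use Flink at all: the class EventTimeWindowSketch and its methods are hypothetical names chosen for illustration, and the 5-second window size is an assumption. It only shows the idea that events are grouped into tumbling windows by the time they were generated, not by the order in which they arrive.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: plain Java, no Flink dependency.
// All names here are hypothetical and are not part of Flink's API.
final class EventTimeWindowSketch {

    // Start of the tumbling window that contains the given event timestamp.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    // Counts events per window (keyed by window start), regardless of arrival order.
    static Map<Long, Integer> countPerWindow(List<Long> eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimesMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Events arrive out of order: 7s, 1s, 12s, 4s (event time in milliseconds).
        List<Long> arrivals = List.of(7_000L, 1_000L, 12_000L, 4_000L);
        // Prints {0=2, 5000=1, 10000=1}
        System.out.println(countPerWindow(arrivals, 5_000L));
    }
}
```

The events at 1s and 4s land in the same window even though another event arrived between them, which is the behaviour that event-time semantics are meant to provide. A real Flink job would express this through the DataStream API's window operators instead of a hand-written map.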
Flink application scenarios
Optimising real-time search results for e-commerce: Alibaba's infrastructure team uses Flink to update product details and inventory information in real time, giving users more relevant results.
Real-time stream processing services for data analytics teams: King provides real-time data analysis through a Flink-powered data analytics platform, dramatically reducing the time to insight from game data.
Network/sensor monitoring and error detection: Bouygues Telecom, one of the largest telecommunications providers in France, uses Flink to monitor its wired and wireless networks for fast fault response.
ETL for business intelligence analytics: Zalando uses Flink to transform data so that it is easier to load into the data warehouse, turning complex transformations into relatively simple ones and ensuring that analytics end users can access data faster.
Based on the above case studies, Flink is ideally suited for:
Multiple data sources (sometimes unreliable): When data is generated by millions of different users or devices, it is safe to assume that some events will arrive out of the order in which they were generated. If an upstream failure occurs, some events may arrive hours late. This late data still needs to be processed so that the results are accurate.
Application state management: When programs become more complex than simple filtering or enrichment of individual records, managing their state (for example: counters, windows of past data, state machines, embedded databases) becomes difficult. Flink provides tools that make state efficient, fault-tolerant, and manageable, so you don't need to build these features yourself.
Fast data processing: Real-time and near-real-time use cases require data to be available from the moment it is generated. When necessary, Flink is fully capable of meeting these latency requirements.
Massive data processing: These programs need to be distributed across many nodes to support the required scale. Flink runs on large clusters as seamlessly as it does on small ones.
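The late-data point above can be illustrated with a small conceptual sketch in plain Java. It is not Flink's watermark API: the class BoundedLatenessSketch, its methods, and the 1-second tolerance are assumptions made for illustration. The idea shown is a watermark that trails the highest event time seen so far by a fixed bound, so that events falling behind it can be recognised as late and handled separately.

```java
// Conceptual sketch only: plain Java, no Flink dependency.
// The class and method names are hypothetical, not Flink's watermark API.
final class BoundedLatenessSketch {
    private final long maxOutOfOrdernessMs;
    private long maxEventTimeSeenMs = Long.MIN_VALUE;

    BoundedLatenessSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Watermark: the stream is assumed to be complete up to this event time.
    long watermark() {
        return maxEventTimeSeenMs == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxEventTimeSeenMs - maxOutOfOrdernessMs;
    }

    // Returns true if the event is on time, false if it is at or behind the watermark.
    boolean observe(long eventTimeMs) {
        boolean onTime = eventTimeMs > watermark();
        maxEventTimeSeenMs = Math.max(maxEventTimeSeenMs, eventTimeMs);
        return onTime;
    }

    public static void main(String[] args) {
        BoundedLatenessSketch tracker = new BoundedLatenessSketch(1_000L);
        long[] arrivals = {1_000L, 3_000L, 2_500L, 1_500L};
        for (long ts : arrivals) {
            System.out.println(ts + " on time: " + tracker.observe(ts));
        }
    }
}
```

With a 1-second bound, the event at 2.5s is still accepted after the event at 3s, while the event at 1.5s falls behind the watermark and is flagged as late. What to do with a late event (drop it, update an earlier result, or route it elsewhere) is a design decision this sketch leaves open.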