On June 1, 2017, the Apache Flink community officially released version 1.3.0. The release took four months of development and resolved 680 issues. Apache Flink 1.3.0 is the fourth major release in the 1.x.y series, and its API is compatible with the other 1.x.y releases for APIs annotated with @Public.
In addition, the Apache Flink community now aims at a major release every four months (Apache Flink 1.2.0 was released in February 2017, just four months before 1.3.0), so we can expect Apache Flink 1.4.0 to be released around October.
The main updates are as follows:
Large State Handling/Recovery
- RocksDB incremental checkpointing: It is now possible to checkpoint only the data added since the last successful checkpoint, rather than all of the application state. This speeds up checkpointing and reduces disk space consumption accordingly, because each checkpoint is smaller. See FLINK-5053 for details. (A configuration sketch follows after this list.)
- Asynchronous snapshots for heap-based state backends: The file and memory state backends now use a copy-on-write HashMap implementation, which allows them to take snapshots asynchronously. Asynchronous snapshots make Flink more resilient to slow storage systems and expensive serialization. See FLINK-6048 and FLINK-5715 for details.
- Allow upgrades to the state serializer: Applications can now be resumed with an upgraded serializer for their state.
- Restore job state at the operator level: Before Apache Flink 1.3.0, operator state was bound to Flink's internal tasks, which made it difficult to change a job's topology while keeping its state. It is now possible to make far more modifications to the topology. See FLINK-5892 for details.
- Fine-grained recovery (beta): When a task fails, only the affected subgraph needs to be restarted instead of the entire ExecutionGraph, which significantly reduces recovery time. See FLINK-4256 for details.
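To make the incremental checkpointing item concrete, here is a minimal sketch of how it can be switched on; the checkpoint path and job name are placeholders, and only the second constructor argument of RocksDBStateBackend relates to the new feature:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 10 seconds.
        env.enableCheckpointing(10000);

        // The boolean flag enables incremental checkpoints: only RocksDB data
        // added since the last successful checkpoint is written out.
        // "hdfs:///flink/checkpoints" is a placeholder path.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

        // ... define the actual topology here ...

        env.execute("incremental-checkpoint-demo");
    }
}
```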
DataStream API
Side Outputs: This feature allows an operator to have multiple output streams. Operator metadata, internal system information (debugging, performance, etc.), and rejected or late data are potential use cases. The window operators now use this feature to handle late data. See FLINK-4460 for details. A sketch follows below.
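As an illustration, here is a minimal side-output sketch that routes records that fail to parse into a separate stream; the class name, tag name, and sample data are invented for this example:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputDemo {
    // Tag identifying the side output; the anonymous subclass preserves type information.
    private static final OutputTag<String> REJECTED = new OutputTag<String>("rejected") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.fromElements("1", "2", "oops", "3");

        SingleOutputStreamOperator<Integer> parsed = input
            .process(new ProcessFunction<String, Integer>() {
                @Override
                public void processElement(String value, Context ctx, Collector<Integer> out) {
                    try {
                        out.collect(Integer.parseInt(value)); // main output
                    } catch (NumberFormatException e) {
                        ctx.output(REJECTED, value);          // side output
                    }
                }
            });

        parsed.getSideOutput(REJECTED).print(); // the rejected records as their own stream
        parsed.print();
        env.execute("side-output-demo");
    }
}
```

For late data, the window operators expose the same mechanism through WindowedStream#sideOutputLateData(tag).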
Union Operator State: Flink 1.2.0 introduced broadcast state functionality, but this was not exposed through public APIs. Flink 1.3.0 provides the union operator state API to expose it. See FLINK-5991 for details.
Per-window state: Previously, the state that a WindowFunction or ProcessWindowFunction could access was scoped to the key of the window, but not the window itself. With this new feature, users can maintain window state independent of the key. See FLINK-5929 for details. A sketch follows below.
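A minimal sketch of what this enables, assuming a ProcessWindowFunction over a keyed, windowed stream; the class and state names are invented. Context#windowState() scopes the state to the current window, so it survives across multiple firings of the same window (e.g. with allowed lateness):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Counts how often the current window has fired, using state scoped to the
// window itself rather than only to the key.
public class FireCountingWindowFunction
        extends ProcessWindowFunction<Long, String, String, TimeWindow> {

    @Override
    public void process(String key, Context ctx, Iterable<Long> elements,
                        Collector<String> out) throws Exception {
        ValueState<Integer> firings = ctx.windowState().getState(
                new ValueStateDescriptor<>("firings", Integer.class));

        Integer previous = firings.value();
        int count = (previous == null) ? 1 : previous + 1;
        firings.update(count);

        out.collect("window " + ctx.window() + " / key " + key
                + " has fired " + count + " time(s)");
    }
}
```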
Deployment and Tooling
Flink HistoryServer: Flink's HistoryServer now allows you to query the status and statistics of completed jobs that a JobManager has archived. See FLINK-1579 for details. A configuration sketch follows below.
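A hedged sketch of the relevant flink-conf.yaml entries, based on the HistoryServer configuration keys in the 1.3 docs; the paths are placeholders:

```yaml
# Where the JobManager uploads archives of completed jobs.
jobmanager.archive.fs.dir: hdfs:///flink/completed-jobs

# Where the HistoryServer looks for archived jobs, and how often to poll (ms).
historyserver.archive.fs.dir: hdfs:///flink/completed-jobs
historyserver.archive.fs.refresh-interval: 10000

# Address and port of the HistoryServer web interface.
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
```

The server itself is then started with bin/historyserver.sh start.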
Watermark monitoring in the web front end: To make it easier to diagnose watermark-related issues, the Flink JobManager front end now provides a new tab to track the watermark of each operator. See FLINK-3427 for details.
Datadog HTTP Metrics Reporter: Datadog is a very widely used metrics system. Flink now provides a Datadog reporter that reports directly to the Datadog HTTP endpoint. See FLINK-6013 for details. A configuration sketch follows below.
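A hedged configuration sketch for flink-conf.yaml; the API key and tags are placeholders, and the flink-metrics-datadog jar must be available to Flink:

```yaml
metrics.reporters: dghttp
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <your-datadog-api-key>
# Optional tags attached to every reported metric.
metrics.reporter.dghttp.tags: myflinkapp,prod
```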
Network buffer configuration: We finally got rid of the cumbersome network buffer configuration and replaced it with a more generic approach. Instead of defining an absolute number of network buffers, we now use a fraction of the available JVM memory (10% by default).
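The new fraction-based settings look roughly like this in flink-conf.yaml (the values shown are, to the best of my knowledge, the defaults):

```yaml
taskmanager.network.memory.fraction: 0.1    # fraction of JVM memory for network buffers
taskmanager.network.memory.min: 67108864    # lower bound: 64 MB
taskmanager.network.memory.max: 1073741824  # upper bound: 1 GB
```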
Table API/SQL
Support for retractions in Table API/SQL: As part of our endeavor to support continuous queries on dynamic tables, retraction is an important building block that will enable a whole range of new applications which require updating previously emitted results. Examples of such use cases are the computation of early results for long-running windows, updates due to late-arriving data, or maintaining constantly changing results similar to materialized views in relational database systems. Flink 1.3.0 supports retraction for non-windowed aggregates. Results with updates can either be converted into a DataStream or materialized to external data stores using TableSinks with upsert or retraction support. A sketch follows below.
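A minimal sketch of a retracting result, assuming a table "Clicks" with fields (userId, url) has already been registered; the names are illustrative. The non-windowed aggregate below updates previously emitted counts, so it is converted with toRetractStream, where the Boolean flag distinguishes inserts (true) from retractions (false):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractionDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // "Clicks" is assumed to be registered with tEnv beforehand.
        Table counts = tEnv.sql(
            "SELECT userId, COUNT(url) AS cnt FROM Clicks GROUP BY userId");

        // Each update to a count retracts the old row and emits a new one.
        DataStream<Tuple2<Boolean, Row>> retractStream =
            tEnv.toRetractStream(counts, Row.class);

        retractStream.print();
        env.execute("retraction-demo");
    }
}
```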
Table API/SQL supports more aggregations: The Table API and SQL in Flink 1.3.0 support more types of aggregations (a SQL sketch follows after this list), including:
GROUP BY window aggregations in both batch and streaming SQL (via the window functions TUMBLE, HOP, and SESSION)
SQL OVER window aggregations (streaming only)
Non-windowed aggregations (in streaming, with retractions)
User-defined aggregate functions
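To make the group-window aggregations concrete, here is a hedged SQL sketch, reusing the tEnv from the previous example. It assumes a registered table Orders with fields (userId, amount) and an event-time attribute rowtime declared at registration; HOP and SESSION follow the same pattern as TUMBLE:

```java
Table result = tEnv.sql(
    "SELECT userId, " +
    "       TUMBLE_START(rowtime, INTERVAL '1' HOUR) AS wStart, " +
    "       SUM(amount) AS total " +
    "FROM Orders " +
    "GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), userId");
```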
Support for external catalogs: The Table API and SQL allow registering external catalogs, through which queries can access tables and their schema information without registering each table individually. A sketch follows below.
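A heavily hedged sketch of the registration call; myCatalog stands for any ExternalCatalog implementation (Flink ships an in-memory one for testing), and the catalog, database, and table names are invented:

```java
// Register the catalog once under the name "warehouse" ...
tEnv.registerExternalCatalog("warehouse", myCatalog);

// ... and reference its tables by path instead of registering each one:
Table orders = tEnv.sql("SELECT * FROM warehouse.db1.orders");
```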
The Table API/SQL documentation has been rewritten and is expected to be published on June 5.
Connectors
Support for Elasticsearch 5.x: The Elasticsearch connector code has been refactored, and the new code structure is clearer: all modules common to every Elasticsearch version are placed in a common base module, while version-specific code lives in separate modules, similar to the structure of the Kafka connector code. See FLINK-4988 for details.
Allow rescaling the Kinesis consumer: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the Kinesis consumer also makes use of this engine feature (FLINK-4821).
Transparent shard discovery for the Kinesis consumer: The Kinesis consumer can now discover new shards without failing/restarting jobs when a resharding happens (FLINK-4577).
Allow setting custom start positions for the Kafka consumer: With this change, you can instruct Flink's Kafka consumer to start reading messages from a specific offset (FLINK-3123) or the earliest/latest offset (FLINK-4280) without respecting committed offsets in Kafka.
Allow opting out of offset committing for the Kafka consumer: By default, offsets are committed to the Kafka broker once a checkpoint has been completed. This change allows users to disable this mechanism (FLINK-3398). A sketch of both Kafka consumer options follows below.
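A combined sketch of both Kafka consumer changes; the topic, broker address, and job wiring are placeholders:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaStartPositionDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder

        FlinkKafkaConsumer010<String> consumer =
            new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);

        // Start from the earliest offset, ignoring offsets committed in Kafka;
        // setStartFromLatest() and setStartFromSpecificOffsets(...) also exist.
        consumer.setStartFromEarliest();

        // Opt out of committing offsets back to Kafka when checkpoints complete.
        consumer.setCommitOffsetsOnCheckpoints(false);

        DataStream<String> stream = env.addSource(consumer);
        stream.print();
        env.execute("kafka-start-position-demo");
    }
}
```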
CEP Library
The CEP library has been greatly enhanced and is now able to accommodate more use cases out of the box (expressivity enhancements), make more efficient use of the available resources, and adjust to changing runtime conditions, all without breaking backwards compatibility of operator state.
Please note that the API of the CEP library has been updated with this release.
Below are some of the main features of the revamped CEP library:
Make CEP operators rescalable: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the CEP library also makes use of this engine feature (FLINK-5420).
New operators introduced by the CEP library (a sketch follows after this list):
Quantifiers for the pattern API (*, +, ?) (FLINK-3318)
Support for different continuity requirements (FLINK-6208)
Support for iterative conditions (FLINK-6197)
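To illustrate the new pattern features, here is a hedged sketch combining a quantifier with an iterative condition; the Event class and its getters are invented for the example:

```java
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

// "start" followed by one or more "middle" events whose price exceeds the
// sum of the prices matched so far for "middle" (an iterative condition).
Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
    .where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event e) {
            return "start".equals(e.getName());
        }
    })
    .followedBy("middle")
    .where(new IterativeCondition<Event>() {
        @Override
        public boolean filter(Event e, Context<Event> ctx) throws Exception {
            double sum = 0;
            for (Event prev : ctx.getEventsForPattern("middle")) {
                sum += prev.getPrice();
            }
            return e.getPrice() > sum; // may look back at earlier matches
        }
    })
    .oneOrMore(); // the new quantifier: one or more "middle" events
```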
Gelly Library
Unified driver for running Gelly examples (FLINK-4949).
PageRank algorithm for directed graphs (FLINK-4896).
Circulant and echo graph generators (FLINK-6393).
Known Issues
There are two known issues in Flink 1.3.0. Both will be addressed in the 1.3.1 release.