In the previous article we talked about Flink's stream source, which, as the data entry point of the stream, is the starting point of the entire DAG (directed acyclic graph) topology. The data exit of the stream, the counterpart of the source, is the sink. That is what this article covers.
SinkFunction
Corresponding to SourceFunction, the root interface for sinks in Flink is called SinkFunction. It inherits from the Function marker interface. The SinkFunction interface provides only one method:
void invoke(IN value) throws Exception;
This method is called at the record level, that is, once for every record that is output. Its parameter, value, is the record to be output.
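To make the per-record contract concrete, here is a minimal sketch. `RecordSink` is a hypothetical stand-in that mirrors the single-method shape described above; it is not Flink's actual interface, and `CollectingSink` and `RecordSinkDemo` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the single-method shape of SinkFunction.
interface RecordSink<IN> {
    // Called once for every record that reaches the sink.
    void invoke(IN value) throws Exception;
}

// Illustrative sink that simply remembers every record it receives.
class CollectingSink<IN> implements RecordSink<IN> {
    final List<IN> received = new ArrayList<>();

    @Override
    public void invoke(IN value) {
        received.add(value);
    }
}

class RecordSinkDemo {
    // Feeds each record to the sink, one invoke call per record.
    static <T> void emitAll(Iterable<T> records, RecordSink<T> sink) throws Exception {
        for (T record : records) {
            sink.invoke(record);
        }
    }

    public static void main(String[] args) throws Exception {
        CollectingSink<String> sink = new CollectingSink<>();
        emitAll(Arrays.asList("a", "b", "c"), sink);
        System.out.println(sink.received); // prints [a, b, c]
    }
}
```

The point of the sketch is only that the sink sees records one at a time; any batching has to be built on top of that call, as some of the implementations below do.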
SinkFunction is relatively concise. Let's look at its implementations.
Built-in SinkFunction
Again, let's take a look at the complete type inheritance hierarchy:
DiscardingSink
This is the simplest implementation of SinkFunction, and it is equivalent to not implementing anything (in fact, its invoke method is empty). Its role is to discard records. Its main use case is jobs whose results do not require any final processing.
WriteSinkFunction
WriteSinkFunction is an abstract class. Its primary function is to write the tuples to be output as simple text to a file at a specified path. Tuples are collected into a list and then periodically written to the file.
The WriteSinkFunction constructor receives two parameters:
- path: the file path to write to
- format: a WriteFormat instance that specifies the format in which the data is written
In the constructor, it calls the cleanFile method, which initializes the file at the specified path. The initialization behavior is: create the file if it does not exist, and empty it if it exists.
Implementation of the invoke method:
public void invoke(IN tuple) {
    tupleList.add(tuple);
    if (updateCondition()) {
        format.write(path, tupleList);
        resetParameters();
    }
}
From the implementation, it first adds the tuple to be sunk to the internal collection. It then calls the updateCondition method, an abstract method defined by WriteSinkFunction that determines whether the condition for writing tupleList to the file and emptying it has been met. If so, the tuples in the collection are written to the specified file. Finally, the resetParameters method is called. This is also an abstract method; its main purpose is to reset any state parameters that a batched write scenario may keep.
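The buffer-then-flush pattern described above can be sketched as follows. This is a simplified stand-in, not Flink's class: `BufferingSink` and `CountBasedSink` are hypothetical names, the count-based flush condition is an invented example, and an in-memory list of batches takes the place of the file written by format.write.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the buffering pattern; not Flink's WriteSinkFunction.
abstract class BufferingSink<IN> {
    protected final List<IN> tupleList = new ArrayList<>();
    // Stands in for the target file: each element is one flushed batch.
    final List<List<IN>> writtenBatches = new ArrayList<>();

    // Decides whether the buffered tuples should be written out now.
    protected abstract boolean updateCondition();

    // Resets whatever state the subclass uses to track the current batch.
    protected abstract void resetParameters();

    public void invoke(IN tuple) {
        tupleList.add(tuple);
        if (updateCondition()) {
            // Stands in for format.write(path, tupleList).
            writtenBatches.add(new ArrayList<>(tupleList));
            resetParameters();
        }
    }
}

// Hypothetical subclass: flush once a fixed number of tuples has been buffered.
class CountBasedSink<IN> extends BufferingSink<IN> {
    private final int batchSize;

    CountBasedSink(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    protected boolean updateCondition() {
        return tupleList.size() >= batchSize;
    }

    @Override
    protected void resetParameters() {
        tupleList.clear();
    }
}
```

Note that in this shape the subclass, not the base class, is responsible for emptying the buffer; a subclass that forgets to do so in resetParameters would write the same tuples again on the next flush.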
WriteSinkFunctionByMillis
This class is an implementation of WriteSinkFunction. It supports batch writes of tuples to a file at a specified interval in milliseconds. The interval is specified by the constructor parameter millis. Internally, it maintains lastTime, the timestamp of the last write. It mainly involves the implementation of the two abstract methods mentioned above:
protected boolean updateCondition() {
    return System.currentTimeMillis() - lastTime >= millis;
}
The updateCondition implementation is simple: it compares the host's current timestamp with the timestamp of the last write. If the difference is greater than or equal to the specified interval, the condition is true and the write is triggered.
protected void resetParameters() {
    tupleList.clear();
    lastTime = System.currentTimeMillis();
}
The resetParameters implementation first empties tupleList and then overwrites the old lastTime timestamp with the latest timestamp.
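To see how the two methods interact over time, the sketch below reproduces the interval check with an injectable clock, so the behaviour can be exercised without sleeping. `IntervalFlushSink`, the `clock` parameter, and the in-memory `writtenBatches` list are assumptions made for illustration; the class described above reads System.currentTimeMillis() directly and writes to a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Illustrative sketch of interval-based flushing; names are hypothetical.
class IntervalFlushSink<IN> {
    private final long millis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private long lastTime;
    final List<IN> tupleList = new ArrayList<>();
    // Stands in for the target file: each element is one flushed batch.
    final List<List<IN>> writtenBatches = new ArrayList<>();

    IntervalFlushSink(long millis, LongSupplier clock) {
        this.millis = millis;
        this.clock = clock;
        this.lastTime = clock.getAsLong();
    }

    // True once at least `millis` have elapsed since the last write.
    boolean updateCondition() {
        return clock.getAsLong() - lastTime >= millis;
    }

    // Empties the buffer and records the time of this write.
    void resetParameters() {
        tupleList.clear();
        lastTime = clock.getAsLong();
    }

    void invoke(IN tuple) {
        tupleList.add(tuple);
        if (updateCondition()) {
            writtenBatches.add(new ArrayList<>(tupleList));
            resetParameters();
        }
    }
}
```

One consequence worth noticing: the condition is only evaluated when a record arrives, so in this sketch buffered tuples are not flushed while the stream is idle, however long the interval has been exceeded.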
WriteFormat
An abstract class for write formats, with two implementations:
- WriteFormatAsText: writes to the file at the specified path as plain, as-is text
- WriteFormatAsCsv: writes to the specified file in CSV format
RichSinkFunction
RichSinkFunction provides the basis for implementing a rich SinkFunction by inheriting from AbstractRichFunction (AbstractRichFunction provides an open/close method pair and the means to get the runtime context object). RichSinkFunction is also an abstract class; it has three concrete implementations.
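What the open/close pair adds over a plain sink can be illustrated with a small sketch. `LifecycleSink`, `TracingSink`, and `LifecycleDemo` are hypothetical stand-ins, and the call order shown (open once, invoke per record, close at the end) is an assumption about how such a lifecycle is typically driven, not something taken from Flink's runtime code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a "rich" sink: per-record invoke plus an open/close pair.
abstract class LifecycleSink<IN> {
    void open() {}                  // acquire resources before the first record
    abstract void invoke(IN value); // handle one record
    void close() {}                 // release resources after the last record
}

// Illustrative sink that records the order in which its methods are called.
class TracingSink extends LifecycleSink<String> {
    final List<String> trace = new ArrayList<>();

    @Override
    void open() {
        trace.add("open");
    }

    @Override
    void invoke(String value) {
        trace.add("invoke:" + value);
    }

    @Override
    void close() {
        trace.add("close");
    }
}

class LifecycleDemo {
    // Drives a sink through its lifecycle: open once, invoke per record, always close.
    static <T> void run(LifecycleSink<T> sink, Iterable<T> records) {
        sink.open();
        try {
            for (T record : records) {
                sink.invoke(record);
            }
        } finally {
            sink.close();
        }
    }
}
```

A sink that holds a connection or a file handle would open it in open and release it in close, which is why the connector sinks discussed later build on the rich variant.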
SocketClientSink
Sends data over a socket to a server on a specific destination host, serving as the sink of the Flink stream. The data is serialized into a byte array and then written to the socket. The sink supports retrying message delivery on failure. The sink can enable autoFlush; if enabled, throughput can decrease significantly, but latency decreases as well. The constructor takes the following parameters:
- hostName: host name of the server to connect to
- port: port of the server
- schema: a SerializationSchema instance used to serialize objects
- maxNumRetries: maximum retry count (-1 for infinite retries)
- autoFlush: whether to flush automatically
The retry policy lives in the invoke method and is entered from the exception catch block when a send fails.
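A retry loop of the kind described above might look roughly like this. It is a sketch under assumptions, not the sink's actual code: `ByteSender` and `RetryingSender` are hypothetical names, the transport is abstracted away instead of using a real socket, and wrapping the final failure in an IllegalStateException is a choice made here for illustration.

```java
import java.io.IOException;

// Hypothetical transport; stands in for writing serialized bytes to a socket.
interface ByteSender {
    void send(byte[] payload) throws IOException;
}

class RetryingSender {
    private final ByteSender sender;
    private final int maxNumRetries; // -1 means retry without limit

    RetryingSender(ByteSender sender, int maxNumRetries) {
        this.sender = sender;
        this.maxNumRetries = maxNumRetries;
    }

    // Sends the payload; on failure, retries from the catch block until the send
    // succeeds or the retry budget is used up. Returns the number of retries used.
    int sendWithRetry(byte[] payload) {
        int retries = 0;
        while (true) {
            try {
                sender.send(payload);
                return retries;
            } catch (IOException e) {
                if (maxNumRetries != -1 && retries >= maxNumRetries) {
                    // Budget exhausted: surface the last failure to the caller.
                    throw new IllegalStateException(
                            "send failed after " + retries + " retries", e);
                }
                retries++;
                // A real sink would typically reconnect and wait before retrying.
            }
        }
    }
}
```

With maxNumRetries set to 3, for example, a sender that fails twice and then succeeds returns after two retries, while a sender that never recovers is attempted four times in total before the failure is reported.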
OutputFormatSinkFunction
An implementation of SinkFunction that writes records to an OutputFormat.
OutputFormat: defines the output interface for consumed records. It specifies how the final records are stored; a file, for example, is one storage implementation.
PrintSinkFunction
This implementation outputs each record to the standard output stream (stdout) or the standard error stream (stderr). At output time, if the number of parallel subtask instances of the current task is greater than 1, meaning that the task is executed in parallel (with multiple instances at the same time), a prefix is output before each record. The prefix is the position of the current subtask in the global context.
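The prefixing rule can be sketched in a few lines. `PrefixingPrinter` is a hypothetical name, and the exact prefix text used here (the one-based subtask index followed by "> ") is an assumption for illustration; the article only states that the prefix identifies the subtask's position.

```java
// Sketch of the prefixing rule: only parallel tasks get a prefix.
class PrefixingPrinter {
    private final String prefix; // null when the task runs as a single instance

    PrefixingPrinter(int subtaskIndex, int numParallelSubtasks) {
        // Assumed prefix format: one-based subtask position, then "> ".
        this.prefix = numParallelSubtasks > 1 ? (subtaskIndex + 1) + "> " : null;
    }

    // Returns the line that would be printed for this record.
    String format(Object record) {
        return prefix == null ? String.valueOf(record) : prefix + record;
    }
}
```

A caller would pass the result to System.out.println or System.err.println depending on which stream was selected. The prefix makes it possible to tell which parallel instance produced a given line when the output of several subtasks is interleaved.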
Sinks in common connectors
Flink itself provides connector support for some mainstream third-party open source systems:
- Elasticsearch
- Flume
- Kafka (versions 0.8/0.9)
- NiFi
- RabbitMQ
- Twitter
The sinks for these third-party systems (except Twitter) all inherit from RichSinkFunction.
Summary
In this article, we mainly covered the design and implementation of Flink's stream sinks. The subject has not been fully covered, of course, and further discussion will follow.
Apache Flink Source Parsing Stream-sink