Apache Flink Source Code Analysis: Stream Sink

Source: Internet
Author: User

In the previous article we discussed Flink's stream source, which acts as the data entry point of a stream and the starting point of the whole DAG (directed acyclic graph) topology. The counterpart of the source is the sink, the stream's data exit. The sink is the subject of this article.

SinkFunction

Corresponding to SourceFunction, Flink's root interface for sinks is SinkFunction. It extends the Function marker interface and declares only one method:

    void invoke(IN value) throws Exception;

The method is called at record level: it is invoked once for every record that is output, and its parameter value is the record to be output.
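
To make the per-record contract concrete, here is a minimal Java sketch that runs without Flink. `SimpleSinkFunction` is a hypothetical stand-in mirroring the single `invoke(IN value)` method, and `runPipeline` is a hypothetical driver playing the role of the runtime: it calls `invoke` exactly once for every record that reaches the end of the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's SinkFunction: a single method, called once per record.
interface SimpleSinkFunction<IN> {
    void invoke(IN value) throws Exception;
}

// Toy sink that remembers every record it receives, in arrival order.
class CollectingSink<IN> implements SimpleSinkFunction<IN> {
    final List<IN> received = new ArrayList<>();

    @Override
    public void invoke(IN value) {
        received.add(value);
    }
}

class SinkContractDemo {
    // Hypothetical driver standing in for the runtime: one invoke call per record.
    static <T> void runPipeline(List<T> records, SimpleSinkFunction<T> sink) throws Exception {
        for (T record : records) {
            sink.invoke(record);
        }
    }

    public static void main(String[] args) throws Exception {
        CollectingSink<String> sink = new CollectingSink<>();
        runPipeline(List.of("a", "b", "c"), sink);
        System.out.println(sink.received); // [a, b, c]
    }
}
```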

SinkFunction itself is quite concise, so let's move on to its implementations.

Built-in SinkFunction implementations

As before, let's go through the type inheritance hierarchy of the built-in implementations one class at a time.

DiscardingSink

This is the simplest SinkFunction implementation. Its invoke method is empty, so it does nothing except discard every record. It is mainly useful in scenarios where the final results do not need to be processed.
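
For illustration only, a discarding sink reduces to an empty method body. `SimpleSinkFunction` and `DiscardingSinkSketch` below are hypothetical names used to keep the sketch self-contained; they are not Flink classes.

```java
// Hypothetical stand-in for Flink's SinkFunction interface.
interface SimpleSinkFunction<IN> {
    void invoke(IN value) throws Exception;
}

// Sketch of a discarding sink: invoke does nothing, so every record is dropped.
class DiscardingSinkSketch<T> implements SimpleSinkFunction<T> {
    @Override
    public void invoke(T value) {
        // Intentionally empty: no output, no buffering, no side effects.
    }
}

class DiscardingSinkDemo {
    public static void main(String[] args) {
        DiscardingSinkSketch<String> sink = new DiscardingSinkSketch<>();
        for (int i = 0; i < 1_000; i++) {
            sink.invoke("record-" + i); // each record is silently discarded
        }
        System.out.println("done");
    }
}
```

A sink like this can be handy, for example, when benchmarking upstream operators: the job still needs a terminal operator, but the records themselves are of no interest.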

WriteSinkFunction

WriteSinkFunction is an abstract class. Its main purpose is to write the output tuples as simple text to a file at a specified path: the tuples are collected into a list, which is then periodically written to the file.

The WriteSinkFunction constructor takes two parameters:

    • path: the path of the file to write to
    • format: a WriteFormat instance that specifies how the data is formatted when written

The constructor calls cleanFile to initialize the file at the specified path: the file is created if it does not exist and emptied if it does.
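
The create-or-empty behavior can be reproduced with `java.nio.file`. This is a sketch, not Flink's code: `cleanFile` here is a hypothetical static helper named after the method above, and it relies on the fact that `Files.write` called without explicit options opens the file with `CREATE`, `TRUNCATE_EXISTING` and `WRITE`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CleanFileSketch {
    // Hypothetical helper mirroring the described cleanFile behavior:
    // create the file if it is missing, empty it if it already exists.
    static void cleanFile(String path) {
        try {
            // Without OpenOptions, Files.write uses CREATE, TRUNCATE_EXISTING and WRITE,
            // so writing zero bytes either creates an empty file or truncates the old one.
            Files.write(Paths.get(path), new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException("could not initialize " + path, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sink", ".txt");
        Files.write(tmp, "stale data".getBytes());
        cleanFile(tmp.toString());
        System.out.println(Files.size(tmp)); // 0
        Files.deleteIfExists(tmp);
    }
}
```

The parent directory must already exist; this sketch does not create it.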

The invoke method is implemented as follows:

    public void invoke(IN tuple) {
        tupleList.add(tuple);
        if (updateCondition()) {
            format.write(path, tupleList);
            resetParameters();
        }
    }

The method first adds the incoming tuple to an internal list, tupleList. It then calls updateCondition, an abstract method declared by WriteSinkFunction that decides when tupleList should be written to the file and cleared. If the condition holds, the tuples in the list are written to the specified file. Finally, resetParameters is called. It is also abstract; its purpose is to reset any state that a subclass keeps to control batched writes.
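
The control flow of invoke is a classic template method: buffering and writing are fixed, while the flush policy is deferred to subclasses. The sketch below reproduces that structure outside Flink under stated assumptions: `SimpleWriteFormat`, `BufferedWriteSink` and `EveryNTuplesSink` are hypothetical, the format collects batches in memory instead of writing to a file, and the count-based policy is only an example subclass, not a class Flink ships.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WriteFormat: "writes" batches to an in-memory list.
class SimpleWriteFormat<IN> {
    final List<List<IN>> writtenBatches = new ArrayList<>();

    void write(String path, List<IN> tuples) {
        // A real format would append the tuples to the file at `path`.
        writtenBatches.add(new ArrayList<>(tuples));
    }
}

// Template method: invoke is fixed, the flush policy is left to subclasses.
abstract class BufferedWriteSink<IN> {
    protected final String path;
    protected final SimpleWriteFormat<IN> format;
    protected final List<IN> tupleList = new ArrayList<>();

    BufferedWriteSink(String path, SimpleWriteFormat<IN> format) {
        this.path = path;
        this.format = format;
    }

    public void invoke(IN tuple) {
        tupleList.add(tuple);              // 1. buffer the record
        if (updateCondition()) {           // 2. ask the subclass whether to flush
            format.write(path, tupleList); // 3. write the whole batch
            resetParameters();             // 4. let the subclass reset its state
        }
    }

    protected abstract boolean updateCondition();

    protected abstract void resetParameters();
}

// Hypothetical subclass: flush once n tuples have been buffered.
class EveryNTuplesSink<IN> extends BufferedWriteSink<IN> {
    private final int n;

    EveryNTuplesSink(String path, SimpleWriteFormat<IN> format, int n) {
        super(path, format);
        this.n = n;
    }

    @Override
    protected boolean updateCondition() {
        return tupleList.size() >= n;
    }

    @Override
    protected void resetParameters() {
        tupleList.clear();
    }
}
```

With `n = 2`, invoking the sink with `a`, `b` and `c` writes the batch `[a, b]` and leaves `c` in the buffer. In this sketch, tuples still buffered when the input ends are never written, since nothing outside invoke triggers a flush.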

WriteSinkFunctionByMillis

This class is a concrete implementation of WriteSinkFunction. It writes the buffered tuples to the file in batches at a fixed interval in milliseconds, specified by the constructor parameter millis. Internally, it keeps a lastTime field recording the time of the last write. Its substance lies in how it implements the two abstract methods mentioned above:

    protected boolean updateCondition() {
        return System.currentTimeMillis() - lastTime >= millis;
    }

The updateCondition implementation is simple: it compares the current timestamp of the host with the timestamp of the last write. If the elapsed time is at least the specified interval, the condition is true and a write is triggered.

    protected void resetParameters() {
        tupleList.clear();
        lastTime = System.currentTimeMillis();
    }

resetParameters first clears tupleList and then overwrites the old lastTime value with the current timestamp.
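
Because the condition depends on wall-clock time, it is easiest to study with an injected clock. `MillisBatchingSink` is a hypothetical class that follows the updateCondition and resetParameters logic shown above; the `LongSupplier` clock (in place of `System.currentTimeMillis()`), initializing `lastTime` at construction, and the in-memory list of batches are assumptions made for testability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical time-based batching sink; the clock is injected for deterministic tests.
class MillisBatchingSink<IN> {
    private final long millis;
    private final LongSupplier clock;
    private long lastTime;
    final List<IN> tupleList = new ArrayList<>();
    final List<List<IN>> writtenBatches = new ArrayList<>();

    MillisBatchingSink(long millis, LongSupplier clock) {
        this.millis = millis;
        this.clock = clock;
        this.lastTime = clock.getAsLong(); // assumed: the first interval starts at construction
    }

    public void invoke(IN tuple) {
        tupleList.add(tuple);
        if (updateCondition()) {
            writtenBatches.add(new ArrayList<>(tupleList)); // stands in for format.write(path, tupleList)
            resetParameters();
        }
    }

    // Same test as above: have at least `millis` ms passed since the last write?
    protected boolean updateCondition() {
        return clock.getAsLong() - lastTime >= millis;
    }

    // Clear the buffer and remember when this write happened.
    protected void resetParameters() {
        tupleList.clear();
        lastTime = clock.getAsLong();
    }
}
```

One consequence visible in the code: the condition is only evaluated inside invoke, so if no new record arrives, buffered tuples wait for the next one, even when the interval has long elapsed.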

WriteFormat

WriteFormat is an abstract class that defines how tuples are formatted when written. It has two implementations:

    • WriteFormatAsText: writes the tuples to the file at the specified path as plain text
    • WriteFormatAsCsv: writes the tuples to the specified file in CSV format
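
The difference between the two formats can be sketched as follows. `Pair`, `SimpleFormat`, `TextFormat` and `CsvFormat` are hypothetical names; the sketch assumes a two-field tuple whose string form looks like `(a,1)` and writes to any `java.io.Writer` instead of a file path, so it illustrates the idea rather than copying Flink's classes.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical two-field tuple whose toString() looks like "(f0,f1)".
class Pair<A, B> {
    final A f0;
    final B f1;

    Pair(A f0, B f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    @Override
    public String toString() {
        return "(" + f0 + "," + f1 + ")";
    }
}

// Hypothetical counterpart of WriteFormat: how a batch of tuples is rendered.
abstract class SimpleFormat {
    abstract void write(Writer out, List<? extends Pair<?, ?>> tuples) throws IOException;
}

// Plain text: each tuple is written as its string form, one per line.
class TextFormat extends SimpleFormat {
    @Override
    void write(Writer out, List<? extends Pair<?, ?>> tuples) throws IOException {
        for (Pair<?, ?> t : tuples) {
            out.write(t + "\n");
        }
    }
}

// CSV: fields joined by commas, without the surrounding parentheses.
class CsvFormat extends SimpleFormat {
    @Override
    void write(Writer out, List<? extends Pair<?, ?>> tuples) throws IOException {
        for (Pair<?, ?> t : tuples) {
            out.write(t.f0 + "," + t.f1 + "\n");
        }
    }
}
```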

RichSinkFunction

RichSinkFunction extends AbstractRichFunction and implements SinkFunction, serving as the base class for rich sink functions (AbstractRichFunction provides the open/close method pair and access to the runtime context object). RichSinkFunction is also an abstract class; it has three concrete implementations.
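
What "rich" adds is a lifecycle around the per-record call. The sketch below is not Flink's API: `RichSinkSketch`, `SimpleRuntimeContext`, `LoggingSink` and `runTask` are hypothetical, and the runtime context is reduced to a subtask index and a parallelism. It shows the order in which a runtime would call the methods (`open` once, `invoke` per record, `close` once), which is where connections or buffers are typically acquired and released.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified runtime context: subtask index and parallelism only.
class SimpleRuntimeContext {
    final int indexOfThisSubtask;
    final int numberOfParallelSubtasks;

    SimpleRuntimeContext(int index, int parallelism) {
        this.indexOfThisSubtask = index;
        this.numberOfParallelSubtasks = parallelism;
    }
}

// Hypothetical rich sink base class: lifecycle hooks plus access to the context.
abstract class RichSinkSketch<IN> {
    private SimpleRuntimeContext context;

    void setRuntimeContext(SimpleRuntimeContext context) {
        this.context = context;
    }

    SimpleRuntimeContext getRuntimeContext() {
        return context;
    }

    void open() throws Exception {}   // acquire resources (connections, files, ...)

    abstract void invoke(IN value) throws Exception;

    void close() throws Exception {}  // release resources
}

class LifecycleDemo {
    // Hypothetical driver: the call order a runtime would use for one subtask.
    static <T> void runTask(RichSinkSketch<T> sink, SimpleRuntimeContext ctx, List<T> records) throws Exception {
        sink.setRuntimeContext(ctx);
        sink.open();
        try {
            for (T r : records) {
                sink.invoke(r);
            }
        } finally {
            sink.close(); // always release resources, even if invoke fails
        }
    }
}

// Example sink that records the lifecycle events it sees.
class LoggingSink extends RichSinkSketch<String> {
    final List<String> events = new ArrayList<>();

    @Override
    void open() {
        events.add("open@" + getRuntimeContext().indexOfThisSubtask);
    }

    @Override
    void invoke(String value) {
        events.add("invoke:" + value);
    }

    @Override
    void close() {
        events.add("close");
    }
}
```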

SocketClientSink

This sink uses a socket to send the stream's data to a server on a given destination host. Each record is serialized into a byte array and then written to the socket. The sink can retry delivery when sending fails. It also supports autoFlush; enabling it significantly reduces throughput but also reduces latency. Its constructor takes the following parameters:

    • hostName: host name of the server to connect to
    • port: port of the server
    • schema: a SerializationSchema instance used to serialize each record
    • maxNumRetries: maximum number of retries (-1 for infinite retries)
    • autoFlush: whether to flush the socket output after every record

The retry logic lives in the invoke method: when a send fails, control enters the exception-handling (catch) block, which performs the retries.
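
A simplified version of that retry path is sketched below. It is an illustration under assumptions, not Flink's code: `Sender` and `RetryingSocketSink` are hypothetical, a `Function<IN, byte[]>` stands in for SerializationSchema, and the sketch only re-attempts the write, leaving out reconnecting the socket and pausing between attempts, which a real socket sink would need. It keeps the constructor conventions listed above: `maxNumRetries` of -1 means retry forever, and `autoFlush` flushes after every record.

```java
import java.io.IOException;
import java.util.function.Function;

// Hypothetical transport: writes bytes, may throw IOException, can be flushed.
interface Sender {
    void write(byte[] bytes) throws IOException;

    void flush() throws IOException;
}

// Simplified retrying sink: serialize, send, and retry on failure.
class RetryingSocketSink<IN> {
    private final Function<IN, byte[]> schema; // stands in for a SerializationSchema
    private final Sender sender;
    private final int maxNumRetries;           // -1 = retry forever, 0 = fail immediately
    private final boolean autoFlush;

    RetryingSocketSink(Function<IN, byte[]> schema, Sender sender, int maxNumRetries, boolean autoFlush) {
        this.schema = schema;
        this.sender = sender;
        this.maxNumRetries = maxNumRetries;
        this.autoFlush = autoFlush;
    }

    public void invoke(IN value) throws IOException {
        byte[] msg = schema.apply(value);
        try {
            send(msg);
        } catch (IOException first) {
            // Retry path: keep trying until a send succeeds or the retry budget is used up.
            int retries = 0;
            IOException last = first;
            while (maxNumRetries < 0 || retries < maxNumRetries) {
                retries++;
                try {
                    send(msg);
                    return;
                } catch (IOException e) {
                    last = e;
                }
            }
            throw new IOException("giving up after " + retries + " retries", last);
        }
    }

    private void send(byte[] msg) throws IOException {
        sender.write(msg);
        if (autoFlush) {
            sender.flush(); // lower latency per record, at the cost of throughput
        }
    }
}
```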

OutputFormatSinkFunction

A SinkFunction implementation that writes records to an OutputFormat.

OutputFormat defines the interface through which consumed records are output, i.e. how the final records are stored. Writing them to a file is one example of such a storage implementation.
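
The delegation can be pictured with a simplified stand-in. `SimpleOutputFormat`, `OutputFormatSink` and `InMemoryFormat` are hypothetical; the interface is only loosely modeled on Flink's OutputFormat (which also has a `configure` step). The sink holds no storage logic of its own: it opens the format with its subtask index and parallelism, hands each record to `writeRecord`, and closes the format at the end.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified output format: decides where and how records are stored.
interface SimpleOutputFormat<T> {
    void open(int taskNumber, int numTasks) throws IOException;

    void writeRecord(T record) throws IOException;

    void close() throws IOException;
}

// Sink that delegates every step to the format.
class OutputFormatSink<T> {
    private final SimpleOutputFormat<T> format;

    OutputFormatSink(SimpleOutputFormat<T> format) {
        this.format = format;
    }

    void open(int subtaskIndex, int parallelism) throws IOException {
        format.open(subtaskIndex, parallelism);
    }

    void invoke(T record) throws IOException {
        format.writeRecord(record);
    }

    void close() throws IOException {
        format.close();
    }
}

// Example format that stores records in memory, tagged with the task number.
class InMemoryFormat implements SimpleOutputFormat<String> {
    final List<String> stored = new ArrayList<>();
    private int taskNumber;
    boolean closed;

    @Override
    public void open(int taskNumber, int numTasks) {
        this.taskNumber = taskNumber;
    }

    @Override
    public void writeRecord(String record) {
        stored.add(taskNumber + ":" + record);
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

Swapping `InMemoryFormat` for a file-backed format would change where the records end up without touching the sink, which is the point of the design.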

PrintSinkFunction

This implementation writes each record to the standard output stream (stdout) or the standard error stream (stderr). If the current task has more than one parallel subtask instance, meaning it runs as several instances at the same time, each record is printed with a prefix that identifies the position of the current subtask among all parallel subtasks.
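
The prefix rule fits in a few lines. In this sketch `PrintSinkSketch` is hypothetical, it prints to an injected `PrintStream` so the output can be inspected, and the prefix format (1-based subtask position followed by `> `, e.g. `3> hello`) is an assumption modeled on what Flink's print sink typically shows, not a guaranteed format.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Sketch of a print sink: one line per record, prefixed when running in parallel.
class PrintSinkSketch<IN> {
    private final PrintStream stream;
    private String prefix = "";

    // In practice the target would be System.out (stdout) or System.err (stderr).
    PrintSinkSketch(PrintStream target) {
        this.stream = target;
    }

    // Called once per subtask, like open(): derive the prefix from the subtask's position.
    void open(int indexOfThisSubtask, int numberOfParallelSubtasks) {
        // Assumed format: 1-based subtask position followed by "> "; no prefix when parallelism is 1.
        prefix = numberOfParallelSubtasks > 1 ? (indexOfThisSubtask + 1) + "> " : "";
    }

    void invoke(IN record) {
        stream.println(prefix + record);
    }

    // Demo helper: capture what a single subtask would print for one record.
    static String capture(int index, int parallelism, String record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintSinkSketch<String> sink = new PrintSinkSketch<>(new PrintStream(buffer, true));
        sink.open(index, parallelism);
        sink.invoke(record);
        return buffer.toString();
    }
}
```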

Sinks in common connectors

Flink provides connectors for several mainstream third-party open source systems:

    • Elasticsearch
    • Flume
    • Kafka (versions 0.8/0.9)
    • NiFi
    • RabbitMQ
    • Twitter

The sinks for these systems (all except Twitter, whose connector only provides a source) extend RichSinkFunction.

Summary

This article covered the design and implementation of Flink's stream sinks. The topic has not been exhausted, and it will be explored further in later articles.
