Apache Storm Official Documentation: Trident Spouts

Reprinted from the Concurrent Programming Network (ifeve.com).

As in the regular Storm API, spouts are the source of data in a Trident topology. To support more sophisticated processing guarantees, Trident spouts expose additional APIs on top of those of a regular Storm spout.

How you source your data streams and how you update state (such as a database) based on those streams are inextricably linked. The Trident State article explains this in detail, and understanding that link is essential to understanding the spout options available.

A regular Storm spout behaves as a non-transactional spout in a Trident topology. To use a regular IRichSpout, create the stream in a TridentTopology like this:

TridentTopology topology = new TridentTopology();
topology.newStream("myspoutid", new MyRichSpout());

Every spout in a Trident topology must be given a unique identifier for its stream, and that identifier must be unique across all topologies running on the Storm cluster. Trident uses this identifier to store metadata in ZooKeeper about what the spout has consumed, including the txid and any other metadata associated with the spout.

The following settings control where spout metadata is stored in ZooKeeper. (Translator's note: you usually do not need to set them, because by default Storm stores this data in the cluster's own ZooKeeper servers.)

    1. transactional.zookeeper.servers: The list of ZooKeeper hostnames.
    2. transactional.zookeeper.port: The port of the ZooKeeper cluster.
    3. transactional.zookeeper.root: The root directory in ZooKeeper where metadata is stored. Metadata is stored directly under this path.
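
As a minimal sketch of how these three settings might be supplied, the example below puts them into a plain Java map keyed by the setting names listed above. Storm's topology configuration is itself a map of string keys to values, so the same entries could be placed there. The class name, hostnames, port, and root path are placeholder assumptions for illustration, not values taken from the original article.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpoutMetadataConfig {
    // Builds the three settings as a plain map.
    // The hostnames, port, and root path are hypothetical placeholder values.
    static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        List<String> servers = Arrays.asList("zk1.example.com", "zk2.example.com");
        conf.put("transactional.zookeeper.servers", servers);
        conf.put("transactional.zookeeper.port", 2181);
        conf.put("transactional.zookeeper.root", "/transactional");
        return conf;
    }

    public static void main(String[] args) {
        // Print each setting so the resulting configuration can be inspected.
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If none of these keys are set, the default behaviour described in the translator's note above applies.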

Pipelining

By default, Trident processes one batch at a time, waiting for that batch to succeed or fail before starting another. Pipelining the batches can significantly increase throughput and reduce the processing latency of each batch. The maximum number of batches processed simultaneously is configured with the topology.max.spout.pending property.

Even when it processes multiple batches at the same time, Trident applies state updates in batch order. For example, suppose you are aggregating a global count into a database. While the count for batch 1 is being written to the database, Trident can already compute the partial counts for batches 2 through 10. It will not start the state update for batch 2, however, until the state update for batch 1 has succeeded. This ordering is essential for achieving exactly-once processing semantics, as discussed in the Trident State article.
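
The ordering rule above can be illustrated with a small, self-contained simulation. This is not Trident's implementation, only a sketch of the idea under simplifying assumptions: partial counts for several batches are computed concurrently on a thread pool, while the running global count is updated strictly in batch order. The class PipelinedBatches, the method run, and the maxPending parameter (standing in for topology.max.spout.pending) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PipelinedBatches {
    // Computes per-batch partial sums on up to maxPending threads, then applies
    // them to the global count one batch at a time, in batch order.
    // Returns the global count as it stands after each batch's state update.
    static List<Long> run(List<List<Integer>> batches, int maxPending) {
        ExecutorService pool = Executors.newFixedThreadPool(maxPending);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<Integer> batch : batches) {
                // The partial sum of a later batch may finish before an earlier one.
                Callable<Integer> work = () -> batch.stream().mapToInt(Integer::intValue).sum();
                partials.add(pool.submit(work));
            }
            long globalCount = 0;
            List<Long> committed = new ArrayList<>();
            for (Future<Integer> partial : partials) {
                // Block on batches in submission order, so the state update for
                // batch N+1 never happens before the update for batch N.
                globalCount += partial.get();
                committed.add(globalCount);
            }
            return committed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5, 6));
        System.out.println(run(batches, 2));
    }
}
```

The sketch leaves out failure handling and replay: in Trident a failed batch is retried, which is where the txid stored in ZooKeeper comes into play.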

Trident spout types

The following spout APIs are available:

    1. ITridentSpout: The most general API, which can support both transactional and opaque transactional semantics. In practice you will usually use one of the partitioned variants below rather than implement this interface directly.
    2. IBatchSpout: A non-transactional spout that emits one batch of tuples at a time.
    3. IPartitionedTridentSpout: A transactional spout that reads from a partitioned data source, such as a cluster of Kafka servers.
    4. IOpaquePartitionedTridentSpout: An opaque transactional spout that reads from a partitioned data source.

As mentioned at the beginning of this tutorial, you can also use a regular IRichSpout in addition to these APIs.
