Building IBM InfoSphere Streams applications using the Java programming language

Source: Internet
Author: User

Brief introduction

IBM InfoSphere Streams (hereinafter referred to as Streams) is a highly reliable, highly scalable distributed stream computing platform that IBM launched in 2009. Its design target is a processing capacity of 6 GB per second, or 21,600 GB per hour (described as equivalent to the number of pages on the Internet), which makes "perpetual analysis" of streaming data possible. Streams consists of a runtime environment (also called an instance) and a programming model. Together they simplify the development of applications that extract, filter, analyze, and correlate large volumes of continuous stream data. The platform can be applied to solutions in manufacturing, retail, transportation, financial securities, and regulated industries, putting the idea of real-time business decisions into practice.

As Figure 1 shows, a Streams application is made up of a series of operators connected to one another through input and output ports. For convenience, we first define some terms used in Streams applications:

Stream: Any continuous flow of data from a data source, in which each data item (tuple) is described by a set of attributes.

Operator: A functional component for stream data processing that receives one or more input streams, processes the tuples and attributes of the stream data, and eventually produces one or more output streams.

Input port: A port used to receive output streams from other operators.

Output port: The port used to generate the output stream.

Processing element (PE): A Streams application is physically partitioned into a series of processing elements, each typically in the form of a dynamic link library.

Operator fusion: An optimization technique that combines multiple operators into a single PE to reduce the cost of transferring data between those operators.

Job: The form a Streams application takes when it runs on the Streams runtime.

Figure 1. Streams Application Architecture
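
The terms above can be illustrated with a small toy model in plain Java. This is only a sketch: `PipelineSketch`, `OutputPort`, `ThresholdFilter`, and `CollectingSink` are hypothetical names invented for this illustration and are not part of the Streams API. A tuple is modeled as a map of attribute names to values, and an input port is modeled as a method that accepts one tuple at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of streams, operators, and ports; none of this is the Streams API. */
final class PipelineSketch {

    /** Output port: forwards each submitted tuple to every connected input port. */
    static final class OutputPort {
        private final List<Consumer<Map<String, Object>>> connections = new ArrayList<>();

        void connect(Consumer<Map<String, Object>> inputPort) {
            connections.add(inputPort);
        }

        void submit(Map<String, Object> tuple) {
            for (Consumer<Map<String, Object>> inputPort : connections) {
                inputPort.accept(tuple);
            }
        }
    }

    /** Operator with one input port (process) and one output port (out). */
    static final class ThresholdFilter {
        final OutputPort out = new OutputPort();
        private final String attribute;
        private final double threshold;

        ThresholdFilter(String attribute, double threshold) {
            this.attribute = attribute;
            this.threshold = threshold;
        }

        /** Input port: passes a tuple on only if its attribute exceeds the threshold. */
        void process(Map<String, Object> tuple) {
            double value = ((Number) tuple.get(attribute)).doubleValue();
            if (value > threshold) {
                out.submit(tuple);
            }
        }
    }

    /** Sink operator: one input port, no output port; it collects what it receives. */
    static final class CollectingSink {
        final List<Map<String, Object>> received = new ArrayList<>();

        void process(Map<String, Object> tuple) {
            received.add(tuple);
        }
    }

    /** Builds a tuple with two attributes, symbol and price. */
    static Map<String, Object> tuple(String symbol, double price) {
        Map<String, Object> t = new HashMap<>();
        t.put("symbol", symbol);
        t.put("price", price);
        return t;
    }

    /** Wires source -> filter -> sink and pushes three tuples through the pipeline. */
    static List<Map<String, Object>> runDemo() {
        OutputPort source = new OutputPort();
        ThresholdFilter filter = new ThresholdFilter("price", 100.0);
        CollectingSink sink = new CollectingSink();

        source.connect(filter::process);
        filter.out.connect(sink::process);

        source.submit(tuple("IBM", 182.5));
        source.submit(tuple("XYZ", 42.0));
        source.submit(tuple("ABC", 100.5));
        return sink.received;
    }

    public static void main(String[] args) {
        System.out.println(runDemo().size() + " tuples reached the sink");
    }
}
```

In this toy, each operator hands a tuple to the next one through a direct method call inside a single process. That is loosely comparable to what operator fusion aims for: operators placed in the same PE avoid the cost of moving data between separate processes. How the Streams runtime actually does this is not modeled here.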

The Java programming language is a third-generation high-level language that has been widely used in distributed Internet environments since its inception in the 1990s. Thanks to its simplicity, fully object-oriented design, platform portability, robust sandbox security mechanism, dynamic nature, and large number of available development packages, it covers almost every aspect of Internet applications. Compared with C++ programmers, Java programmers can focus on developing business logic without dwelling on system-level details such as resource allocation and release, which the Java Virtual Machine (JVM) handles for them. This greatly improves development efficiency. Because a Java program runs on top of a virtual machine, its performance may be slightly inferior to that of a native C++ program, but the gap has gradually narrowed after years of virtual machine development and optimization. With today's significantly faster CPUs, the performance difference is becoming less and less noticeable.

Based on these considerations, the Streams platform provides a framework for building Streams applications in the Java programming language. The framework has two parts: the Java operator model description file and the Java operator API. Figure 2 shows a concrete example, the dbpersist operator, which implements database storage. The XML in the left half of the figure defines the operator model through a series of attributes, and the right half shows the corresponding Java class, which mainly contains the tuple processing logic. In this particular example, the Java implementation of the operator depends on several third-party JAR packages in addition to its own JAR package, and these dependencies must be specified in the operator model.

Figure 2. Java operator model and implementation
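
Because the figure is not reproduced here, the following self-contained sketch suggests what the Java half of such an operator might look like. It is an illustration under stated assumptions, not the dbpersist source and not the Streams Java operator API: `ToyOperator`, `InMemoryTable`, `DbPersist`, and the `columns` parameter are all hypothetical. The database is replaced by an in-memory list so the example runs without a JDBC driver, and the parameter map stands in for settings that a real operator would take from its operator model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a persistence operator; all types here are hypothetical stand-ins. */
final class DbPersistSketch {

    /** Hypothetical operator lifecycle; NOT the Streams Java operator API. */
    interface ToyOperator {
        void initialize(Map<String, String> parameters);
        void process(Map<String, Object> tuple);
        void shutdown();
    }

    /** Hypothetical stand-in for a database table, kept in memory. */
    static final class InMemoryTable {
        final List<String> rows = new ArrayList<>();
        boolean open;

        void insert(String row) {
            if (!open) {
                throw new IllegalStateException("table is closed");
            }
            rows.add(row);
        }
    }

    /** Sink operator: stores the configured attributes of each tuple as one row. */
    static final class DbPersist implements ToyOperator {
        private final InMemoryTable table;
        private String[] columns;

        DbPersist(InMemoryTable table) {
            this.table = table;
        }

        /** Reads the assumed "columns" parameter and opens the table. */
        @Override
        public void initialize(Map<String, String> parameters) {
            columns = parameters.get("columns").split(",");
            table.open = true;
        }

        /** Tuple processing logic: one incoming tuple becomes one stored row. */
        @Override
        public void process(Map<String, Object> tuple) {
            List<String> values = new ArrayList<>();
            for (String column : columns) {
                values.add(String.valueOf(tuple.get(column)));
            }
            table.insert(String.join("|", values));
        }

        /** Releases the resource acquired in initialize. */
        @Override
        public void shutdown() {
            table.open = false;
        }
    }

    /** Runs the lifecycle once with a single tuple and returns the stored rows. */
    static List<String> runDemo() {
        InMemoryTable table = new InMemoryTable();
        DbPersist operator = new DbPersist(table);

        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("columns", "id,name");
        operator.initialize(parameters);

        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 1);
        tuple.put("name", "alice");
        tuple.put("ignored", true); // not listed in "columns", so it is not stored
        operator.process(tuple);

        operator.shutdown();
        return table.rows;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The split mirrors the two halves described above: configuration arrives from outside the class (here a parameter map), while the class itself contains only the per-tuple logic and the setup and cleanup around it.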
