Brief introduction
Spring XD (eXtreme Data) is Pivotal's big data product. Together with Spring Boot and Grails, it makes up the execution part of the Spring IO platform. Although Spring XD builds on a large number of existing Spring projects, it is a runtime environment rather than a class library or framework: it ships with a bin directory containing servers that you start and interact with from the command line. The runtime can run on a development machine, on your own servers on premises, on AWS EC2, or on Cloud Foundry.
The key components in Spring XD are the Admin server and the Container servers. Using a DSL, you submit a description of the required processing tasks to the Admin server over HTTP. The Admin server then maps those tasks onto processing modules (each module is a unit of execution implemented as a Spring application context).
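To make the idea of a DSL-described task concrete, here is a minimal sketch that splits a pipe-delimited stream definition into modules. The pipe syntax follows Spring XD's stream DSL (for example, time | log), but the parse_stream function and the Module type are hypothetical names used only for illustration. They are not part of Spring XD, and the real parser handles far more (quoted option values, named channels, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One step of a stream (hypothetical type, for illustration only)."""
    name: str
    role: str                                  # "source", "processor" or "sink"
    options: dict = field(default_factory=dict)


def parse_stream(definition: str) -> list:
    """Split a definition such as 'http --port=9000 | log' into modules."""
    parts = [part.strip() for part in definition.split("|")]
    if len(parts) < 2 or not all(parts):
        raise ValueError("a stream needs at least a source and a sink")
    modules = []
    for index, part in enumerate(parts):
        tokens = part.split()
        # Naive option handling: only unquoted '--key=value' tokens.
        options = dict(
            token[2:].split("=", 1)
            for token in tokens[1:]
            if token.startswith("--") and "=" in token
        )
        if index == 0:
            role = "source"
        elif index == len(parts) - 1:
            role = "sink"
        else:
            role = "processor"
        modules.append(Module(tokens[0], role, options))
    return modules
```

Calling parse_stream("http --port=9000 | filter | log") yields three modules: a source, a processor, and a sink, in that order.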
The product has two modes of operation: single-node and multi-node. In the first, a single process is responsible for all processing and administration. This is useful for getting started, as well as for rapid application development and testing. All the examples in this article are designed to run in single-node mode. The second is a distributed mode. The Distributed Integration Runtime (DIRT) distributes processing tasks across multiple nodes. These nodes can be your own VMs or physical servers, and Spring XD also lets you run on a Hadoop YARN cluster.
The XD Admin server breaks processing tasks into individual module definitions and uses Apache ZooKeeper to assign each module to a container instance. Each container listens for the module definitions assigned to it, then deploys each module by creating a Spring application context to run it. Note that at the time of writing, Spring XD does not ship with ZooKeeper. The compatible version is 3.4.6, which you can download separately from the Apache ZooKeeper project.
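The assign-then-deploy flow can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the round-robin policy, the function names, and the plain dictionary standing in for ZooKeeper. Spring XD's actual allocation logic is not shown in this article.

```python
from collections import defaultdict
from itertools import cycle


def assign_modules(modules, containers):
    """Spread module names over containers in round-robin order (assumed policy)."""
    if not containers:
        raise ValueError("at least one container is required")
    assignments = defaultdict(list)    # stand-in for the shared coordination store
    for module, container in zip(modules, cycle(containers)):
        assignments[container].append(module)
    return dict(assignments)


def deploy_assigned(container, assignments):
    """What a container does on seeing its assignments: 'deploy' each module."""
    return [f"{container} deployed {module}"
            for module in assignments.get(container, [])]
```

With modules http, filter, and hdfs and two containers, the first container receives http and hdfs and the second receives filter.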
Modules share data by passing messages through the configured messaging middleware. The transport layer is pluggable and supports two other Pivotal projects, Redis and RabbitMQ, as well as an in-memory option out of the box.
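One way to picture a pluggable transport is an interface that modules program against, with the concrete middleware chosen by configuration. The sketch below is illustrative only: the Transport and InMemoryTransport classes and the channel name are hypothetical, and a Redis or RabbitMQ variant would implement the same two methods.

```python
import queue
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical transport interface: modules only see send and receive."""

    @abstractmethod
    def send(self, channel, message):
        ...

    @abstractmethod
    def receive(self, channel):
        ...


class InMemoryTransport(Transport):
    """In-process implementation backed by one FIFO queue per channel."""

    def __init__(self):
        self._queues = {}

    def _queue(self, channel):
        return self._queues.setdefault(channel, queue.Queue())

    def send(self, channel, message):
        self._queue(channel).put(message)

    def receive(self, channel):
        # Non-blocking: raises queue.Empty if nothing is waiting.
        return self._queue(channel).get_nowait()


def run_pipeline(transport, payloads):
    """A source module publishes; a sink module consumes from the same channel."""
    for payload in payloads:
        transport.send("stream.0", payload)                    # source -> channel
    return [transport.receive("stream.0") for _ in payloads]   # channel -> sink
```

Because run_pipeline depends only on the interface, swapping the middleware would not require any change to the modules.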
Use cases
The following illustration gives you a general idea of Spring XD.
The Spring XD team believes there are four main use cases in building big data solutions: data ingestion, real-time analytics, workflow scheduling, and export.
Data ingestion provides the ability to receive data from a variety of input sources and transfer it to big data stores such as HDFS (Hadoop Distributed File System), Splunk, or MPP databases. As well as files, data sources may include mobile devices, sensors that support the MQ Telemetry Transport (MQTT) protocol, and social media events such as those from Twitter.
Ingestion covers both event-driven stream processing and batch processing of other kinds of data (MapReduce, Pig, Hive, Cascading, SQL, and so on). The worlds of streams and jobs are very different, but Spring XD tries to blur the boundary between them with a channel abstraction, so that a stream can trigger a batch job and a batch job can send events that trigger other streams.
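The interplay between streams and jobs can be sketched with a tiny publish/subscribe channel. This is a conceptual model only: the Channel class, the channel names, and the file path are hypothetical and do not correspond to Spring XD or Spring Integration APIs.

```python
class Channel:
    """Minimal publish/subscribe channel (hypothetical, for illustration)."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, message):
        for handler in self._handlers:
            handler(message)


job_requests = Channel()      # a stream writes here to launch a job
job_events = Channel()        # the job writes here when it finishes
log = []


def batch_job(path):
    """Pretend batch job: 'processes' a file, then emits a completion event."""
    log.append(f"job processed {path}")
    job_events.publish({"file": path, "status": "COMPLETED"})


job_requests.subscribe(batch_job)
job_events.subscribe(lambda event: log.append(f"stream saw {event['status']}"))

# A stream module detects a new file and triggers the job through the channel.
job_requests.publish("/data/in/2014-01.csv")
```

Neither side calls the other directly: the stream only knows the request channel, and the job only knows the event channel, which is what lets streams and jobs be chained in either direction.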
For streams, some real-time analytics, such as gathering metrics and counting values, are supported through an abstraction called a "tap". Conceptually, a tap lets you intercept a stream, perform real-time analytics on it, and optionally publish data to external systems such as GemFire, Redis, or other in-memory data grids.
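The essence of a tap is that it observes messages without changing what the stream delivers. Here is a minimal sketch under that assumption; the run_stream function and the in-process Counter (standing in for a counter kept in an external store such as Redis) are hypothetical.

```python
from collections import Counter


def run_stream(source, sink, taps=()):
    """Push each message to the sink, showing it to every tap on the way."""
    for message in source:
        for tap in taps:
            tap(message)      # side channel: observes, never alters the flow
        sink(message)


value_counts = Counter()      # stand-in for a counter held in an external store
delivered = []

run_stream(
    source=["error", "ok", "error"],
    sink=delivered.append,
    taps=[lambda message: value_counts.update([message])],
)
```

After the run, delivered holds all three messages in their original order, while value_counts records that "error" was seen twice and "ok" once.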
Once the data is in a big data store, you need some kind of workflow tool to orchestrate its processing. Orchestration is necessary because the scripts or MapReduce jobs you write are typically long-running and take the form of a chain of events with multiple steps. Ideally, when a step fails you should be able to restart from that specific step rather than starting again from scratch.
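Restarting from a failed step comes down to recording which steps have completed and skipping them on the next run. The sketch below shows that idea in isolation. It is not Spring Batch's API: run_job, the step names, and the in-memory state dictionary (which would be a persistent job repository in practice) are assumptions for illustration.

```python
def run_job(steps, state):
    """Run steps in order, skipping those already recorded as completed."""
    completed = state.setdefault("completed", [])
    for name, action in steps:
        if name in completed:
            continue              # finished in an earlier run: do not repeat
        action()                  # an exception here aborts the run
        completed.append(name)
    return completed


calls = []
attempts = {"transform": 0}


def extract():
    calls.append("extract")


def transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")   # fail on the first attempt only
    calls.append("transform")


def load():
    calls.append("load")


steps = [("extract", extract), ("transform", transform), ("load", load)]
state = {}
try:
    run_job(steps, state)         # first run fails in "transform"
except RuntimeError:
    pass
run_job(steps, state)             # restart resumes at "transform"
```

The extract step runs only once across both attempts, because the second run finds it in the completed list and moves straight to the step that failed.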
Finally, you need an export step that moves the data into a system better suited to presentation and, possibly, further analysis. For example, you might export from HDFS to an RDBMS (relational database management system), where you can use more traditional business intelligence tools.
Spring XD aims to provide a unified, distributed, and extensible service to meet these use cases. Rather than starting from scratch, it builds on a number of existing Spring technologies. For example, it uses Spring Batch to support the workflow scheduling and export use cases, and Spring Integration to support stream processing along with a variety of enterprise application integration patterns. Other key Spring projects include Spring Data, for working with NoSQL stores and Hadoop, and Reactor, which provides a simplified API for writing asynchronous programs, particularly when using the LMAX Disruptor.
Install Spring XD
In the next section, we'll look at each use case in detail. You may want to try the examples yourself. Getting started is simple.
First, make sure the system has Java JDK 6 or newer installed. I recommend using Java JDK 7.
OS X users should install Homebrew if it is not already present, and then run: