Streaming Computing Systems

By Yangdong

This article systematically introduces and analyzes three mainstream streaming computing systems in industry: Yahoo! S4, StreamBase, and Borealis. From the designs of these systems, readers can come to understand the key problems that streaming computation must solve in different scenarios.

Background

Non-real-time computing is almost universally based on the MapReduce framework, but MapReduce is not a panacea. For some real-world problems in search applications, MapReduce does not provide a good solution.

Commercial search engines such as Google, Bing, and Yahoo! typically return structured web results in response to user queries and also insert text ads under a pay-per-click monetization model. To show the most relevant ads in the best positions on the page, algorithms dynamically estimate the likelihood of an ad being clicked in a given context. That context may include the user's preferences, geographic location, historical queries, historical clicks, and so on. A major search engine may process thousands of queries per second, and each page may contain multiple ads. Handling user feedback in a timely manner therefore requires a low-latency, scalable, highly reliable processing engine. For such applications with strong real-time requirements, MapReduce, despite various real-time improvements, still struggles to meet the requirements reliably. Hadoop is highly optimized for batch processing: MapReduce systems typically operate on static data by dispatching batch tasks, whereas a typical paradigm of streaming computing is a flow of events entering the system at an uncertain rate. The system must either match its processing capacity to the event rate or degrade gracefully by means of approximate algorithms, a technique commonly called load shedding. Beyond load shedding, mechanisms such as fault tolerance also differ between streaming and batch computing.
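
The idea of load shedding can be sketched as a bounded buffer that drops tuples instead of falling arbitrarily behind. The class and drop policy below are an illustrative assumption, not taken from any of the systems discussed:

```python
from collections import deque

class LoadSheddingQueue:
    """Bounded event queue that sheds load when the consumer falls behind."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of events buffered
        self.queue = deque()
        self.shed_count = 0        # events dropped so far

    def offer(self, event):
        """Accept an event, or shed (drop) it when the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.shed_count += 1   # degrade gracefully instead of falling behind
            return False
        self.queue.append(event)
        return True

    def poll(self):
        return self.queue.popleft() if self.queue else None
```

A real system would typically shed by value (dropping the least useful tuples) rather than by arrival order, but the contract is the same: bounded memory and bounded latency at the cost of completeness.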

Recently, at SIGMOD 2011, Facebook published a paper on real-time data processing with HBase/Hadoop, giving a batch computing platform real-time computing capability through a number of real-time-oriented modifications. This kind of MapReduce-based approach to streaming has three main drawbacks.


Splitting the input data into fixed-size segments, each processed by the MapReduce platform, has the disadvantage that processing latency is proportional to the segment length plus the overhead of initializing the processing task. Small segments reduce latency but add overhead and complicate dependency management between segments (for example, one segment may need information from the previous segment), whereas large segments increase latency. The optimal segment size depends on the specific application.
To support streaming, MapReduce must be reworked into a pipelined mode rather than having reducers simply emit final output; for efficiency, intermediate results are best kept in memory. These changes greatly increase the complexity of the original MapReduce framework and hamper the maintenance and extension of the system.
Users are forced to express streaming jobs through the MapReduce interface, which limits the extensibility of user programs.

In summary, the streaming paradigm calls for an architecture very different from batch processing. Attempting to build a common platform suitable for both streaming and batch computing can result in a highly complex system that, in the end, serves neither kind of computation well.

Streaming computing is currently a hot topic in industry. Recently, Twitter, LinkedIn, and other companies have open-sourced streaming systems such as Storm and Kafka, which, together with S4 open-sourced earlier by Yahoo!, has kept streaming-computing research in the Internet field heating up. Streaming computation is not, however, a research topic of only recent years: traditional industries such as finance have long used streaming computing systems, the better-known ones including StreamBase and Borealis.

This article briefly introduces several streaming computing systems used in industry, in the hope that designers and developers of streaming systems can draw inspiration from them.


Figure 1 Schematic diagram of the overall composition of the data analysis system


Figure 1 shows where the real-time computing subsystem sits within the architecture of a complete analysis system. Real-time computing systems and batch computing systems both belong to the broad category of computation: batch processing may be MapReduce, MPI, SCOPE, and so on, while real-time computing may be S4, Storm, and so on; batch and real-time systems may or may not rely on a unified resource scheduling layer. The inputs and outputs of the computing system, including intermediate results, interact with the storage layer, which may be a block storage system such as HDFS or a K-V storage system such as Hypertable. Above the compute layer sits the data warehouse, or direct interaction with users; that interaction may be SQL-like, MR-like, and so on.

System

S4

S4 is a general-purpose, distributed, scalable, partition-tolerant, pluggable streaming system. On top of the S4 framework, developers can easily build applications for continuous stream data processing.

The design features of S4 have the following aspects.


Actor Model

To run distributed on a cluster of commodity machines, with no shared memory across the cluster, the S4 architecture adopts the actor model, which provides encapsulation and location-transparency semantics and therefore offers a simple programming interface while allowing large-scale concurrency. S4 performs computation through processing elements (PEs); messages are transmitted between processing units in the form of data events. A PE consumes events and emits one or more events that may be handled by other PEs, or publishes results directly. The state of each PE is invisible to other PEs; the only way PEs interact is by emitting and consuming events. The framework provides the ability to route events to the appropriate PE and to create new PE instances. This design conforms to the actor model's encapsulation and location-transparency properties.
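
A highly simplified sketch of the PE abstraction follows. The class and method names are hypothetical; real S4 PEs are Java classes with a richer lifecycle, but the essential contract is the same: private state, keyed routing, and interaction only through events:

```python
class ProcessingElement:
    """A PE holds private state and interacts only by consuming/emitting events."""

    def __init__(self, key, emit):
        self.key = key      # the key value this PE instance is bound to
        self.count = 0      # private state, invisible to other PEs
        self.emit = emit    # callback used to publish downstream events

    def process(self, event):
        self.count += 1
        self.emit({"key": self.key, "count": self.count})


class PEContainer:
    """Routes each keyed event to its PE instance, creating one on demand."""

    def __init__(self, emit):
        self.instances = {}
        self.emit = emit

    def dispatch(self, event):
        key = event["key"]
        if key not in self.instances:   # the framework creates new PE instances
            self.instances[key] = ProcessingElement(key, self.emit)
        self.instances[key].process(event)
```

For example, dispatching events keyed "a", "b", "a" creates two PE instances, and the second "a" event is handled by the same instance that processed the first, with its accumulated state.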


Decentralized and Symmetric Architecture

In addition to following the actor model, S4 also borrows from the MapReduce model. To simplify deployment and operations and obtain better stability and scalability, S4 adopts a peer-to-peer architecture: all processing nodes in the cluster are equivalent, with no central control. This makes the cluster highly scalable (the total number of processing nodes is theoretically unlimited) and leaves S4 with no single point of failure.

Pluggable Architecture

The S4 system is developed in Java in a strongly layered, modular style: each common function point is abstracted as far as possible into a general-purpose module, and each module is made customizable as far as possible.


Partial Fault Tolerance

The cluster management layer, based on the ZooKeeper service, automatically routes events from a failed node to other nodes. The state of events being processed on a node is lost unless it has been explicitly saved to persistent storage.


Object Oriented

Inter-node communication uses plain old Java objects (POJOs); application developers do not need to write schemas or use hash tables to send tuples between nodes.

S4's functional components fall into three categories: Clients, Adapters, and the PNode Cluster. Figure 2 shows the S4 system framework.


Figure 2 Yahoo! S4 Streaming system frame structure diagram


S4 provides a client adapter that allows third-party clients to send events to and receive events from an S4 cluster. The adapter implements a JSON-based API, on top of which client-side drivers can be implemented in multiple languages.

The client interacts with the adapter through a driver component. The adapter is itself a cluster with multiple adapter nodes, so a client can communicate through multiple drivers with multiple adapters. This ensures that a single client adapter does not become a bottleneck when large volumes of data are being distributed, and that the system supports many client applications running concurrently in a fast, efficient, and reliable way.

Within the adapter, the component that actually interacts with the client is its stub, which manages communication between client and adapter over TCP/IP. The GenericJSONClientStub class converts events between client and adapter to and from JSON, thereby supporting more than one type of client application. Different clients can be configured with different stubs to communicate with the adapter, and users can define their own stubs to implement the business logic they need, which also makes client behavior more diverse and customizable.
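
The JSON-based stub idea can be sketched as follows. The event shape and method names are assumptions for illustration, not the actual GenericJSONClientStub API; the point is that a language-neutral wire format lets clients in any language participate:

```python
import json

class JsonClientStub:
    """Serializes events to JSON for transport between client and adapter."""

    def encode(self, stream, payload):
        # A language-neutral wire format: any client that can produce JSON
        # bytes can inject events into the cluster.
        return json.dumps({"stream": stream, "payload": payload}).encode("utf-8")

    def decode(self, raw):
        event = json.loads(raw.decode("utf-8"))
        return event["stream"], event["payload"]
```

A custom stub would replace the encode/decode pair with whatever framing and business logic the client application needs, which is exactly the customization point the pluggable design exposes.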

StreamBase

StreamBase is a commercial streaming computing system developed by StreamBase Systems and used in the financial industry and government. Although it is a commercial product, a Developer Edition is available; compared with the paid Enterprise Edition it has fewer features, but that does not prevent us from analyzing StreamBase through its externals and API interfaces.

StreamBase is developed in Java, and its IDE, built on top of Eclipse, is very powerful. StreamBase also provides a large number of operators, functors, and other components to help build applications. The user simply drags controls in the IDE, connects them, sets up the schemas passed between them, and configures each control's computation, and can then compile an efficient streaming application. StreamBase also provides an SQL-like language for describing the computation.

The component interactions of StreamBase are shown in Figure 3.


Figure 3 Streambase Component interaction diagram


A StreamBase Server is a management process started on a node; it manages the container instances on that node. Each container receives input through adapters, hands it to the application logic for computation, and then produces output through adapters. Containers are connected to one another to form a dataflow graph.

Adapters are responsible for interacting with heterogeneous inputs or outputs, which may include CSV files, JDBC, JMS, Simulation (a stream-generation simulator provided by StreamBase), or user-defined sources and sinks.
A System Container is present on every StreamBase Server; it primarily generates streams of system monitoring data.

The HA Container is used for fault-tolerant recovery. It contains two parts, Heartbeat and HA Events, where heartbeats are themselves tuples transferred between containers. In an HA scenario, the HA Container monitors the liveness of the primary server and converts that information into HA events to be handled by the StreamBase Monitor.

The Monitor obtains data from the System Container and the HA Container and processes it. StreamBase takes the view that the HA problem should itself be handled by CEP: if a component has a problem, it will necessarily show up in the output streams of the System Container and HA Container, and the Monitor then applies complex event processing to those tuples to detect problems such as machine failure and handle them accordingly.

StreamBase proposes the following four template strategies for fault tolerance.


Hot-hot Server Pair Template

Both the primary server and the secondary server compute at the same time, and both send results downstream. The advantage is that if the primary fails, the secondary keeps working with almost no switchover time, and the downstream need only select the first copy of each tuple it receives, ensuring the fastest possible processing. The disadvantage is that computing and network resources are wasted.
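
Downstream selection of "the first tuple to arrive" in the hot-hot template can be sketched as a simple deduplicator. Sequence-numbered tuples are an assumption for illustration; StreamBase does not expose its mechanism at this level of detail:

```python
class FirstWinsDeduplicator:
    """Keeps the first copy of each tuple arriving from a primary/secondary pair."""

    def __init__(self):
        self.seen = set()   # sequence numbers already delivered downstream

    def accept(self, seq, tuple_):
        if seq in self.seen:
            return None     # duplicate from the slower server: discard
        self.seen.add(seq)
        return tuple_       # first arrival wins, minimizing end-to-end latency
```

Whichever of the two servers is faster at any moment determines latency, which is why hot-hot has effectively zero failover time.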


Hot-warm Server Pair Template

Both the primary and the secondary server compute at the same time, but only the primary sends results downstream. The advantage is that if the primary fails, the secondary can switch over quickly without having to recover any state. Switchover takes slightly longer than in hot-hot mode, but no network resources are consumed the way they are in hot-hot; computing resources are still wasted.


Shared Disk Template

The primary server stores critical intermediate state from its computation on disk, a SAN (storage area network), or other reliable storage media. If the primary fails, the secondary reads that critical state from the media and continues the computation. The advantage is that no computing or network resources are wasted, but recovery time depends on the size of the state and may be somewhat longer than with the first two templates.


Fast Restart Template

This strategy is restricted to stateless applications. For the stateless case the scheme can be very simple: as soon as a primary server failure is discovered, a secondary server is started immediately, and upstream traffic is redirected to it so computation continues.

Borealis

Borealis is a distributed streaming system developed jointly by Brandeis University, Brown University, and MIT; it evolved from the earlier streaming systems Aurora and Medusa. Development of Borealis has since been discontinued; the last release was in 2008.

Borealis has a wealth of papers and complete user and developer documentation. The system is implemented in C++ and runs on x86-based Linux platforms. It is open source and uses a number of third-party open-source components, including ANTLR for query-language translation and the C++ network programming framework library NMSTL.

The stream model of Borealis is essentially the same as that of other streaming systems: it accepts multiple input data streams and emits outputs. To support deterministic computation and high fault tolerance, tuples on the input streams are sequenced.

The system architecture of Borealis is shown in Figure 4.


The Query Processor (QP) is where computation is executed; it is the core of the system, and most of its functionality is inherited from Aurora.
I/O Queues feed input streams into the QP and route tuples to other nodes or to client programs.
The Admin module controls the local QP, for example creating queries or migrating dataflow-graph fragments, and works with the Local Optimizer to optimize the existing dataflow graph.
The Local Optimizer's responsibilities include local scheduling policy, adjusting operator behavior, discarding low-value tuples under overload, and so on.
The Storage Manager module stores state data for local computations.
The Local Catalog stores the local dataflow graph and metadata, accessible to all local components.
A Borealis node also has modules that communicate with other nodes to perform collaborative tasks.
The Neighborhood Optimizer uses information from the local node and its neighbors to balance load between nodes or to shed load.
High Availability (HA) modules on different nodes monitor one another and take over for one another when a failure is detected.
The Local Monitor collects local performance-related statistics and reports them to the Local and Neighborhood Optimizers.
The Global Catalog provides a logical, complete view of the entire dataflow computation.

Besides acting as a basic functional node, a Borealis Server can also be designated a coordinator node that performs global system monitoring and other optimization tasks, such as global load distribution and global load shedding. Borealis thus provides a complete three-level scheme of monitoring and optimization (local, neighborhood, global).

For load balancing, Borealis provides both a dynamic and a static deployment mechanism.


Correlation-based Operator Distribution

By analyzing how load varies together across different operators and nodes, the placement of operators is determined and dynamically adjusted to achieve load balance.


Resilient Operator Distribution algorithm

The goal of this algorithm is to produce a static operator placement that can handle the widest possible range of input-rate variation without needing readjustment.

Since dynamic adjustment takes time and has overhead, the former mechanism suits systems whose load changes over long timescales, while the latter can cope with faster, shorter load peaks. In the implementation, the former uses the correlation coefficient as its measure of node correlation and turns an NP-hard problem into one solvable in polynomial time with a greedy algorithm; the latter is computed before deployment to guarantee that the system can tolerate peak load, is modeled with linear algebra, and consists of two stages: operator ordering and operator assignment.
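
The greedy flavor of operator placement can be illustrated with a toy assignment loop. This is a generic greedy load-balancing sketch under assumed per-operator load estimates, not Borealis's actual algorithm, which also weighs load correlation between operators:

```python
def greedy_assign(operator_loads, num_nodes):
    """Assign each operator to the currently least-loaded node.

    operator_loads: dict mapping operator name -> average load estimate.
    Returns a dict mapping node index -> list of operator names.
    """
    node_load = [0.0] * num_nodes
    placement = {i: [] for i in range(num_nodes)}
    # Place heavy operators first so they spread across nodes.
    for op, load in sorted(operator_loads.items(), key=lambda kv: -kv[1]):
        target = min(range(num_nodes), key=lambda i: node_load[i])
        placement[target].append(op)
        node_load[target] += load
    return placement
```

A correlation-aware variant would prefer co-locating operators whose loads are negatively correlated, so that one operator's peak coincides with another's trough and the node's total load stays flat.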

Borealis satisfies users' needs through four fault-tolerance mechanisms.


Amnesia Backup

The standby machine detects the failure of the host and immediately restarts the computation from an empty state.


Passive Standby

The host does the processing while the standby takes periodic checkpoints of it. After a host failure the system switches to the standby, which replays the checkpoint and the data stream. Nondeterministic computation is well supported; the drawback is a longer recovery time.
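
Passive standby's checkpoint-and-replay can be sketched as follows. This is a toy model under assumed names; a real implementation also snapshots operator queues and coordinates checkpoints across the dataflow graph:

```python
class PassiveStandby:
    """Standby that restores the last checkpoint and replays buffered input."""

    def __init__(self):
        self.checkpoint_state = {}
        self.replay_buffer = []      # tuples received since the last checkpoint

    def take_checkpoint(self, primary_state):
        self.checkpoint_state = dict(primary_state)
        self.replay_buffer.clear()   # tuples up to here are reflected in state

    def record(self, tuple_):
        self.replay_buffer.append(tuple_)

    def recover(self, apply_fn):
        """Rebuild state: restore the checkpoint, then replay the stream."""
        state = dict(self.checkpoint_state)
        for t in self.replay_buffer:
            apply_fn(state, t)
        return state
```

Recovery time grows with the replay buffer, which is why checkpoint frequency trades runtime overhead against failover latency.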


Active Standby

Host and standby compute simultaneously; on host failure the system switches directly to the standby. Nondeterministic computation is not supported and computing resources are wasted, but the recovery time is close to zero.


Upstream Backup

Fault tolerance is achieved through upstream backup: in case of failure, data is replayed from upstream. Recovery time is the longest of the four, but resources are saved.

In addition, Borealis provides a more advanced fault-tolerance mechanism, Rollback Recovery, a replica-based failure-recovery mechanism for node failures, network failures, and network partitions that keeps the system available as much as possible while minimizing inconsistency. The mechanism lets the user define a threshold to strike a balance between consistency and availability. Once the system's data is restored, it supports recomputing and emitting correct output to guarantee eventual consistency. The mechanism uses a data-serializing operator (SUnion) to ensure that all replicas process data in the same order; after failure recovery, replay is realized through checkpoint/redo and undo/redo.
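
The data-serializing idea behind SUnion can be sketched as a deterministic merge of replica input streams by timestamp. This is a simplification under assumed (timestamp, tuple) inputs; the real operator also buffers within time boundaries to handle late arrivals and needs a tie-breaking rule for equal timestamps:

```python
import heapq

def serialize_streams(*streams):
    """Merge several pre-sorted (timestamp, tuple) streams into one order.

    Every replica that applies this merge sees tuples in the same order,
    so replica states stay consistent and replay after a failure is
    deterministic.
    """
    return list(heapq.merge(*streams, key=lambda x: x[0]))
```

Determinism is what makes checkpoint/redo and undo/redo sound: replaying the same serialized input against the same checkpoint must reproduce the same state on every replica.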

Contrast

Table 1 compares the three streaming systems above, using the models covered in a survey presented at the DEBS 2011 conference. The processing model describes the tuple-selection strategy, the consumption strategy, and load degradation during computation. The interaction model describes the interaction between input components and the computing system, within the computation itself, and between the computation and output components. The time model describes whether the event stream is constrained by temporal ordering. The rule model describes whether streaming computation rules are explicit or implicit. The data model describes the composition and format of a stream. The function model describes the functional capabilities of a streaming system. The language model describes the operators available at the language level.


Table 1 Model comparison of 3 kinds of flow systems


Summary
This article has introduced three streaming computing systems used in industry, in the hope that readers can appreciate, from their designs, the key problems such systems must solve.
The latest version of Yahoo! S4 is the alpha release v0.3.0; important features such as dynamic load balancing and online service migration have not yet been implemented, but its three signature characteristics are worth learning from: the actor model, the decentralized symmetric architecture, and the pluggable architecture.

StreamBase has a powerful IDE that supports building applications by wiring up controls, and it also provides a high-level language for building applications. As a commercial product, its user-interface design is worth studying, and its composable HA schemes are one of its highlights.
Borealis is an important product of academic research. As a new-generation streaming system it addresses many aspects, such as the data model, load management, high availability, and scalability, with comprehensive and detailed research, which on one hand makes the system strong and advanced, and on the other makes it bloated and complex. Many of its strategies are worth learning from and can be applied in different streaming-computing scenarios.

The author, Yangdong, is a senior R&D engineer in distributed systems at Baidu, working on Hypertable, Hadoop, and streaming computing. Source: http://www.programmer.com.cn/8606/
