Apache Samza Series: Real-Time Stream Processing Framework Samza, Chinese Tutorial (1): Introduction

What is Samza?

Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management. It is dedicated to real-time data processing, much like Storm, Twitter's stream processing system.

According to a recent post on the official Apache blog, the open source distributed stream processing framework Samza has graduated to an Apache top-level project after an 18-month incubation period. LinkedIn open-sourced Samza in September 2013 and contributed it to Apache as an incubator project.

Samza is well suited to real-time stream processing (like Apache Storm), for example data tracking, log services, and real-time services. It helps developers process messages at high speed while also providing good fault tolerance. In a Samza pipeline, each Kafka cluster is connected to a cluster that runs YARN and executes Samza jobs. A simple Samza processing flow is shown in the following illustration:

Samza has the following characteristics:

  • Simple API: Samza provides a simple callback-based message processing API comparable to MapReduce.
  • State management: Samza provides a LevelDB-based key/value store for historical data, which enables stateful message processing.
  • Fault tolerance: Whenever a machine in the cluster fails, YARN transparently migrates the affected tasks to another machine.
  • Durability: Samza uses Kafka to guarantee that messages are processed in the order they were written to a partition and that no messages are lost.
  • Scalability: Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, replayable, fault-tolerant streams. YARN provides a distributed environment for Samza containers to run in.
  • Pluggable: Samza provides a pluggable API, so it can run not only with Kafka and YARN but also with other messaging systems and execution environments.
  • Resource isolation: By running on YARN, Samza supports the Hadoop security model and resource isolation.
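The callback-based API and the key/value state described above can be sketched in plain Java. This is only an illustration under stated assumptions: MessageCallback, CountingTask, and CallbackDemo are hypothetical names, not Samza's real classes, and a HashMap stands in for the LevelDB-backed store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a callback-style task interface (NOT Samza's real API).
interface MessageCallback {
    // Invoked once per incoming message, in the order the messages were written.
    void process(String key, String message, List<String> output);
}

// A stateful task: it keeps a per-key counter in a local key/value store.
// A plain HashMap stands in for the LevelDB-backed store (an assumption).
class CountingTask implements MessageCallback {
    private final Map<String, Integer> store = new HashMap<>();

    @Override
    public void process(String key, String message, List<String> output) {
        // Read-modify-write against local state, then emit a result.
        int seen = store.merge(key, 1, Integer::sum);
        output.add(key + " has sent " + seen + " message(s); latest: " + message);
    }

    int countFor(String key) {
        return store.getOrDefault(key, 0);
    }
}

class CallbackDemo {
    // Plays the role of the framework: reads one partition in order
    // and hands each message to the task's callback.
    static List<String> run(MessageCallback task, List<String[]> partition) {
        List<String> output = new ArrayList<>();
        for (String[] entry : partition) {
            task.process(entry[0], entry[1], output);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> partition = List.of(
                new String[] {"alice", "login"},
                new String[] {"bob", "login"},
                new String[] {"alice", "click"});
        run(new CountingTask(), partition).forEach(System.out::println);
    }
}
```

The point of the design is that the task author writes only the process callback. Reading the partition in order, restarting after failure, and storing state are left to the framework, which here is reduced to a simple loop.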

The following figure is an official Samza example that counts page views grouped by member ID. The input messages come from Machine 1 and Machine 2, and the output goes to Machine 3. We can think of the messages as scattered across different messaging systems (Kafka): Samza reads topics from the different Kafka instances, processes them, and sends the results to Machine 3. We will not break the example down further here; see the official documentation for details.
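The page-view example can be approximated in plain Java as two stages: first route every event to a partition chosen by member ID, then count per member. This is a rough sketch, not the official job. The class PageViewCountSketch, its method names, the hash-based routing, and the sample member IDs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PageViewCountSketch {
    // Stage 1: route each page-view event (represented only by its member ID)
    // to a partition chosen by hashing the member ID, so that all events for
    // one member end up in the same partition.
    static List<List<String>> repartitionByMember(List<List<String>> inputs, int partitions) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (List<String> input : inputs) {
            for (String memberId : input) {
                // floorMod keeps the index non-negative even for negative hash codes.
                out.get(Math.floorMod(memberId.hashCode(), partitions)).add(memberId);
            }
        }
        return out;
    }

    // Stage 2: count page views per member. Because stage 1 grouped events by
    // member ID, each partition could be counted independently; here the
    // per-partition counts are collected into one sorted map.
    static Map<String, Integer> countViews(List<List<String>> partitions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> partition : partitions) {
            for (String memberId : partition) {
                counts.merge(memberId, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical events observed on the two input machines.
        List<String> machine1 = List.of("m1", "m2", "m1");
        List<String> machine2 = List.of("m2", "m3", "m1");
        List<List<String>> byMember = repartitionByMember(List.of(machine1, machine2), 2);
        // The resulting counts are what would be sent on to the output machine.
        System.out.println(countViews(byMember));
    }
}
```

Routing by member ID before counting is what allows the counting stage to be spread across machines: no two partitions ever hold events for the same member, so their counts never need to be reconciled.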


Examples: to try Samza, start with the Hello Samza sample, and read the Background page for more information about Samza.
Website Link: http://samza.apache.org/
