Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.
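As a minimal sketch of what a Samza processor looks like, the class below implements Samza's classic StreamTask interface; the topic name and the uppercasing logic are illustrative assumptions, not part of the original text.

```java
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

// A Samza task: process() is invoked once per incoming Kafka message.
public class UppercaseTask implements StreamTask {
  // Output system/topic names are illustrative assumptions.
  private static final SystemStream OUTPUT = new SystemStream("kafka", "uppercased-words");

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    String word = (String) envelope.getMessage();
    // Emit the transformed message back to Kafka via the collector.
    collector.send(new OutgoingMessageEnvelope(OUTPUT, word.toUpperCase()));
  }
}
```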
YARN is the next-generation MapReduce framework. The core idea of the refactoring is to split the two main responsibilities of the old JobTracker, resource management and task scheduling/monitoring, into separate components. A global ResourceManager allocates computing resources across all applications, while a per-application ApplicationMaster handles that application's scheduling and coordination; an application can be a single traditional MapReduce job or a DAG (directed acyclic graph) of tasks. The ResourceManager works together with a NodeManager on each machine, which manages the processes on that machine and organizes the computation there.
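To make the ResourceManager/ApplicationMaster split concrete, here is a hedged sketch of submitting an application through YARN's YarnClient API; the application name, queue, resource sizes, and the ApplicationMaster launch command are placeholders assumed for illustration, not a real deployment.

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

import java.util.Collections;

public class SubmitToYarn {
  public static void main(String[] args) throws Exception {
    // The client talks to the global ResourceManager.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("example-app"); // illustrative name
    ctx.setQueue("default");

    // Container that will run this application's ApplicationMaster;
    // the launch command is a placeholder, not a real AM.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        null, null,
        Collections.singletonList("echo launch-application-master"),
        null, null, null);
    ctx.setAMContainerSpec(amContainer);
    ctx.setResource(Resource.newInstance(512, 1)); // 512 MB, 1 vcore (assumed)

    // The ResourceManager schedules the AM container; the AM then negotiates
    // further containers and coordinates the application's own tasks.
    yarnClient.submitApplication(ctx);
  }
}
```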
On fault tolerance: whenever a machine in the cluster fails, Samza works with YARN to transparently migrate your tasks to another machine.
Kafka's brokers, producers, and consumers are all distributed. Coordination among the three is implemented by keeping their cluster metadata in ZooKeeper, which is how they discover and interact with one another.
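As a small illustration of interacting with a Kafka cluster, the producer sketch below uses the standard Java client; the broker address and topic name are assumptions. (In older Kafka clients the consumer located brokers through ZooKeeper; newer clients connect to the brokers directly via bootstrap.servers.)

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Broker address is an assumption; point this at your own cluster.
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // Topic name "words" is illustrative.
      producer.send(new ProducerRecord<>("words", "key1", "hello samza"));
    }
  }
}
```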
Mechanism Analysis of Samza/Kafka