My recent work involves designing a system that monitors cluster status in real time, such as the execution of Hadoop jobs and the health of each server. The system needs to process the information generated by the monitored components in real time and push it to users.
Such a system clearly needs the following properties:
- Reliability
- Big Data Processing
- Real-time
Clearly, this will be a Hadoop-based project. Below are the related systems I have been looking at:
Kafka: Kafka is a messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn's activity stream processing pipeline.
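In our case, Kafka could carry the monitoring events from the collectors to the processing side. A minimal sketch using Kafka's Java producer client; the broker address, the `server-health` topic, the key, and the message format are all assumptions for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HealthEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by host so all events for one server land in the same partition.
            producer.send(new ProducerRecord<>("server-health", "host-01",
                    "cpu=0.42 mem=0.73"));
        }
    }
}
```

The consumers (the alerting or UI side) would then pull from the same topic at their own pace, which fits the reliability and real-time requirements above.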
S4: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
Hedwig: Hedwig is a publish-subscribe system designed to carry large amounts of data across the Internet in a guaranteed-delivery fashion from those who produce it (publishers) to those who are interested in it (subscribers).
Storm: Storm is a distributed, reliable, and fault-tolerant stream processing system. Its use cases are so broad that its authors consider it to be a fundamental new primitive for data processing.
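For our alerting path, a Storm bolt could sit on the metric stream and flag unhealthy servers. A minimal sketch, assuming upstream tuples carry a `host` and a `cpu` field; the field names and the 0.9 threshold are made up for illustration (package names as in current Apache Storm):

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Emits a (host, alert) tuple whenever a server's CPU load looks unhealthy.
public class HighCpuAlertBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String host = input.getStringByField("host");
        double cpu = input.getDoubleByField("cpu");
        if (cpu > 0.9) { // example threshold, not a recommendation
            collector.emit(new Values(host, "HIGH_CPU"));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("host", "alert"));
    }
}
```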
Flume: Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS.
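For the log-collection path, a single Flume agent wired from a tailing source through a memory channel into an HDFS sink would be enough to start with. A minimal sketch of a Flume NG agent configuration; the log path, NameNode address, and component names are placeholders:

```properties
# Components of agent "a1": one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail an application log file (path is a placeholder)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Write events into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.channel = c1
```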
Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.
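Since Scribe speaks Thrift, any Thrift-capable client can forward events to it. A rough sketch in Java, assuming the classes `scribe.Client` and `LogEntry` have been generated from Scribe's `scribe.thrift`; the host, port, category, and message are placeholders:

```java
import java.util.Collections;

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class ScribeForwarder {
    public static void main(String[] args) throws Exception {
        // Scribe listens on a framed Thrift transport; 1463 is its default port.
        TFramedTransport transport =
                new TFramedTransport(new TSocket("localhost", 1463));
        // Scribe expects the non-strict binary protocol.
        TBinaryProtocol protocol = new TBinaryProtocol(transport, false, false);
        scribe.Client client = new scribe.Client(protocol); // Thrift-generated stub
        transport.open();
        // Each entry carries a category (used for routing) and a message payload.
        client.Log(Collections.singletonList(
                new LogEntry("hadoop_jobs", "job_001 finished")));
        transport.close();
    }
}
```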
I will keep updating this post as the project moves forward.