I. What is a real-time computing system? (Stream computing)
1. Offline computing and real-time computing
                    Offline computing       Real-time computing (stream computing)
Typical frameworks  MapReduce               Apache Storm, Spark Streaming, Jstream
Data                Data on HDFS            Real-time data
Data acquisition    Sqoop (batch import)    Flume
Result storage      HDFS                    Redis (HDFS, HBase, Hive, JDBC [Oracle, MySQL])
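The difference between the two models can be sketched in a few lines of plain Python. This is only an illustration, not Storm or MapReduce code: the function names and the sample readings are made up. The offline version needs the complete data set before it can answer, while the streaming version produces an updated answer after every new value.

```python
from typing import Iterable, Iterator, List


def batch_average(values: List[float]) -> float:
    """Offline style: the whole data set is available before computing."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Real-time style: emit an updated result after each new value."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


# Hypothetical sample data, used only for this illustration.
readings = [2.0, 4.0, 6.0]

batch_result = batch_average(readings)
stream_results = list(streaming_average(readings))

print(batch_result)    # one result, after all data has arrived
print(stream_results)  # one result per incoming value
```

The streaming version keeps only a running count and total, which is why it can work on data that never ends.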
2. Example: tap water treatment
3. Storm architecture
(*) Master node: Nimbus
Worker node: Supervisor
(*) Topology (task) = Spout tasks + Bolt tasks
Spout task: collects data
Bolt task: processes data; bolts can be chained (cascaded)
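The spout/bolt split can be pictured with a small simulation in plain Python. It does not use the Storm API: `sensor_spout`, `parse_bolt`, `filter_bolt` and `run_topology` are hypothetical names chosen for this sketch. The spout emits raw items, and each bolt consumes the stream produced by the previous stage, which is what cascading bolts means.

```python
from typing import Callable, Iterable, Iterator, List


def sensor_spout(raw: Iterable[str]) -> Iterator[str]:
    """Spout: collects data from a source and emits it item by item."""
    for item in raw:
        yield item


def parse_bolt(stream: Iterator[str]) -> Iterator[int]:
    """First bolt: converts each raw string into an integer."""
    for item in stream:
        yield int(item)


def filter_bolt(stream: Iterator[int]) -> Iterator[int]:
    """Second bolt: drops negative values, passes the rest on."""
    for value in stream:
        if value >= 0:
            yield value


def run_topology(spout: Iterator, bolts: List[Callable]) -> list:
    """Wire the spout to the bolts in order and drain the final stream."""
    stream = spout
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


# Hypothetical input, used only for this illustration.
result = run_topology(sensor_spout(["3", "-1", "7"]), [parse_bolt, filter_bolt])
print(result)
```

In a real cluster the spout and bolts run as parallel tasks on the Supervisor nodes; the sketch runs them in a single process to show only the direction of the data flow.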
4. WordCount in Storm
(*) Startup process
(1) Start ZooKeeper: zkServer.sh start (ZooInspector is a tool for viewing ZooKeeper data)
(2) Start Nimbus: storm nimbus &
(3) Start the worker node: storm supervisor &
(4) Start the UI: storm ui &
(5) Start the log viewer: storm logviewer &
(*) Start WordCount:
storm jar storm-starter-topologies.jar org.apache.storm.starter.WordCountTopology MYWC
5. Analyzing the data flow of a task (Storm programming model)
Topology (task) = Spout tasks + Bolt tasks
Spout task: collects data
Bolt task: processes data; bolts can be chained (cascaded)
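For WordCount, the flow is roughly: a spout emits sentences, a split bolt breaks each sentence into words, and a count bolt keeps a running total per word. The sketch below simulates that flow in plain Python rather than with the Storm API. The class and function names, the sample sentences and the use of a CRC32 hash are all assumptions made for illustration. The routing function imitates the idea of grouping by the word field, so that the same word always reaches the same count task.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Tuple


def split_bolt(sentence: str) -> List[str]:
    """Split bolt: turns one sentence into individual words."""
    return sentence.split()


class CountBolt:
    """Count bolt: one instance stands for one parallel count task."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, word: str) -> None:
        self.counts[word] += 1


def route_by_word(word: str, num_tasks: int) -> int:
    """Pick a count task from the word itself (stable hash, assumed here)."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_word_count(
    sentences: Iterable[str], num_count_tasks: int = 2
) -> Tuple[Counter, List[CountBolt]]:
    """Drive sentences through split and count, then merge the totals."""
    count_tasks = [CountBolt() for _ in range(num_count_tasks)]
    for sentence in sentences:  # the loop plays the role of the spout
        for word in split_bolt(sentence):
            count_tasks[route_by_word(word, num_count_tasks)].execute(word)
    totals: Counter = Counter()
    for task in count_tasks:
        totals.update(task.counts)
    return totals, count_tasks


# Hypothetical sentences, used only for this illustration.
totals, tasks = run_word_count(["the cow jumped", "the moon"])
print(dict(totals))
```

Because routing depends only on the word, no two count tasks ever hold a partial total for the same word, so the per-task counts can simply be added together.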
II. Real-time message processing system based on Apache Storm (stream processing system)
III. Traditional message processing system based on middleware: WebLogic JMS
1. JMS: Java Message Service; supports Queue and Topic
2. What is a message?
(*) Point-to-point: Queue
(*) Publish-subscribe: Topic (broadcast)
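The two delivery models can be contrasted with a small in-memory simulation. This is not the JMS API: the `Queue` and `Topic` classes and the message strings below are hypothetical and only mimic the delivery semantics. With a queue, a message is consumed by exactly one receiver; with a topic, every subscriber gets its own copy.

```python
from collections import deque
from typing import List, Optional


class Queue:
    """Point-to-point: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Taking a message removes it, so no other consumer can see it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish-subscribe: each message is broadcast to every subscriber."""

    def __init__(self) -> None:
        self._inboxes: List[List[str]] = []

    def subscribe(self) -> List[str]:
        inbox: List[str] = []
        self._inboxes.append(inbox)
        return inbox

    def publish(self, message: str) -> None:
        for inbox in self._inboxes:
            inbox.append(message)


queue = Queue()
queue.send("order-1")
first_consumer = queue.receive()   # gets the message
second_consumer = queue.receive()  # nothing left for a second consumer

topic = Topic()
inbox_a = topic.subscribe()
inbox_b = topic.subscribe()
topic.publish("price-update")      # both subscribers receive a copy

print(first_consumer, second_consumer, inbox_a, inbox_b)
```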
IV. Real-time message system based on Apache Kafka
1. Supports only Topic (broadcast)
Apache Storm and Kafka: simple notes (0) - Start