I recently tested Storm's data processing performance.
Topology structure: the spout emits 0.8 million records in CSV format; bolt1 parses each CSV record and splits it into its fields; bolt2 aggregates bolt1's output by one of those fields, accumulates the counts, and writes them to the database (flush trigger interval: 60 s).
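The parse-then-count flow above can be sketched in plain Java, independent of the Storm API. This is a minimal illustration under assumptions not stated in the test: records are simple comma-separated lines without quoted fields, the grouping key is the first column, and each flush hands only the counts accumulated since the previous flush to a sink that stands in for the database write. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java sketch of the bolt1/bolt2 logic; no Storm classes are used.
// Assumptions (hypothetical): unquoted CSV, and each flush passes the counts
// accumulated since the previous flush to a sink standing in for the DB write.
class CsvCountSketch {

    // bolt1: split one CSV record into its fields (naive split, no quoting).
    static String[] parse(String line) {
        return line.split(",", -1); // -1 keeps trailing empty fields
    }

    // bolt2: count records per key and flush every flushIntervalMs.
    static class Counter {
        private final Map<String, Long> counts = new HashMap<>();
        private final long flushIntervalMs;
        private final Consumer<Map<String, Long>> sink;
        private long lastFlushMs;

        Counter(long flushIntervalMs, long startMs, Consumer<Map<String, Long>> sink) {
            this.flushIntervalMs = flushIntervalMs;
            this.lastFlushMs = startMs;
            this.sink = sink;
        }

        // Called once per parsed record; the caller supplies the clock.
        void add(String key, long nowMs) {
            counts.merge(key, 1L, Long::sum);
            if (nowMs - lastFlushMs >= flushIntervalMs) {
                flush(nowMs);
            }
        }

        // Hand a snapshot of the counts to the sink, then start a new window.
        void flush(long nowMs) {
            if (!counts.isEmpty()) {
                sink.accept(new HashMap<>(counts));
                counts.clear();
            }
            lastFlushMs = nowMs;
        }
    }
}
```

Passing the clock in explicitly keeps the sketch deterministic. Inside Storm, a time-based flush like this is commonly driven by tick tuples (the topology.tick.tuple.freq.secs setting); the original does not say how its 60 s trigger was implemented.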
Parallelism configuration: spout tasks (1), executors (3); bolt1 executors/tasks (16); bolt2 executors/tasks (8);
Workers (8); Storm slots (8)
Hardware configuration: 8 CPUs, 16 GB memory
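The parallelism settings above can be modelled with a small calculation of how a component's tasks are divided among its executors. This is an illustrative model, not Storm's scheduler code. It assumes tasks are spread as evenly as possible and caps the executor count at the task count, because the Storm documentation states that a component has at most as many executors (threads) as tasks. Under that rule, the spout's "3 executors, 1 task" would leave only one spout executor with work. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only, not Storm's scheduler code. Tasks are split as
// evenly as possible, and the executor count is capped at the task count
// (the Storm docs state #executors <= #tasks for a component).
class ParallelismSketch {

    // Returns how many tasks each executor of one component would run.
    static List<Integer> tasksPerExecutor(int executors, int tasks) {
        if (executors <= 0 || tasks <= 0) {
            throw new IllegalArgumentException("executors and tasks must be positive");
        }
        int effective = Math.min(executors, tasks);
        int base = tasks / effective;
        int extra = tasks % effective; // the first `extra` executors get one more task
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < effective; i++) {
            sizes.add(base + (i < extra ? 1 : 0));
        }
        return sizes;
    }
}
```

With this test's values, tasksPerExecutor(3, 1) yields a single executor, and bolt1 and bolt2 each get one task per executor. The model therefore counts 1 + 16 + 8 = 25 component executors across 8 workers, about 3 per worker, before adding Storm's acker executors.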
The processing throughput is about 15,000 records per second.
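A quick calculation puts that rate in context. It takes the reported rate as about 15,000 records per second and assumes the rate stays constant for the whole run. The helper names are hypothetical.

```java
// Back-of-the-envelope arithmetic; assumes the rate is in records per
// second and stays constant for the whole run.
class ThroughputSketch {

    // Time for all records to pass through at a steady rate.
    static double secondsToDrain(long records, double recordsPerSecond) {
        return records / recordsPerSecond;
    }

    // Average share of the total rate handled by each executor.
    static double perExecutorRate(double recordsPerSecond, int executors) {
        return recordsPerSecond / executors;
    }
}
```

For 800,000 records at 15,000 records/s this gives about 53 s, which is shorter than one 60 s flush interval. Spread over the 16 bolt1 executors, each one handles about 940 records/s on average.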
I ran into a few problems during the test that cost me some time.
The Storm 0.9.2 UI has a bug in the topology summary: the worker and executor counts are swapped. Run storm list on the command line to check the actual values.
Storm sometimes times out when distributing tasks, and I have not found the cause. The log shows repeated Netty client reconnect attempts:
2014-09-22 13:18:34 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-ip-61/ip:6703... [11]
2014-09-22 13:18:35 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-ip-62/ip:6703... [12]
2014-09-22 13:18:35 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-ip-61/ip:6703... [12]
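When chasing a timeout like this, it can help to see which targets the reconnects concentrate on. The helper below is hypothetical and written only against the sample lines above. It assumes one log entry per line and treats the trailing bracketed number as the reconnect attempt counter. For each target, it reports the highest attempt number seen.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-scanning helper, written against the sample lines only.
// Assumes one entry per line and that the trailing [n] is the attempt count.
class ReconnectLogScan {

    private static final Pattern RECONNECT = Pattern.compile(
            "^(\\S+ \\S+) b\\.s\\.m\\.n\\.Client \\[INFO\\] "
            + "Reconnect started for (\\S+)\\.\\.\\. \\[(\\d+)\\]$");

    // For each target, the highest attempt number seen (first-seen order).
    static Map<String, Integer> maxAttemptByTarget(List<String> lines) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = RECONNECT.matcher(line.trim());
            if (m.matches()) { // non-matching lines are ignored
                result.merge(m.group(2), Integer.parseInt(m.group(3)), Integer::max);
            }
        }
        return result;
    }
}
```

High, steadily growing counters for one target suggest that the worker on that host:port is not accepting connections. This is a starting point for investigation, not a diagnosis.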
Conclusion:
In Storm, a worker is a process (a JVM), an executor is a thread running inside a worker, and a task is an instance of a spout or bolt that does the actual processing. When performance is insufficient, you can improve it by tuning the parallelism (executors and tasks), the number of slots, and the number of workers.
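The worker/executor/task hierarchy can be mimicked with plain Java threads. This is an analogy only, not Storm code. The current JVM stands in for one worker, each pool thread for one executor, and each Task object for one spout or bolt instance. All names are hypothetical. The model shows why adding tasks to an executor does not add parallelism: an executor thread runs its tasks one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Analogy only, not Storm code: this JVM plays one worker process, each pool
// thread plays one executor, and each Task plays one spout/bolt instance.
class WorkerModel {

    static class Task {
        final int id;
        final AtomicLong processed = new AtomicLong();
        volatile String threadName; // thread that last ran this task

        Task(int id) {
            this.id = id;
        }

        void execute(int tuple) {
            processed.incrementAndGet();
            threadName = Thread.currentThread().getName();
        }
    }

    static List<Task> run(int executors, int tasksPerExecutor, int tuplesPerTask) {
        ExecutorService pool = Executors.newFixedThreadPool(executors);
        List<Task> all = new ArrayList<>();
        for (int e = 0; e < executors; e++) {
            List<Task> mine = new ArrayList<>();
            for (int t = 0; t < tasksPerExecutor; t++) {
                Task task = new Task(e * tasksPerExecutor + t);
                mine.add(task);
                all.add(task);
            }
            // One "executor" thread loops over all of its tasks serially.
            pool.execute(() -> {
                for (int i = 0; i < tuplesPerTask; i++) {
                    for (Task task : mine) {
                        task.execute(i);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("executors did not finish");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return all;
    }
}
```

In the model, raising the executor count (while keeping tasks at least equal to executors) is what adds threads. In Storm, the task count mainly fixes the upper limit that executors can later be rebalanced up to.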
Storm 0.9.2 single-host performance test