1. Hardware configuration information
6 servers, each with 2 CPUs (6 cores each, 24 threads in total) and 96 GB of RAM
2. Cluster information
Storm cluster: 1 Nimbus, 6 Supervisors
Nimbus: 192.168.7.127
Supervisors:
192.168.7.128
192.168.7.129
192.168.7.130
192.168.7.131
192.168.7.132
192.168.7.133
ZooKeeper cluster:
3 nodes
192.168.7.127:2181, 192.168.7.128:2181, 192.168.7.129:2181
Kafka cluster:
7 nodes
192.168.7.127:9092
192.168.7.128:9092
192.168.7.129:9092
192.168.7.130:9092
192.168.7.131:9092
192.168.7.132:9092
192.168.7.133:9092
3. Configuration relationships
The following values can be derived from the servers' hardware configuration:
1. Workers and slots map one to one: each worker occupies one slot. The number of workers (slots) in a cluster is generally taken to be the number of CPU threads per server, summed over the supervisors.
For the environment above:
Workers/slots: 144 (6 supervisors with 24 CPU threads each, 24 * 6 = 144)
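The slot arithmetic above can be written out as a tiny Java sketch. The class and method names are made up for illustration; the only rule it encodes is the one stated above (one slot per CPU thread on each supervisor).

```java
// Illustrative helper (hypothetical names): slot capacity from hardware.
final class SlotCapacity {
    // Rule of thumb from the text: one slot per CPU thread on each supervisor.
    static int totalSlots(int supervisors, int threadsPerSupervisor) {
        return supervisors * threadsPerSupervisor;
    }

    public static void main(String[] args) {
        // 6 supervisors with 24 threads each, as in the cluster described above.
        System.out.println(totalSlots(6, 24)); // prints 144
    }
}
```

In Storm, the slots a supervisor actually offers are the ports listed under supervisor.slots.ports in storm.yaml, so a figure like 144 only takes effect if each supervisor is configured with 24 ports.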
2. Spout parallelism, i.e. the last argument to setSpout: builder.setSpout("words", new KafkaSpout(kafkaConfig), 10);
In my tests, Kafka feeds the data into Storm. Kafka has a partition mechanism, so the number of spout threads is defined by the number of partitions in the Kafka topic, typically 1:1. For example, if the topic has 18 partitions, the number of spout threads can be set to 18. It can be slightly higher than that, but not by much.
How many Kafka partitions you need is something to test against your own requirements until you find the value that fits.
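To see why the ratio tends toward 1:1, here is a minimal Java sketch that spreads partitions over spout executors. The round-robin scheme and all names are assumptions chosen for illustration; the real Kafka spout's assignment logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: distributes partitions over spout executors.
final class PartitionAssignment {
    // Assumed scheme: partition p goes to executor (p mod executors).
    static List<List<Integer>> assign(int partitions, int executors) {
        List<List<Integer>> plan = new ArrayList<>();
        for (int e = 0; e < executors; e++) {
            plan.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            plan.get(p % executors).add(p);
        }
        return plan;
    }

    // Counts executors that ended up with no partition to read.
    static long idleExecutors(int partitions, int executors) {
        return assign(partitions, executors).stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        System.out.println(idleExecutors(18, 18)); // prints 0
        System.out.println(idleExecutors(18, 20)); // prints 2
        System.out.println(assign(18, 10).get(0)); // prints [0, 10]
    }
}
```

Under this assumption, executors beyond the partition count have nothing to read, while fewer executors than partitions means some executors serve two or more partitions.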
3. Bolt parallelism, i.e. the last argument to setBolt: builder.setBolt("words", new KafkaBolt(), 10);
Bolt parallelism determines processing efficiency. With a parallelism of 1, processing may be very slow under large data volumes; setting it too high is not good either, since it may waste resources.
The specific value has to be determined by testing.
4. Throughput test (only some of the scenarios are listed below; see the attachment for the full test data)
Test Scenario 1:
Partition:20
Worker:10
Spout:20
Bolt:1
Test results:
Test Scenario 2:
Partition:20
Worker:20
Spout:20
Bolt:1
Test results:
Test Scenario 3: (the data generators run on 192.168.7.128 to 192.168.7.132, each program at a concurrency of 100, and they also consume some resources, so the actual results may be better than the test results)
Topic:5
Partition:20
Spout:20
Worker:20
Bolt:1
Test results:
Summary of results:
5 topics, 20 partitions each, 20 * 5 workers, 20 * 5 spouts, 1 * 5 bolts
Total throughput = 5.04 + 4.02 + 5.76 + 6.31 + 4.99 = 26.12 (in units of 10,000 per second)
That is a throughput of 261,200 per second,
or nearly 22.6 billion per day.
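The arithmetic behind these totals can be checked with a short Java sketch. The class and method names are hypothetical; the five per-topic figures are the ones quoted above, read as units of 10,000 per second, and BigDecimal is used only to keep the decimal sum exact.

```java
import java.math.BigDecimal;

// Illustrative check of the summary figures (hypothetical names).
final class ThroughputSummary {
    // Sums per-topic throughputs given as decimal strings (units of 10,000 per second).
    static BigDecimal total(String... perTopic) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String value : perTopic) {
            sum = sum.add(new BigDecimal(value));
        }
        return sum;
    }

    // Converts a total in units of 10,000 per second to a plain per-second count.
    static long perSecond(BigDecimal totalInTenThousands) {
        return totalInTenThousands.multiply(BigDecimal.valueOf(10_000)).longValueExact();
    }

    // 86,400 seconds in a day.
    static long perDay(long perSecond) {
        return perSecond * 86_400L;
    }

    public static void main(String[] args) {
        BigDecimal sum = total("5.04", "4.02", "5.76", "6.31", "4.99");
        long second = perSecond(sum);
        System.out.println(sum);            // prints 26.12
        System.out.println(second);         // prints 261200
        System.out.println(perDay(second)); // prints 22567680000
    }
}
```

The daily figure assumes the measured rate is sustained for a full 24 hours, which the test itself does not demonstrate.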
Conclusion:
Several factors affect Storm throughput: spout parallelism, the number of workers (tied to slots), and the number of Kafka partitions.
In practice, spout parallelism and the number of Kafka partitions are linked.
Note that increasing the number of workers can increase throughput, but the number of workers is bounded by the number of machines in the cluster.
So you need to test to find a value you consider reasonable: if one topology is given too many workers, fewer workers are left for the other topologies, and the fewer topologies you can run. The best value is therefore one that just meets the business requirement.
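The trade-off can be made concrete with a small Java sketch of a slot budget. The names are hypothetical; the numbers are the 144 slots derived earlier and the five 20-worker topologies from the summary.

```java
// Illustrative slot budget (hypothetical names): what is left after allocation.
final class WorkerBudget {
    // Returns the slots still free once every topology has its workers.
    static int remainingSlots(int totalSlots, int... workersPerTopology) {
        int used = 0;
        for (int workers : workersPerTopology) {
            used += workers;
        }
        if (used > totalSlots) {
            throw new IllegalArgumentException(
                "requested " + used + " workers but only " + totalSlots + " slots exist");
        }
        return totalSlots - used;
    }

    public static void main(String[] args) {
        // Five topologies with 20 workers each on a 144-slot cluster.
        System.out.println(remainingSlots(144, 20, 20, 20, 20, 20)); // prints 44
    }
}
```

With these numbers, only 44 slots remain for any further topology, which is the constraint the paragraph above describes.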
See the attachment for the detailed test results.
When reposting, please cite the source:
http://blog.csdn.net/weijonathan/article/details/38536671
http://www.51studyit.com/html/notes/20140813/1054.html