Storm environment configuration and throughput test tuning summary

Source: Internet
Author: User
Tags: generator, zookeeper


Question guide:
  1. What is the cluster environment in this article?
  2. What is the relationship between workers and slots in the configuration?
  3. How is the throughput tested?

1. Hardware configuration information
6 servers, each with 2 CPUs (6 cores, 24 threads) and 96 GB of memory
2. Cluster information
Storm cluster: 1 nimbus, 6 supervisors
  nimbus: 192.168.7.127
  supervisors: 192.168.7.128, 192.168.7.129, 192.168.7.130, 192.168.7.131, 192.168.7.132, 192.168.7.133
Zookeeper cluster: 3 nodes
  192.168.7.127:2181, 192.168.7.128:2181, 192.168.7.129:2181
Kafka cluster: 7 nodes
  192.168.7.127:9092, 192.168.7.128:9092, 192.168.7.129:9092, 192.168.7.130:9092, 192.168.7.131:9092, 192.168.7.132:9092, 192.168.7.133:9092
3. Configuration Relationship Resolution
The following can be worked out from the hardware configuration of the servers:
  1. Workers and slots. The relationship between workers and slots is one to one: each worker occupies one slot. The number of workers and slots in a cluster is generally sized from the number of CPU threads per server. In the environment above that gives 144 workers/slots (6 supervisors with 24 CPU threads each, 24*6=144).
  2. Spout concurrency. This is the last parameter of setSpout: builder.setSpout("words", new KafkaSpout(kafkaConfig), 10); In my tests, Kafka and Storm are used together for data transmission. Kafka has a partition mechanism, and the number of spout threads is set according to the number of partitions of the Kafka topic, generally in a 1:1 relationship. For example, if the current topic has 18 partitions, the spout concurrency can be set to 18. It can be a little higher than this, but not by much, since spout executors beyond the partition count typically have no partition of their own to read. How many Kafka partitions you need is itself something to determine by testing.
  3. Bolt concurrency. This is the last parameter of setBolt: builder.setBolt("words", new KafkaBolt(), 10); Bolt concurrency determines processing efficiency. With a bolt concurrency of 1, processing may be very slow under a large data volume; a very high bolt concurrency is not good either, because it may waste resources. The specific value needs to be determined by testing.
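The sizing rules above can be written down as a few lines of arithmetic. The following is a minimal sketch in plain Java with no Storm dependency; the class ParallelismPlan and its method names are hypothetical helpers introduced only for illustration, and the cap in workersFor is an assumption that follows from "one worker occupies one slot".

```java
// Hypothetical helper, not part of Storm: it only reproduces the sizing rules described above.
final class ParallelismPlan {

    /** Rule of thumb from the text: one slot (and therefore one worker) per CPU thread. */
    static int totalSlots(int supervisors, int cpuThreadsPerSupervisor) {
        return supervisors * cpuThreadsPerSupervisor;
    }

    /** Spout parallelism follows the partition count of the Kafka topic, 1:1. */
    static int spoutParallelism(int kafkaPartitions) {
        return kafkaPartitions;
    }

    /** Assumption: a topology is never given more workers than there are free slots. */
    static int workersFor(int requestedWorkers, int freeSlots) {
        return Math.min(requestedWorkers, freeSlots);
    }

    public static void main(String[] args) {
        int slots = totalSlots(6, 24);        // 6 supervisors, 24 CPU threads each
        int spouts = spoutParallelism(18);    // topic with 18 partitions
        int workers = workersFor(20, slots);  // request 20 workers from the cluster
        System.out.println("slots=" + slots + " spouts=" + spouts + " workers=" + workers);
        // In a real topology these values would be handed to Storm, for example:
        //   builder.setSpout("words", new KafkaSpout(kafkaConfig), spouts);
        //   conf.setNumWorkers(workers);
    }
}
```

Bolt parallelism is deliberately left out of the helper, because the text gives no formula for it and says it has to be found by testing.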

4. Throughput test (only some of the scenarios are listed below; see the attachment for all test data)
Test scenario 1: partition: 20, worker: 10, spout: 20, bolt: 1. Test results: [result figure not preserved]
Test scenario 2: partition: 20, worker: 20, spout: 20, bolt: 1. Test results: [result figure not preserved]

Test scenario 3: topic: 5, bolt: 1, with the partition, spout, and worker settings listed in the summary results below. The data generators ran on hosts 128-132, each program set to 100, and took up some resources themselves, so the actual results may be better than the test results. Test results: [result figure not preserved]

Summary results: 5 topics, 20 partitions, 20*5 workers, 20*5 spouts, 1*5 bolts. Total throughput = 5.04 + 4.02 + 5.76 + 6.31 + 4.99 = 26.12
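The sum above, and its conversion to per-second and per-day totals, can be checked with a short calculation. This is only a sketch: the unit of 10,000 records per second is an assumption inferred from the article's own figures (26.12 corresponds to 261,200 per second), and the class name ThroughputSummary is hypothetical.

```java
// Hypothetical helper that re-does the summary arithmetic with integer math.
final class ThroughputSummary {
    // Assumption: each per-topic figure is in units of 10,000 records per second.
    static final long UNIT = 10_000L;
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Sums per-topic figures given in hundredths (504 means 5.04) to avoid floating-point error. */
    static long perSecond(int[] hundredths) {
        long sum = 0;
        for (int h : hundredths) {
            sum += h;
        }
        return sum * UNIT / 100;
    }

    /** Scales a per-second rate to a full day, assuming the rate is sustained. */
    static long perDay(long perSecond) {
        return perSecond * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        int[] topics = {504, 402, 576, 631, 499}; // 5.04, 4.02, 5.76, 6.31, 4.99
        long perSecond = perSecond(topics);
        System.out.println("per second: " + perSecond);
        System.out.println("per day:    " + perDay(perSecond));
    }
}
```

The daily figure assumes the measured rate holds for all 86,400 seconds of a day, which is how the article's estimate appears to be derived.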
Throughput per second is 261,200, which gives a daily throughput of nearly 22.6 billion.

Summary: Several factors affect Storm throughput: spout concurrency, the number of workers (which is tied to slots), and the number of Kafka partitions. In practice, spout concurrency and the Kafka partition count are linked. Increasing the number of workers can increase throughput, but the number of workers is tied to the number of machines in the cluster and is therefore limited. Choose a reasonable value through testing: the more workers one topology takes, the fewer workers are left for other topologies, and the fewer topologies the cluster can run. The best value is the one that just meets the business requirement. See the attachment for the detailed test results.

When reprinting, please cite the source: http://blog.csdn.net/weijonathan/article/details/38536671

