Difficulties in Kafka performance optimization (2)
Previous article: http://blog.csdn.net/zhu_0416/article/details/79102010
A digression first:
In the previous article, I briefly explained my basic understanding of Kafka and how to use librdkafka in C++ to meet our own business needs. This article looks at some less conventional approaches. It has little to do with code, and none of the technology is new, but I am surprised that nobody seems to use it this way in practice...
Frankly, I find this discouraging. There are so many common, practical techniques that can improve performance several times over with simple operations, yet few people have even heard of them. It makes me wonder how the IT environment in China is supposed to produce great programmers. Companies care about profit, managers care about results, people who do not understand the technology lead and people who do are led; interviews depend on talk, work depends on meetings, promotions depend on seniority, and job hunting depends on academic credentials...
In 99% of projects you hear "get a version out first and optimize later" or "this requirement is simple, how hard can it be, have it done by tomorrow". And then... there is never time to tidy up and think. Projects are always rushed, programmers are always working overtime, and the old code is always waiting on the next bug...
Let's get back to the topic.
1. Establish the Kafka Environment
There are plenty of tutorials online for setting up a Kafka environment. Having read them, almost all are the same: copies of one another that never consider real application scenarios or performance, and never ask why things are done this way. Everyone follows the same routine, and I could not even find a write-up on building a five-node cluster.
In a real production environment, Kafka runs on servers. We all know that each server hosts one node and that multiple nodes form a cluster. We also know that Kafka's performance bottlenecks are network I/O and disk read/write speed.
Under normal circumstances this setup is fine, and Kafka was designed for short messages. In many application scenarios, however, we have to transmit large messages such as images in real time. Suppose we want to build a cluster that handles 40 million messages a day (about 460 per second, rounded up to 500 below), each 1 MB in size.
Let's calculate:
Assume each node has 1000 Mbps of bandwidth, which is 1000 / 8 = 125 MB of data per second. What size of cluster do we need to handle 500 messages per second?
Well, 500 * 1 MB = 500 MB per second, and 500 MB / 125 MB = 4, so a four-node cluster. No problem, right, bro?
No problem? Big problem. That 500 MB is only the bandwidth for data coming in, but throughput means both taking data in and sending it out.
Take the simplest case of one produce stream and one consume stream, each using half of the bandwidth: to produce 500 messages per second and consume 500 messages per second, we need at least 1000 MB per second of bandwidth.
And that is still the ideal case. In practice, more than one consumer reads from Kafka. Suppose three consumers each need to read 500 messages per second from the cluster: the total throughput is then 500 MB in plus 1500 MB out, i.e. 2000 MB per second, which is a network bit rate of 16000 Mbps. If every node uses a gigabit NIC, that is at least 16 servers on paper and realistically a cluster of about 20. Budgeting a conservative 50,000 RMB per server, that comes to 1 million RMB.
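The sizing argument above can be checked with a few lines of shell arithmetic. This is only a rough sketch under the assumptions used here (1 MB messages, gigabit NICs, every consumer reading the full stream); the function name required_nodes is made up for illustration, and replication traffic and protocol overhead are ignored.

```shell
# required_nodes MSGS_PER_SEC MSG_MB CONSUMERS NIC_MBPS
# Prints the minimum number of single-NIC nodes needed to carry the traffic.
# Hypothetical helper: ignores replication and protocol overhead.
required_nodes() {
    in_mb=$(( $1 * $2 ))                      # produced data, MB per second
    out_mb=$(( in_mb * $3 ))                  # each consumer reads everything
    total_mbps=$(( (in_mb + out_mb) * 8 ))    # MB per second -> Mbps
    echo $(( (total_mbps + $4 - 1) / $4 ))    # round up to whole nodes
}

required_nodes 500 1 0 1000   # ingest only: prints 4
required_nodes 500 1 1 1000   # one consumer: prints 8
required_nodes 500 1 3 1000   # three consumers: prints 16
```

Sixteen is the floor on paper; a real cluster needs headroom, which is in line with the roughly 20 servers estimated above.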
This can't be right. Isn't Kafka supposed to have the best throughput in the universe? How can it take a pile of servers just to handle 500 messages per second?
Something must be wrong. I ought to go to the official site and demand an explanation: this is nothing like what you advertised. What am I supposed to do?
Let's step back and think. Where is the problem? The calculation is sound, and it is the ideal case at that; a real production environment may be worse. Yet there has to be a problem somewhere.
I have gone over it again and again, and it really is correct. This is how we use it at our company. Surely that can't be wrong?
No problem? Big problem. From the calculation above we can draw a firm, officially verifiable conclusion: network bandwidth alone is enough to limit Kafka's performance. Is there a fix? Switch to 10 Gbps networking? That doubles the cost to 2 million RMB.
So the next question is how to get around this network bottleneck:
Since the bottleneck is the network, the network bottleneck is the NIC, and replacing gigabit NICs with 10-gigabit ones is unrealistic, only one path is left: add more NICs. Fine. A typical server has four network ports plus a management port. First, plug in four cables and configure four IP addresses. No problem. Then a thought: with four NICs, each with its own IP address, could we run four Kafka nodes on one machine and bind one NIC to each node? How exciting...
Follow the tutorial, step by step, start it up... What? Why? Is my configuration wrong? Check it once and start again... check twice, start again... check three times... search Baidu and Google, check a fourth time... What on earth? Does Kafka not support running multiple processes on one machine? That can't be it: there are tutorials for building a pseudo-cluster on a single machine, so Kafka itself is not the problem.
After more searching and a lot of trial and error, we finally got to the truth: multiple Kafka instances on the same machine must be configured with different port numbers, and newer versions of Kafka are phasing out the host.name and port configuration items. For all of the related settings you only need to configure listeners.
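A minimal sketch of that finding, assuming a ZooKeeper-based Kafka of that era: it writes one properties file per broker, each with its own broker.id, listeners address and port, and log directory. The IP addresses, ports, paths, and the helper name write_broker_config are placeholders, not values from the setup described here.

```shell
# Scratch directory for the generated files (placeholder location).
out_dir=$(mktemp -d)

# write_broker_config ID IP PORT
# Brokers sharing a host must not share broker.id, port, or log.dirs.
write_broker_config() {
    cat > "$out_dir/server-$1.properties" <<EOF
broker.id=$1
listeners=PLAINTEXT://$2:$3
log.dirs=/data/kafka-logs-$1
zookeeper.connect=localhost:2181
EOF
}

# One broker per NIC address. Only listeners is set; host.name and port are left out.
write_broker_config 0 192.168.1.11 9092
write_broker_config 1 192.168.1.12 9093
write_broker_config 2 192.168.1.13 9094
write_broker_config 3 192.168.1.14 9095

echo "wrote broker configs to $out_dir"
```

Each generated file would then be handed to its own kafka-server-start.sh process.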
Delete all the old entries and try again: configure the NIC IP address in listeners and start it up. Hahaha, it finally runs and can produce and consume data normally. I'm brilliant, solving such a hard problem! After all that effort, the grand wish of one NIC per Kafka node has come true. Who else could do it? However... as the saying goes... don't celebrate too early... too early... early...
Come on, let's see whether network I/O can now get past the limit of a single NIC. Quickly type in a command: sar -n DEV 1
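sar -n DEV 1 prints per-interface receive and transmit rates once a second. As a rough cross-check, the same rate can be derived from the kernel's byte counters. This is a sketch that assumes Linux's /sys/class/net layout; the function names mbps and tx_mbps are illustrative.

```shell
# mbps PREV_BYTES CUR_BYTES SECONDS
# Converts a byte-counter delta into whole megabits per second.
mbps() {
    echo $(( ($2 - $1) * 8 / 1000000 / $3 ))
}

# tx_mbps IFACE
# Samples the transmit counter one second apart (Linux only).
tx_mbps() {
    counter=/sys/class/net/$1/statistics/tx_bytes
    before=$(cat "$counter")
    sleep 1
    after=$(cat "$counter")
    mbps "$before" "$after" 1
}

mbps 0 125000000 1   # 125 MB sent in one second: prints 1000
```

Calling tx_mbps with a NIC name on the broker host (for example tx_mbps enp2s0f0) would print that NIC's current transmit rate.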
Ding ding... As the number of producer threads increases, network I/O quickly reaches 1000 Mbps. Add 10 more producer threads! sar -n DEV 1. Ding ding... What? Network I/O is still 1000 Mbps. No, no, why? Capture some packets: all four IP addresses have established TCP connections and are exchanging data. No problem there. Look closer. Why do all the IP addresses point to the same MAC address? That makes no sense. Are you kidding me? I don't believe it. It must be a configuration error, or I wasn't sincere enough, or I started it the wrong way... I don't believe it, I don't believe it... Try again... twice... three times...
Capture packets once more: every NIC IP address still points to one MAC address. Do we need to configure routing for each NIC by hand, so that each NIC forwards data and looks up routes through its own route? Read the tutorials, read the man pages, configure the routes... try again... twice... three times... it's no use... I want to go home, I miss my mother...
Let's calm down and think it through. What we can confirm is this: building a cluster on one machine is feasible, as long as the port numbers differ. Binding one NIC to each Kafka node is not. Even if a socket is bound to the IP address of a specific NIC, an outgoing packet still goes through the routing table when it leaves the host, and the routing table picks the lowest-cost network interface to send it, whichever one that is. The four NICs we configured all have the same cost because they sit in the same subnet (the same network segment), so the transfer rate never exceeds that of a single NIC. To solve this, you would have to configure the routing table by hand, put the four NIC IP addresses in different network segments, and make sure those segments can reach one another.
In real deployments, though, we are the ones being assigned IP addresses, not the ones allocating network segments. So this approach is not practical for us, but at least we have worked out a solution that is feasible in principle, haven't we?
We spent a lot of time on the relationship between Kafka and NICs, only to look back and find we had taken a big detour without noticing. What we are really trying to solve is network bandwidth, yet we tied ourselves to Kafka. Since we can build a pseudo-cluster on one machine, why not bond all of that machine's NICs together?
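Whether the NIC addresses share a subnet is the condition that decides this behaviour, and it is easy to check. Below is a sketch in plain shell arithmetic; ip_to_int and same_subnet are illustrative helpers, and the addresses are made-up examples rather than the ones used on our servers.

```shell
# ip_to_int A.B.C.D
# Prints the dotted-quad address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP_A IP_B PREFIX_LEN
# Exit status 0 if both addresses share the same network prefix.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    host_bits=$(( 32 - $3 ))
    [ $(( a >> host_bits )) -eq $(( b >> host_bits )) ]
}

same_subnet 192.168.1.11 192.168.1.12 24 && echo "same subnet"
same_subnet 192.168.1.11 192.168.2.12 24 || echo "different subnets"
```

In the first case the NICs compete for the same route, which matches what we observed; the second case is the layout that the manual routing fix would require.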
Starting with CentOS 7, multiple NICs can be bonded using team mode (link aggregation). Let's try it:
For details, see the official documentation:
https://access.redhat.com/documentation/zh-CN/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/index.html
To be clear: do not configure static IP addresses on the NICs being bonded. Before bonding, restore every NIC to its initial state. A server can have only one gateway.
What we want is the bonding mode that increases bandwidth. For the other modes, please study them yourself. Configure it through NetworkManager:
1. Create team1 and select the mode:
Command: nmcli connection add con-name team1 type team ifname team1 config '{"device": "team1", "runner": {"name": "loadbalance", "tx_hash": ["eth", "ipv4", "ipv6"], "tx_balancer": {"name": "basic"}}}'
2. Add the NICs to bond (four NICs on this machine)
Command: nmcli connection add con-name team1-port1 type team-slave ifname enp2s0f0 master team1
nmcli connection add con-name team1-port2 type team-slave ifname enp2s0f1 master team1
nmcli connection add con-name team1-port3 type team-slave ifname enp2s0f2 master team1
nmcli connection add con-name team1-port4 type team-slave ifname enp2s0f3 master team1
3. Set the IP address and gateway for the bound virtual network card
Command: nmcli connection modify team1 ipv4.addresses 192.255.25.100/24 ipv4.gateway 192.255.25.254 ipv4.method manual
Note: ipv4.addresses 192.255.25.100/24 is the IP address and subnet mask of the single virtual NIC into which the four NICs are aggregated.
ipv4.gateway 192.255.25.254 is the gateway for that NIC.
4. Start team1
Command: nmcli connection up team1
5. Restart the network
Command: systemctl restart network
6. View the status
Command: teamdctl team1 state
Note: all four NICs should be listed.
7. List team1 ports
Command: teamnl team1 ports
Note: all four NICs should be listed.
Other operations: nmcli device disconnect enp2s0f0 (disable one of the NICs)
nmcli device connect enp2s0f0 (enable it again)
ip link set down enp2s0f0 (take one NIC down for testing)
8. View the network configuration
Command: ip addr
Note: the team1 information (such as its IP address) is displayed here.
Now multi-NIC link aggregation is configured.
Eagerly, we build the Kafka pseudo-cluster on top of it and start testing... Start 10 producers!
Ding ding... Network I/O reaches 1000 Mbps, 2000 Mbps, 3000 Mbps, 3600 Mbps... Oh my God, it worked... and the bandwidth loss is no more than about 10%...
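Step 2 repeats one command per NIC, so it lends itself to a loop. The sketch below only prints the commands by default (a dry run); the helper name add_team_ports and the DRY_RUN variable are illustrative, while the nmcli arguments are the ones from step 2.

```shell
# add_team_ports TEAM NIC...
# Builds one "nmcli connection add" command per NIC.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
add_team_ports() {
    team=$1
    shift
    n=1
    for nic in "$@"; do
        port_cmd="nmcli connection add con-name $team-port$n type team-slave ifname $nic master $team"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$port_cmd"     # only show what would run
        else
            $port_cmd            # real run: needs NetworkManager and root
        fi
        n=$(( n + 1 ))
    done
}

add_team_ports team1 enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3
```

Running it with DRY_RUN=0 on the server would apply the four commands instead of printing them.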
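The peak reading above turns into a utilisation figure with one line of arithmetic. This is a sketch; team_efficiency is an illustrative helper, and the inputs are the peak rate and NIC count quoted in this article.

```shell
# team_efficiency MEASURED_MBPS NIC_COUNT NIC_MBPS
# Prints the measured rate as a percentage of the theoretical aggregate.
team_efficiency() {
    echo $(( $1 * 100 / ($2 * $3) ))
}

team_efficiency 3600 4 1000   # prints 90
```

That is 90% of the 4000 Mbps ceiling of four gigabit NICs, so roughly 10% is lost.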
We grew up in a fortunate age, when the cost of learning and progress is so low;
we grew up in an unfortunate age, when the cost of learning and progress is so high.