A case of troubleshooting a slow Kafka producer connection
Symptom:
The Kafka producer connects to the Kafka broker over SSL to send messages.
Messages are delivered successfully, but the connection is very slow: sending a single message takes nearly 50 seconds.
Environment:
The Kafka broker is located in the data center and exposed to the public network through port mapping.
Intranet IP address: 10.1.1.1
Public IP: x.x (port 9093 is mapped to 10.1.1.1:9093 on the intranet)
Clients on the intranet reach the broker over PLAINTEXT, while clients on the Internet use SSL.
Broker configuration (network-related settings only):
- ssl.keystore.location=server.keystore.jks
- ssl.keystore.password=xxx
- ssl.key.password=xxx
- ssl.truststore.location=server.truststore.jks
- ssl.truststore.password=xxx
- ssl.client.auth=required
- listeners=PLAINTEXT://0.0.0.0:9092,SSL://:9093
- advertised.listeners=PLAINTEXT://10.1.1.1:9092,SSL://x.x:9093
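The split between `listeners` (the addresses the broker binds to) and `advertised.listeners` (the endpoints the broker hands back to clients in metadata) is what lets intranet and Internet clients use different protocols. The sketch below parses a listener string into a protocol-to-endpoint map to make that concrete; `parseListeners` is a hypothetical helper written for illustration, not Kafka's own parser, and `x.x` remains the elided public IP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ListenerParser {
    // Hypothetical helper: turns "PROTO://host:port,PROTO://host:port" into a
    // map from protocol name to "host:port". An empty host (bind to the default
    // interface) is kept as-is, e.g. ":9093".
    static Map<String, String> parseListeners(String value) {
        Map<String, String> endpoints = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            int sep = trimmed.indexOf("://");
            if (sep < 0) {
                throw new IllegalArgumentException("Bad listener: " + trimmed);
            }
            endpoints.put(trimmed.substring(0, sep), trimmed.substring(sep + 3));
        }
        return endpoints;
    }

    public static void main(String[] args) {
        // Values taken from the broker configuration (x.x is the public IP).
        Map<String, String> bound = parseListeners("PLAINTEXT://0.0.0.0:9092,SSL://:9093");
        Map<String, String> advertised = parseListeners("PLAINTEXT://10.1.1.1:9092,SSL://x.x:9093");
        // Internet clients use SSL: after bootstrapping, metadata tells them to connect here.
        System.out.println("SSL clients are told to connect to " + advertised.get("SSL"));
        System.out.println("SSL listener binds to " + bound.get("SSL"));
    }
}
```

Because the SSL listener advertises the public address, an Internet producer that bootstraps via `x.x:9093` keeps talking to `x.x:9093` rather than being redirected to the unreachable intranet IP.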
The producer is on the Internet and connects over SSL. Configuration:
- bootstrap.servers=x.x:9093
- ssl.protocol=SSL
- security.protocol=SSL
- ssl.keystore.location=client.keystore.jks
- ssl.keystore.password=xxx
- ssl.key.password=xxx
- ssl.truststore.location=client.truststore.jks
- ssl.truststore.password=xxx
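In a Java client these settings are normally passed to the producer as a `java.util.Properties` object. The sketch below only builds that object with the JDK; the keystore paths and `xxx` passwords are the placeholders from the list above, and the `KafkaProducer` construction is left as a comment because it needs the kafka-clients library on the classpath.

```java
import java.util.Properties;

class ProducerSslConfig {
    // Builds the SSL-related producer settings listed above.
    // Passwords are the placeholder values from the original configuration.
    static Properties sslProducerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "x.x:9093"); // public IP, SSL port
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.protocol", "SSL");
        props.setProperty("ssl.keystore.location", "client.keystore.jks");
        props.setProperty("ssl.keystore.password", "xxx");
        props.setProperty("ssl.key.password", "xxx");
        props.setProperty("ssl.truststore.location", "client.truststore.jks");
        props.setProperty("ssl.truststore.password", "xxx");
        return props;
    }

    public static void main(String[] args) {
        Properties props = sslProducerProps();
        // With kafka-clients available, these would be handed to the producer, e.g.
        // new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(
                     k + "=" + (k.endsWith("password") ? "****" : props.getProperty(k))));
    }
}
```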
Producer log:
- 14:01:23.367 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 0.9.0.1
- 14:01:23.367 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 23c69d62a0cabf06
- 14:01:44.856 [main] INFO org.apache.kafka.clients.producer.KafkaProducer - Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
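The timestamps in these log lines can be subtracted directly to see how long this particular run spent between producer startup and close. The worked check below uses only `java.time`; it also confirms that the `timeoutMillis` value equals `Long.MAX_VALUE`, which is what `KafkaProducer.close()` without arguments passes, i.e. no finite close timeout.

```java
import java.time.Duration;
import java.time.LocalTime;

class LogGap {
    // Elapsed time between two log timestamps of the form HH:mm:ss.SSS.
    static Duration gap(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to));
    }

    public static void main(String[] args) {
        Duration d = gap("14:01:23.367", "14:01:44.856");
        System.out.println(d.toMillis() + " ms"); // 21489 ms
        // timeoutMillis from the "Closing the Kafka producer" line:
        System.out.println(Long.MAX_VALUE == 9223372036854775807L); // true
    }
}
```

So these three lines span about 21.5 seconds for a single send.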
Troubleshooting:
On the producer side, we captured traffic on port 9093 with tcpdump. The SSL handshake succeeded and data was being transferred,
but more than 10 seconds elapsed between producer startup and the first SYN sent by the producer.
We then removed the port 9093 filter and captured all ports. This revealed NetBIOS name queries on port 137:
the query was sent three times, and the three attempts took more than 10 seconds in total.
Port 137 is the Windows NetBIOS name service, so it looked as if the producer was asking the broker for its host name.
Next, we dumped the producer's thread stacks with jstack and found that it was waiting in getHostByAddr.
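jstack was the tool used here. The same evidence can be collected in-process with `Thread.getAllStackTraces()`, which helps when the producer runs where jstack is not available. The sketch below flags any thread whose stack contains a frame named `getHostByAddr`; the class and method names are illustrative, and the JDK class that owns that native frame (for example `java.net.Inet4AddressImpl`) may differ between JDK versions.

```java
import java.util.Arrays;
import java.util.Map;

class ReverseLookupWatch {
    // True if any frame in the stack is a reverse-lookup call such as
    // java.net.Inet4AddressImpl.getHostByAddr (a native method).
    static boolean inReverseLookup(StackTraceElement[] stack) {
        return Arrays.stream(stack)
                     .anyMatch(f -> f.getMethodName().equals("getHostByAddr"));
    }

    // Prints every live thread that is currently inside a reverse lookup.
    static void report() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (inReverseLookup(e.getValue())) {
                System.out.println("Thread " + e.getKey().getName() + " is inside getHostByAddr:");
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }

    public static void main(String[] args) {
        // In a real producer, report() would be called from a watchdog thread
        // while send() appears to hang.
        report();
    }
}
```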
At this point we suspected that the producer could not resolve the broker's host name.
After adding the entry "x.x serverx" to the hosts file on the producer side, the problem was resolved
and the producer sent messages quickly.
Summary:
The fault occurred because the client could not resolve the server's host name from its IP address and waited for each lookup to time out, which slowed everything down.
It was finally fixed by adding an entry to the hosts file.
(If you rely on DNS instead, you need to configure reverse resolution, i.e. a PTR record for the broker's IP.)
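Before and after adding the hosts entry (or a PTR record), the reverse lookup can be timed from Java on the producer machine. This is a minimal diagnostic sketch, not part of Kafka: pass the broker's public IP as the first argument (the `127.0.0.1` default is only there so the program runs anywhere). When no mapping exists, `getHostName()` falls back to returning the textual address, which is what the `reverseMappingMissing` helper (a hypothetical name) relies on.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ReverseLookupCheck {
    // True if getHostName() fell back to the literal address,
    // i.e. no reverse (hosts file / PTR) mapping was found.
    static boolean reverseMappingMissing(String hostName, String literal) {
        return hostName.equals(literal);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumed usage: java ReverseLookupCheck <broker-public-ip>
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        InetAddress addr = InetAddress.getByName(ip); // IP literal: no lookup here
        long start = System.nanoTime();
        String name = addr.getHostName();             // reverse lookup happens here
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%s -> %s in %d ms%n", ip, name, ms);
        if (reverseMappingMissing(name, addr.getHostAddress())) {
            System.out.println("No reverse mapping: add a hosts entry or a PTR record.");
        }
    }
}
```

If the printed time is in the tens of seconds before the fix and near zero after it, the hosts entry is doing its job.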
The broker's IP address is set directly in the producer configuration, so why does the client call getHostByAddr at all?
I have not figured this out yet. My guess is that the SSL handshake verifies the server certificate
and compares the server's host name with the CN in the certificate.
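One way to probe this question is to look at which JDK accessors trigger a reverse lookup. An `InetAddress` created from an IP literal carries no host name, so `getHostName()` on it performs a reverse lookup (the path that ends in `getHostByAddr`), while `getHostAddress()` and `InetSocketAddress.getHostString()` return the literal without any lookup. Which accessor this Kafka client version actually calls is not verified here; the sketch only shows the JDK behavior that would produce such a frame, and `withoutLookup` is a hypothetical helper.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class HostAccessors {
    // Returns {getHostAddress(), getHostString()} for an IP literal.
    // Neither call performs a reverse lookup on a literal-based address.
    static String[] withoutLookup(String ipLiteral, int port) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral); // literal: no lookup
            InetSocketAddress sock = new InetSocketAddress(addr, port);
            return new String[] { addr.getHostAddress(), sock.getHostString() };
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        String[] plain = withoutLookup(ip, 9093);
        System.out.println("getHostAddress(): " + plain[0]);
        System.out.println("getHostString():  " + plain[1]);
        // getHostName() on a fresh literal-based address does a reverse lookup,
        // which is where a getHostByAddr frame can show up in jstack output.
        System.out.println("getHostName():    " + InetAddress.getByName(ip).getHostName());
    }
}
```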