Recently, one of our projects started using Redis, the open-source key-value store, at the suggestion of colleagues.
We deployed one master and several slave instances, with the slaves replicating from the master.
Once development was finished, we went live. The next day, we found that a slave had lost its connection to the master. The master and the slaves are deployed in different data centers. The processes were all still running, but data was no longer being synchronized or updated, and the master reported fewer connected slaves than we had deployed.
This was alarming. Surely Redis had not left out something as basic as reconnecting after a disconnect?
Further analysis showed that if the slave cannot connect to the master at startup, it keeps retrying. If the master process is restarted or killed while running, the slave also retries.
That's strange.
Later, I used netstat on the Redis master to look at the TCP connections, and the connection to the slave was gone (closed). On the slave, however, the connection to the master was still in the ESTABLISHED state. In other words, the master had detected that the TCP connection was broken, which is why it reported one slave fewer. For some reason the TCP connection had been cut and was unusable, but the system on the slave side did not detect it. Reconnection is initiated by the slave when it sees the connection break, so no reconnection took place.
I proposed two solutions:
First, Redis implements a heartbeat (ping) at the application layer. It pings once per interval, and after several consecutive ping timeouts the master-slave connection is considered broken.
Second, use the TCP keepalive mechanism, which periodically probes an idle connection. It can detect whether a connection that still shows as ESTABLISHED is in fact dead.
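The first option can be sketched roughly as follows. This is only a minimal illustration in Python of the counting logic, not how Redis does it. The class name `HeartbeatMonitor`, its methods, and the default values (5-second timeout, 3 missed pings) are all assumptions made up for this example.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: decides when a peer should be treated as disconnected."""

    def __init__(self, timeout_s=5.0, max_missed=3, clock=time.monotonic):
        self.timeout_s = timeout_s      # how long to wait for one reply (assumed value)
        self.max_missed = max_missed    # consecutive timeouts before giving up (assumed value)
        self.clock = clock              # injectable clock, so the logic can be tested
        self.missed = 0                 # consecutive pings that got no reply
        self.pending_since = None       # send time of the outstanding ping, if any

    def ping_sent(self):
        """Record that a ping was just sent to the peer."""
        self.pending_since = self.clock()

    def pong_received(self):
        """Record a reply: the connection is alive, so reset the counter."""
        self.pending_since = None
        self.missed = 0

    def check(self):
        """Call periodically. Returns True once the peer should be considered gone."""
        if (self.pending_since is not None
                and self.clock() - self.pending_since >= self.timeout_s):
            self.missed += 1
            self.pending_since = None
        return self.missed >= self.max_missed
```

In a real client, the slave would call `ping_sent()` when it writes a ping to the socket, `pong_received()` when the reply arrives, and would close the socket and reconnect once `check()` returns True.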
Later, my colleague found a patch on the Redis issue tracker:
http://code.google.com/p/redis/issues/detail?id=224
It solves the problem by enabling TCP keepalive on the socket.
This patch is already in the latest 2.0 code, but it was not yet in the version we downloaded a few days ago.
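To show what enabling TCP keepalive on a socket looks like, here is a small Python sketch. It is not the Redis patch itself (Redis is written in C). The function name `enable_keepalive` and the timing values are assumptions for illustration. `SO_KEEPALIVE` is the standard socket option. The three tuning options are platform-specific, so the code only sets them where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Hypothetical helper: turn on TCP keepalive for an existing socket.

    idle_s, interval_s and probes are example values, not recommendations.
    """
    # Ask the kernel to probe the peer when the connection has been idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional tuning, available on Linux and some other platforms.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Idle seconds before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the kernel drops the connection.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With keepalive enabled, the kernel eventually reports an error on a connection whose peer has silently vanished, so the side still holding it as ESTABLISHED sees the failure and can reconnect.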