Background
Previously, the services were created on different servers (Docker hosts) using Docker Swarm, and the containers on them communicated with each other through an overlay network. Yesterday, because of the company's network maintenance, one of the servers (which we refer to as the manager node) was temporarily unreachable (probably for about 6 hours). When we came back today, we found that communication between the containers was broken ...
Analyzing the problem
1. First, at the physical machine and network level, check the network connection between the two servers. No problem was found there.
2. Enter a container on the worker node. It is unable to connect to the container on the manager node.
3. Recreate the overlay network between the nodes and start containers in it (I experimented with busybox here). They still cannot communicate with each other.
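The test in step 3 can be sketched roughly as follows. This is only a minimal sketch: the network name `testnet`, the container names `bb-manager` and `bb-worker`, and the function names are all hypothetical and do not come from the original setup. It assumes the overlay network is created on the manager node with `--attachable`, so that containers started with plain `docker run` can join it.

```shell
#!/bin/sh
# Hypothetical network name, not from the original setup.
NET=testnet

# Run on the manager node: create an attachable overlay network and
# start a busybox container on it.
setup_manager() {
    docker network create --driver overlay --attachable "$NET" &&
    docker run -d --name bb-manager --network "$NET" busybox sleep 3600
}

# Run on the worker node: start a second busybox container on the same network.
setup_worker() {
    docker run -d --name bb-worker --network "$NET" busybox sleep 3600
}

# Run on the worker node: ping the manager-side container by name.
# Prints "reachable" if the ping succeeds, "unreachable" otherwise.
check_overlay() {
    if docker exec bb-worker ping -c 3 bb-manager >/dev/null 2>&1; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}
```

In the situation described here, `check_overlay` would be expected to print `unreachable` even though the two hosts can reach each other directly.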
Resolving the issue
1. Rejoin the worker node to the swarm
docker swarm join --token swmtkn-1-23xxxxxxxxxxxxxxxxxxxxxxxxx <manager-ip>:2377
2. Restart the container
docker restart <container-name>
3. Enter the container and test the network connection
#nslookup Managerbusybox
The lookup now finds the specified container, so the container-to-container communication problem is solved!
Re-run the shell script that starts the services. OK, everything is back to normal :)
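The recovery steps above could be strung together roughly like this. It is a minimal sketch meant to run on the worker node. The function name `rejoin_and_verify` and its four arguments are hypothetical, and the token and manager address have to be supplied by you. The `docker swarm leave` call is my own assumption (a node that still considers itself part of a swarm rejects a second join); the original steps only mention the join. Note that leaving the swarm stops any service tasks running on that node.

```shell
#!/bin/sh
# Hypothetical helper, run on the worker node:
#   rejoin_and_verify <token> <manager-addr:port> <container> <peer-name>
rejoin_and_verify() {
    token=$1
    manager=$2
    container=$3
    peer=$4

    # Assumption (not in the original steps): leave first, because a node
    # that still believes it is in a swarm refuses to join again.
    docker swarm leave >/dev/null 2>&1 || true

    # 1. Rejoin the swarm as a worker.
    docker swarm join --token "$token" "$manager" || return 1

    # 2. Restart the container so it reattaches to the overlay network.
    docker restart "$container" || return 1

    # 3. From inside the container, resolve the peer container by name.
    if docker exec "$container" nslookup "$peer" >/dev/null 2>&1; then
        echo "ok: $peer resolves from $container"
    else
        echo "fail: $peer does not resolve from $container"
        return 1
    fi
}
```

A call might look like `rejoin_and_verify "$TOKEN" <manager-ip>:2377 <container-name> Managerbusybox`, with the placeholders replaced by your own values.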
PS. One strange thing: when you use the command to view the swarm nodes (presumably docker node ls),
all the returned nodes show as active, but in fact network communication between them was already broken ... It's a little confusing, and I don't know if it's a bug →_→
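One possible reading, offered tentatively: the node list reflects what the manager knows through cluster management (TCP 2377 in the Swarm documentation), while container traffic on an overlay network travels separately (TCP/UDP 7946 for node communication, UDP 4789 for overlay traffic). If that is what happened here, a healthy-looking node list would not prove that containers can reach each other. The sketch below only shows how to pull the manager's view into a script; the function names are hypothetical, and it has to run on a manager node.

```shell
#!/bin/sh
# Print one line per swarm node: hostname, status, availability.
# Must run on a manager node.
list_nodes() {
    docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}}'
}

# Print the hostnames of nodes the manager does NOT report as Ready/Active.
# An empty result only means the manager considers every node healthy;
# it says nothing about whether overlay traffic actually gets through.
unhealthy_nodes() {
    list_nodes | awk '$2 != "Ready" || $3 != "Active" { print $1 }'
}
```

Pairing `unhealthy_nodes` with an actual container-to-container ping or nslookup gives a more honest picture than the node list alone.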
I hope someone who understands this can offer an analysis. Thanks in advance!!
Notes on a Docker Swarm overlay network access error