EPGM: the same address cannot be bound at the same time in containers from different images

Source: Internet
Author: User

Problem Background

There are two different Docker images, referred to as image A and image B. Image A produces two containers, A1 and A2; image B produces two containers, B1 and B2. The containers communicate through the ZMQ component over EPGM, which first requires binding an address. Binding means that the service joins a multicast group and subscribes to its multicast messages.

Symptoms:

1. Running the code below in A1 and A2 at the same time works. Both bind the multicast address successfully.

2. Running it in B1 and B2 at the same time also works. Both bind the multicast address successfully.

3. Running it in A1 and then also in B1 fails: the second process exits right after startup with a port binding error. The errno is 98 (EADDRINUSE), which means the address is already in use.

Preliminary analysis of the problem

First, here is the code in question (code 1):

#include <zmq.hpp>
#include <exception>
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    try {
        zmq::context_t context(1);
        zmq::socket_t subscriber(context, ZMQ_SUB);

        int rate = 10000;  // multicast data rate, in kilobits per second
        subscriber.setsockopt(ZMQ_RATE, &rate, sizeof(rate));
        subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);  // subscribe to every message
        int hwm = 0;  // 0 means no limit on the receive queue
        subscriber.setsockopt(ZMQ_RCVHWM, &hwm, sizeof(hwm));

        // Join the multicast group and subscribe to its messages.
        subscriber.connect("epgm://127.0.0.1;239.192.1.1:5555");

        for (;;) {
            zmq::message_t update;
            cout << "before recv" << endl;
            subscriber.recv(&update);
            // The payload is not NUL-terminated, so build the string from data and size.
            cout << "recv data = " << string(static_cast<char*>(update.data()), update.size()) << endl;
            cout << "data size = " << update.size() << endl << endl;
        }
    } catch (exception& e) {
        cerr << "exception occur: " << e.what() << endl;
    }
    return 0;
}


Environment description:

1. Image A's OS is CentOS release 6.6, with gcc version 4.4.7.

2. Image B's OS is Debian GNU/Linux 7, with g++ (Debian 4.7.2-5) 4.7.2.

3. The installed libpgm version is libpgm-5.1.118~dfsg and the ZMQ version is zeromq-4.2.1.

4. The containers are started with the host network mode.


Possible causes considered:

1. Since the code runs in containers, the first thing to check is whether the network mode is the problem. After investigating and reading up on it, host mode turned out not to be the cause.

2. Not knowing the internals of Docker containers well, I suspected that containers created from the same image might share some special relationship. Searching online turned up nothing.

3. Perhaps the containers cannot share the address because they run different OSes. In theory, that should not be the cause.

4. Speculating was getting nowhere, so I stepped into the code and found that the failure occurs in the bind system call. Its prototype is int bind(int socket, const struct sockaddr *address, socklen_t address_len);

The next idea was that differences in the arguments passed to bind cause the failure. So follow one case in GDB where the repeated bind succeeds and one where it fails, compare the arguments, and use any differences to track down the problem.
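
To make the symptom concrete, here is a minimal sketch of how a second bind to the same UDP port fails with EADDRINUSE when neither socket sets a reuse option. The helper names (bind_udp_plain, local_port, second_bind_errno) are made up for this illustration and are not part of ZMQ or libpgm; the sketch assumes a Linux host.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Binds a new UDP socket to 0.0.0.0:port with no reuse option set.
// Returns the socket descriptor on success, or -errno on failure.
int bind_udp_plain(unsigned short port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -errno;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

// Returns the local port of a bound socket, or 0 on failure.
unsigned short local_port(int fd) {
    sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == -1) return 0;
    return ntohs(addr.sin_port);
}

// Binds the same port twice and returns the errno of the second bind
// (0 if it unexpectedly succeeds, -1 if the first bind itself failed).
int second_bind_errno() {
    int first = bind_udp_plain(0);  // port 0 lets the kernel pick a free port
    if (first < 0) return -1;
    int second = bind_udp_plain(local_port(first));
    int result = (second < 0) ? -second : 0;
    if (second >= 0) close(second);
    close(first);
    return result;
}
```

Calling second_bind_errno() from a small main and printing the result should show the same error code the failing container reports.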

In-depth analysis of the problem

At this point the investigation seemed stuck. Only one option remained: the source code. Fortunately the libpgm source is available, so the problem can be analyzed against it. It just takes some patience to find a clue.

Go back to the basic concepts.

Multicast, Linux, the lo loopback address: rereading "UNIX Network Programming" suggested a new idea.

Stepping through the basic logic in GDB again, together with my own reading of the source, revealed the nature of the problem:

The loopback address 127.0.0.1 that we bind to does not itself support multicast, so EPGM emulates it. The emulation simply binds the corresponding port on INADDR_ANY. For example, if we bind the EPGM address

epgm://127.0.0.1;239.192.1.1:5555 on lo, the local bind actually goes to 0.0.0.0:5555.

This can be checked with netstat.

That gives a simple hypothesis to test: can A1 and B1 both bind 0.0.0.0:5555?
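
As a small illustration of what binding on INADDR_ANY means, the sketch below binds a UDP socket to the wildcard address and asks the kernel which local address it reports. This is the address netstat would list for such a socket. The function name wildcard_local_address is hypothetical, and the sketch uses an ephemeral port instead of 5555 so that it does not collide with anything already running.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Binds a UDP socket to INADDR_ANY on a kernel-chosen port and returns the
// local IP address reported by getsockname(), or "" if any step fails.
std::string wildcard_local_address() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return "";

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);  // ephemeral port
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    std::string result;
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        sockaddr_in local;
        socklen_t len = sizeof(local);
        if (getsockname(fd, reinterpret_cast<sockaddr*>(&local), &len) == 0) {
            char buf[INET_ADDRSTRLEN];
            if (inet_ntop(AF_INET, &local.sin_addr, buf, sizeof(buf)) != NULL) {
                result = buf;
            }
        }
    }
    close(fd);
    return result;
}
```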

Theoretical conjecture verification

Write a small program to verify whether a UDP port can be bound repeatedly by containers from different images.

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
    int ufd_listener = socket(PF_INET, SOCK_DGRAM, 0);
    int opt = 1;
    // Allow other sockets that also set SO_REUSEADDR to bind the same port.
    if (setsockopt(ufd_listener, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
        cerr << "error set SO_REUSEADDR" << endl;
        return -1;
    }
    struct sockaddr_in my_addr;
    memset(&my_addr, 0, sizeof(my_addr));
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(4000);
    my_addr.sin_addr.s_addr = INADDR_ANY;
    if (bind(ufd_listener, (struct sockaddr*)&my_addr, sizeof(my_addr)) == -1) {
        cerr << "bind error, errno = " << errno << endl;
    }
    // Keep the process (and the bound socket) alive.
    do {
        sleep(1);
        cout << "sleep 1" << endl;
    } while (true);
    return 0;
}


The conclusion is that containers from different images can bind the same address.

The key technique is SO_REUSEADDR; detailed explanations are easy to find online.

Continuing to look for the problem

The previous step showed that containers from different images can bind the same port, so this approach is workable in principle.

Back to the source: open the EPGM library (libpgm) and read it with this problem in mind.

As a direct experiment, force SO_REUSEADDR on the corresponding socket just before the bind system call, then rebuild and reinstall libpgm and ZMQ in the two containers A1 and B1.

And it works: A1 and B1 can now run code 1 at the same time.

The truth is revealed

The problem of containers from two different images binding the same multicast address on lo is now solved, but one question remains:

1. If SO_REUSEADDR was not set before, how could A1 and A2 bind together successfully?

Going through the code again, at the key place there is another setting that can allow coexistence: SO_REUSEPORT, which was added in Linux kernel 3.9. A quick test confirmed that sockets which set only SO_REUSEPORT can also coexist.

This led to a further suspicion: because the two images have different OS versions, one sets SO_REUSEPORT and the other sets SO_REUSEADDR. Containers from the same image can then bind the same address, but containers from different images cannot.

That is: image A sets SO_REUSEPORT, so A1 and A2 can bind the same address. Image B sets SO_REUSEADDR, so B1 and B2 can bind the same address. But A1 and B1 cannot bind the same address.
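
The hypothesis can be tried in a single process, without containers. The following is a minimal sketch, assuming a Linux kernel and headers that provide SO_REUSEPORT (3.9 or later) and that both sockets belong to the same user. The helpers bind_udp_with, local_port and try_share are names invented for this illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Creates a UDP socket, enables exactly one SOL_SOCKET option (for example
// SO_REUSEADDR or SO_REUSEPORT), and binds it to 0.0.0.0:port.
// Returns the socket descriptor on success, or -errno on failure.
int bind_udp_with(int optname, unsigned short port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -errno;
    int on = 1;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (setsockopt(fd, SOL_SOCKET, optname, &on, sizeof(on)) == -1 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

// Returns the local port of a bound socket, or 0 on failure.
unsigned short local_port(int fd) {
    sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == -1) return 0;
    return ntohs(addr.sin_port);
}

// Binds one socket with first_opt, then a second socket with second_opt on
// the same port. Returns 0 if both binds succeed, otherwise the errno of the
// bind that failed.
int try_share(int first_opt, int second_opt) {
    int a = bind_udp_with(first_opt, 0);  // port 0: kernel picks a free port
    if (a < 0) return -a;
    int b = bind_udp_with(second_opt, local_port(a));
    int result = (b < 0) ? -b : 0;
    if (b >= 0) close(b);
    close(a);
    return result;
}
```

With these helpers, try_share(SO_REUSEADDR, SO_REUSEADDR) and try_share(SO_REUSEPORT, SO_REUSEPORT) correspond to the same-image cases, and try_share(SO_REUSEPORT, SO_REUSEADDR) corresponds to the A1 plus B1 case.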

Verification: in libpgm's socket.c, find the code that sets these socket options, and call getsockopt just before bind in the two containers A1 and B1 to read the SO_REUSEADDR and SO_REUSEPORT flags. This gives the final answer.
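
A check of this kind could look roughly like the sketch below. It is not the code that was patched into libpgm; ReuseFlags, get_flag and read_reuse_flags are hypothetical names, and where exactly to call them before bind is left open.

```cpp
#include <sys/socket.h>

// Result of the check: 1 = set, 0 = not set, -1 = could not be read
// (or the option is not defined on this system).
struct ReuseFlags {
    int reuseaddr;
    int reuseport;
};

// Reads one boolean SOL_SOCKET option from fd.
int get_flag(int fd, int optname) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, optname, &value, &len) == -1) return -1;
    return value != 0 ? 1 : 0;
}

// Reads both reuse flags from a socket, for example right before bind().
ReuseFlags read_reuse_flags(int fd) {
    ReuseFlags flags;
    flags.reuseaddr = get_flag(fd, SO_REUSEADDR);
#ifdef SO_REUSEPORT
    flags.reuseport = get_flag(fd, SO_REUSEPORT);
#else
    flags.reuseport = -1;  // headers do not define SO_REUSEPORT
#endif
    return flags;
}
```

Printing both fields in each container shows which of the two options the library actually enabled there.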

In image A, SO_REUSEPORT is defined because the image is based on a newer kernel, so only SO_REUSEPORT is set.

In image B, SO_REUSEPORT is not defined, probably because the image is based on an older kernel, so only SO_REUSEADDR is set.
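
One way such a difference can arise is a compile-time choice between the two options. The sketch below only illustrates that pattern and is an assumption, not a copy of libpgm's code; set_compile_time_reuse_option and chosen_option_name are invented names.

```cpp
#include <sys/socket.h>
#include <string>

// Enables SO_REUSEPORT when the build environment's headers define it,
// and falls back to SO_REUSEADDR otherwise. Returns the setsockopt result.
int set_compile_time_reuse_option(int fd) {
    int on = 1;
#ifdef SO_REUSEPORT
    return setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
#else
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
#endif
}

// Reports which option this particular build selected.
std::string chosen_option_name() {
#ifdef SO_REUSEPORT
    return "SO_REUSEPORT";
#else
    return "SO_REUSEADDR";
#endif
}
```

Two binaries built from the same source in different images can therefore end up setting different options, even though they run on the same host kernel in host network mode.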

For the difference between SO_REUSEPORT and SO_REUSEADDR, see http://blog.qiusuo.im/blog/2014/09/14/linux-so-reuseport/

This explains why A1 and A2 can bind the same address, and B1 and B2 can bind the same address, but A1 and B1 cannot.
