Jaeger Source Code Analysis: Service Discovery and Registration

Source: Internet
Author: User


Statement

The official Jaeger documentation does not describe how Jaeger handles service registration and service discovery. While analyzing the source code, I found that part of its functionality works on the same principles, so I have summarized it here together with my own understanding of service registration and service discovery. Corrections are welcome.

TChannel Service registration and service discovery

Jaeger can implement service registration and service discovery without the help of third-party tools. This capability is provided by TChannel, the RPC framework it relies on.

Third-party registration: manual registration

go run cmd/agent/main.go --collector.host-port=192.168.0.10:14267,192.168.0.11:14267

When you start the agent, you can configure multiple static collector addresses. Together these addresses form the registry.
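As a minimal sketch of what "static addresses form a registry" means, the following Go code turns a comma-separated host:port list, in the format passed to --collector.host-port above, into a lookup table keyed by host:port. The function name parseCollectors and the map-based registry are hypothetical and are not taken from the Jaeger source.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseCollectors splits a comma-separated host:port list into a simple
// registry keyed by host:port. Hypothetical helper, not Jaeger's own code.
func parseCollectors(flagValue string) (map[string]struct{}, error) {
	registry := make(map[string]struct{})
	for _, hp := range strings.Split(flagValue, ",") {
		hp = strings.TrimSpace(hp)
		if hp == "" {
			continue // tolerate stray commas
		}
		// Reject entries that are not in host:port form.
		if _, _, err := net.SplitHostPort(hp); err != nil {
			return nil, fmt.Errorf("invalid collector address %q: %w", hp, err)
		}
		registry[hp] = struct{}{}
	}
	return registry, nil
}
```

Calling parseCollectors("192.168.0.10:14267,192.168.0.11:14267") would yield a registry with two entries, one per collector.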

Registration Form

    • Registry structure
github.com/uber/tchannel-go/peer.go #59

    type PeerList struct {
        sync.RWMutex

        parent *RootPeerList
        // The registry: peers indexed by hostPort
        peersByHostPort map[string]*peerScore
        // Heap used to implement load balancing
        peerHeap        *peerHeap
        scoreCalculator ScoreCalculator
        lastSelected    uint64
    }
    • Health Check
github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go #150

    func (m *PeerListManager) ensureConnections() {
        peers := m.peers.Copy()
        minPeers := m.getMinPeers(peers)
        numConnected, notConnected := m.findConnected(peers)
        // If 3 peers are already connected, no health check is performed
        if numConnected >= minPeers {
            return
        }
        ...
        for i := range notConnected {
            // swap current peer with random from the remaining positions
            r := i + m.rnd.Intn(len(notConnected)-i)
            notConnected[i], notConnected[r] = notConnected[r], notConnected[i]
            // try to connect to the current peer (swapped)
            peer := notConnected[i]
            m.logger.Info("Trying to connect to peer", zap.String("host:port", peer.HostPort()))
            // Context used to control the timeout
            ctx, cancel := context.WithTimeout(context.Background(), m.connCheckTimeout)
            conn, err := peer.GetConnection(ctx)
            cancel()
            if err != nil {
                m.logger.Error("Unable to connect",
                    zap.String("host:port", peer.HostPort()),
                    zap.Duration("connCheckTimeout", m.connCheckTimeout),
                    zap.Error(err))
                continue
            }
            ...
        }
    }

The addresses in the registry are health-checked once per second. If a connection to an address cannot be established within 0.25 seconds, that service instance is treated as unavailable. If the connection succeeds, the instance is kept so that the agent can use it to submit data.

github.com/uber/tchannel-go/connection.go #228

    func (ch *Channel) newOutboundConnection(timeout time.Duration, hostPort string, events connectionEvents) (*Connection, error) {
        conn, err := net.DialTimeout("tcp", hostPort, timeout)
        if err != nil {
            if ne, ok := err.(net.Error); ok && ne.Timeout() {
                ch.log.WithFields(LogField{"hostPort", hostPort}, LogField{"timeout", timeout}).Infof("Outbound net.Dial timed out")
                err = ErrTimeout
            }
            return nil, err
        }
        return ch.newConnection(conn, hostPort, connectionWaitingToSendInitReq, events), nil
    }

Client Service Discovery

    • Soft Load Balancing
github.com/uber/tchannel-go/peer.go #149

    func (l *PeerList) choosePeer(prevSelected map[string]struct{}, avoidHost bool) *Peer {
        var psPopList []*peerScore
        var ps *peerScore
        ......
        size := l.peerHeap.Len()
        for i := 0; i < size; i++ {
            // Pop the peer at the head of the heap
            popped := l.peerHeap.popPeer()
            if canChoosePeer(popped.HostPort()) {
                ps = popped
                break
            }
            psPopList = append(psPopList, popped)
        }
        // Push the peers that did not qualify back onto the heap
        for _, p := range psPopList {
            heap.Push(l.peerHeap, p)
        }
        if ps == nil {
            return nil
        }
        // Re-score the chosen peer and push it back onto the heap
        l.peerHeap.pushPeer(ps)
        ps.chosenCount.Inc()
        return ps.Peer
    }

When the agent needs to submit data, it obtains a peer (the service information) from the TChannel load balancer. When there are multiple peers, TChannel selects them in round-robin fashion. Implementation: the registry puts all peers into the peerHeap, pops a peer from the head, re-scores it and pushes it back so that it ends up behind the others. Successive calls therefore rotate through the peers, which gives a round-robin load-balancing strategy.
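The pop-then-push-back idea can be illustrated with Go's container/heap. This is a simplified sketch, not TChannel's implementation: the score here is a plain counter that grows on every selection (an assumption made for illustration), while TChannel computes scores through its ScoreCalculator. The types scoredPeer, peerHeap and roundRobin are hypothetical.

```go
package main

import "container/heap"

// scoredPeer is a peer with a score; the lowest score is selected first.
type scoredPeer struct {
	hostPort string
	score    uint64
}

// peerHeap is a min-heap of peers ordered by score.
type peerHeap []*scoredPeer

func (h peerHeap) Len() int            { return len(h) }
func (h peerHeap) Less(i, j int) bool  { return h[i].score < h[j].score }
func (h peerHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *peerHeap) Push(x interface{}) { *h = append(*h, x.(*scoredPeer)) }
func (h *peerHeap) Pop() interface{} {
	old := *h
	n := len(old)
	p := old[n-1]
	*h = old[:n-1]
	return p
}

// roundRobin hands out peers in rotation by always giving the selected
// peer the highest score seen so far (assumption: counter-based scoring).
type roundRobin struct {
	peers   peerHeap
	counter uint64
}

func newRoundRobin(hostPorts ...string) *roundRobin {
	rr := &roundRobin{}
	for _, hp := range hostPorts {
		rr.counter++
		heap.Push(&rr.peers, &scoredPeer{hostPort: hp, score: rr.counter})
	}
	return rr
}

// next pops the head of the heap, re-scores it so it moves behind all
// other peers, pushes it back, and returns its address.
func (rr *roundRobin) next() string {
	if rr.peers.Len() == 0 {
		return ""
	}
	p := heap.Pop(&rr.peers).(*scoredPeer)
	rr.counter++
	p.score = rr.counter
	heap.Push(&rr.peers, p)
	return p.hostPort
}
```

With three peers, repeated calls to next() cycle through them in insertion order and then start over.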

    • Retry
github.com/uber/tchannel-go/retry.go #212

    func (ch *Channel) RunWithRetry(runCtx context.Context, f RetriableFunc) error {
        var err error
        opts := getRetryOptions(runCtx)
        rs := ch.getRequestState(opts)
        defer requestStatePool.Put(rs)
        // Retries 5 times by default
        for i := 0; i < opts.MaxAttempts; i++ {
            rs.Attempt++
            if opts.TimeoutPerAttempt == 0 {
                err = f(runCtx, rs)
            } else {
                attemptCtx, cancel := context.WithTimeout(runCtx, opts.TimeoutPerAttempt)
                err = f(attemptCtx, rs)
                cancel()
            }
            if err == nil {
                return nil
            }
            if !opts.RetryOn.CanRetry(err) {
                if ch.log.Enabled(LogLevelInfo) {
                    ch.log.WithFields(ErrField(err)).Info("Failed after non-retriable error.")
                }
                return err
            }
            ...
        }
        // Too many retries, return the last error
        return err
    }

Network communication cannot avoid network failures, so retrying is one way to improve availability. When the agent submits data to a collector through a peer obtained from the load balancer and the submission fails, another peer is fetched from the load balancer and the submission is retried, up to 5 times. If all 5 attempts fail, the submission is discarded.

Consul+docker Service registration and service discovery

Implementing service registration and service discovery with Consul is simple, and many features are available out of the box.

Preparatory work

    • Start Consul (IP: 172.18.0.2)

    docker run -itd --network=backend \
        -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
        -h node1 progrium/consul -server -bootstrap -ui-dir /ui

    • Start agent

    docker run \
        -itd --network=backend \
        --name=jaeger-agent \
        -p5775:5775/udp \
        -p6831:6831/udp \
        -p6832:6832/udp \
        -p5778:5778/tcp \
        --dns-search="service.consul" --dns=172.18.0.2 \
        jaegertracing/jaeger-agent \
        /go/bin/agent-linux --collector.host-port=jaeger-collector:14267

    • Start collector

    # node1
    docker run -itd --network=backend \
        --name=jaeger-collector-node1 \
        -p :14267 \
        --dns-search="service.consul" --dns=172.18.0.2 \
        jaegertracing/jaeger-collector \
        /go/bin/collector-linux \
        --span-storage.type=cassandra \
        --cassandra.keyspace=jaeger_v1_dc \
        --cassandra.servers=cassandra:9042

    # node2
    docker run -itd --network=backend \
        --name=jaeger-collector-node2 \
        -p :14267 \
        --dns-search="service.consul" --dns=172.18.0.2 \
        jaegertracing/jaeger-collector \
        /go/bin/collector-linux \
        --span-storage.type=cassandra \
        --cassandra.keyspace=jaeger_v1_dc \
        --cassandra.servers=cassandra:9042

Service registration: automatic registration

    docker run -itd --net=backend --name=registrator \
        --volume=/var/run/docker.sock:/tmp/docker.sock \
        gliderlabs/registrator:latest \
        consul://172.18.0.2:8500

With the Consul + Docker setup, a service is registered in Consul automatically as soon as it is deployed, which makes registration very simple.

Registration Form

    • Viewing registry information

The registry information can be viewed at http://localhost:8500/ui/#/dc1/nodes/node1

It shows that the IPs of the 2 collector services that were started are 172.18.0.5 and 172.18.0.8.

    • Health Check

Consul offers a variety of health checks: HTTP, TCP, Docker, Shell, and TTL. Details can be found on the official website.

Server-side service discovery

Consul is a remote service from the point of view of the agent and the collector, so there are 2 ways to discover services: HTTP and DNS. DNS is the one mainly used here because it is simple and lightweight.

    • DNS and soft load balancing

When a DNS lookup by the agent matches multiple IPs, Consul chooses one of them at random, which load-balances the agent's requests.

Because of DNS caching, a service that has become unhealthy may still be resolved as if it were normal. For this reason Consul does not cache by default: the TTL is 0. On the other hand, not caching at all puts extra load on Consul, so the TTL is configurable and we can decide for ourselves how long DNS results are cached.

Summary

Both TChannel and Consul+docker implement service discovery and service registration with their pros and cons:

Service Registration

    • TChannel

TChannel service registration is suited to basic infrastructure services such as Jaeger, which rarely change once they are deployed.

    • Consul + Docker

Registering services with Consul is much easier in today's popular Docker environments. Because IP addresses in Docker are dynamic, this approach is a good fit for business services, which change constantly as the business evolves.

Health Check

Both TChannel and Consul provide health checks, but these only test whether the service is running. They cannot tell whether a request will actually be processed correctly.

Service discovery

    • TChannel

TChannel uses client-side service discovery. Compared with Consul's server-side service discovery, its advantage is that there is no remote network overhead and no single point of failure. Its disadvantage is that the registry, load balancing and related functions have to be implemented separately for each language.

    • Consul

Consul uses server-side service discovery. It can be shared with other services, and clients do not need to care about the registry, load balancing and so on. It also provides solutions for the network overhead and single point of failure issues.
