Jaeger Source Analysis--Peeking into a Distributed System Implementation


Objective

I analyzed the Jaeger source code mainly for the following reasons:

    • The company uses Jaeger; understanding its source code lets us operate the system with more confidence.

    • Understanding the design of a distributed system

    • Improving my understanding of Golang

    • Improving my English

The version analyzed is 0.10.0, the latest at the time of writing (2017-11-23).

Agent--Three Steps

The agent sits between jaeger-client and the collector, playing the role of a proxy. Its main job is to convert the Thrift data sent by the client into batches and submit those batches to the collector via RPC.

Initialize Agent

Github.com/jaegertracing/jaeger/cmd/agent/app/flags.go #35

var defaultProcessors = []struct {
    model    model
    protocol protocol
    hostPort string
}{
    {model: "zipkin", protocol: "compact", hostPort: ":5775"},
    {model: "jaeger", protocol: "compact", hostPort: ":6831"},
    {model: "jaeger", protocol: "binary", hostPort: ":6832"},
}
    • Three UDP servers are initialized when the agent starts

    • Each server handles a different data format

    • Port 6831 is the officially recommended port for receiving data
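To make the three listeners concrete, here is a minimal sketch of starting one UDP server per processor definition. This is not Jaeger's actual code: serveUDP and handlePacket are hypothetical names, and real packet handling is far more involved.

package main

import (
    "log"
    "net"
)

// serveUDP listens on hostPort and hands each datagram to handlePacket.
func serveUDP(hostPort string, handlePacket func([]byte)) {
    addr, err := net.ResolveUDPAddr("udp", hostPort)
    if err != nil {
        log.Fatal(err)
    }
    conn, err := net.ListenUDP("udp", addr)
    if err != nil {
        log.Fatal(err)
    }
    buf := make([]byte, 65000) // reused for every read; fine while handling stays synchronous
    for {
        n, _, err := conn.ReadFromUDP(buf)
        if err != nil {
            continue
        }
        handlePacket(buf[:n])
    }
}

func main() {
    // One goroutine per port, mirroring the defaultProcessors table above.
    for _, hostPort := range []string{":5775", ":6831", ":6832"} {
        go serveUDP(hostPort, func(b []byte) { log.Printf("got %d bytes", len(b)) })
    }
    select {} // block forever
}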

Receive data from Jaeger-client

Github.com/jaegertracing/jaeger/cmd/agent/app/servers/tbuffered_server.go #80

func (s *TBufferedServer) Serve() {
    atomic.StoreUint32(&s.serving, 1)
    for s.IsServing() {
        readBuf := s.readBufPool.Get().(*ReadBuf)
        n, err := s.transport.Read(readBuf.bytes)
        if err == nil {
            readBuf.n = n
            s.metrics.PacketSize.Update(int64(n))
            select {
            case s.dataChan <- readBuf:
                s.metrics.PacketsProcessed.Inc(1)
                s.updateQueueSize(1)
            default:
                // Note: if writes outpace processing, the agent drops the excess data here.
                s.metrics.PacketsDropped.Inc(1)
            }
        } else {
            s.metrics.ReadError.Inc(1)
        }
    }
}

Each UDP server has its own queue and workers: each queue (default length 1000) is consumed by 50 workers (goroutines), and the queue and worker counts can be tuned to the system load:

    • Queue length (default 1000): --processor.jaeger-compact.server-queue-size

    • Worker count (default 50): --processor.jaeger-compact.workers
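For example, with the default binary name jaeger-agent (the values here are arbitrary; only the flag names come from the source above), both limits can be raised at startup:

jaeger-agent \
    --processor.jaeger-compact.server-queue-size=2000 \
    --processor.jaeger-compact.workers=100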

Graceful Shutdown

Initializing a service in Go is simple and can be implemented as a for{} loop. But once you start a service, you also have to think about how to shut it down; you cannot just kill it. A request interrupted halfway through processing produces dirty data, which is obviously not what we want, hence graceful shutdown. A graceful shutdown basically works like this: the main service receives a signal and then notifies its child services to stop performing further operations.
Let's look at how NSQ and Jaeger notify their child services to stop:

    • NSQ

Github.com/nsqio/nsq/nsqd/topic.go #215

func (t *Topic) messagePump() {
    ......
    for {
        select {
        case msg = <-memoryMsgChan:
            ......
        case <-t.exitChan:
            goto exit
        }
        ......
    }
exit:
    t.ctx.nsqd.logf("TOPIC(%s): closing ... messagePump", t.name)
}
    • Jaeger

func (s *TBufferedServer) Serve() {
    atomic.StoreUint32(&s.serving, 1)
    for s.IsServing() {
        ......
    }
}

For notifying the child service to stop, both NSQ and Jaeger leave an entry point through which the main service signals the child. The difference lies in the stopping step itself:

    • NSQ uses chan + goto: exitChan receives the signal, goto runs, and the for loop is exited.

    • Jaeger uses an atomic operation: s.serving is atomically set to 0, and the for loop exits.
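As a minimal sketch (illustrative names, not the real NSQ or Jaeger code), the two stop patterns look like this:

package main

import (
    "fmt"
    "sync/atomic"
)

// Jaeger-style: an atomic flag that the serve loop checks on every iteration.
type server struct{ serving uint32 }

func (s *server) Serve() {
    atomic.StoreUint32(&s.serving, 1)
    for atomic.LoadUint32(&s.serving) == 1 {
        // ... read and enqueue one packet ...
    }
}

// Stop flips the flag; Serve exits after finishing its current iteration.
func (s *server) Stop() { atomic.StoreUint32(&s.serving, 0) }

// NSQ-style: a chan plus goto to leave the loop.
func pump(msgs <-chan string, exitChan <-chan struct{}) {
    for {
        select {
        case m := <-msgs:
            fmt.Println("handled", m)
        case <-exitChan:
            goto exit
        }
    }
exit:
    fmt.Println("pump closing")
}

func main() {
    exitChan := make(chan struct{})
    close(exitChan) // signal immediately so pump returns at once
    pump(make(chan string), exitChan)
}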

Temporary Object Pool

The temporary object pool is covered in good detail in the online post "Go Concurrent Programming in Action"--temporary object pool. The purpose of a temporary object pool (sync.Pool) is to hold reusable values and reduce garbage collection.
For the pool to do its job, it must not be empty: a Get on an empty pool just allocates a new value, so nothing is actually reused. The typical usage is therefore Get first, then Put the value back.

readBuf := s.readBufPool.Get().(*ReadBuf)

As the code above shows, the agent reuses *ReadBuf through the object pool, yet the Put step is nowhere to be seen; that step is handled by the worker.

Github.com/uber/jaeger/cmd/agent/app/servers/tbuffered_server.go #124

func (s *TBufferedServer) DataRecd(buf *ReadBuf) {
    s.updateQueueSize(-1)
    s.readBufPool.Put(buf)
}

Why not Put readBuf back into the pool right after the data is placed on the queue? The current scenario dictates otherwise. First, readBuf is a pointer; second, that pointer is what goes onto the chan. If the chan has a backlog (workers still processing queued data) while the agent keeps receiving client data, reusing *ReadBuf would make the queued entries and the new data share the same memory, corrupting everything on the chan. So, to reuse the value safely, it is put back into the pool only after a worker has consumed it from the queue.
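Here is a sketch (not Jaeger code) of that failure mode: returning the pointer to the pool while it is still queued lets the next read overwrite it.

package main

import (
    "fmt"
    "sync"
)

type readBuf struct{ b [4]byte }

func main() {
    pool := sync.Pool{New: func() interface{} { return new(readBuf) }}
    queue := make(chan *readBuf, 2)

    for _, msg := range []string{"aaaa", "bbbb"} {
        buf := pool.Get().(*readBuf)
        copy(buf.b[:], msg)
        queue <- buf
        pool.Put(buf) // WRONG: the buffer is still referenced by the queue
    }
    // Get very likely handed back the same pointer twice, so both queued
    // entries now read "bbbb": the first packet has been corrupted.
    fmt.Println(string((<-queue).b[:]), string((<-queue).b[:]))
}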
Having seen how the agent uses the object pool, let's look at NSQ's usage.

Github.com/nsqio/nsq/nsqd/topic.go #197

func (t *Topic) put(m *Message) error {
    select {
    case t.memoryMsgChan <- m:
    default:
        b := bufferPoolGet() // b => bp.Get().(*bytes.Buffer)
        err := writeMessageToBackend(b, m, t.backend)
        bufferPoolPut(b)
        ......
    }
    return nil
}

The usage here is easier to follow: first Get a *bytes.Buffer, then process the message m, and finally Put the *bytes.Buffer back into the pool. Unlike in the agent, writeMessageToBackend leaves no data queued, so nothing can get garbled. There is one small detail: when a *bytes.Buffer is Put into the pool and later fetched with Get, b would still hold the previously processed data, so NSQ clears it in order to hand out a clean value.
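The pattern can be sketched like this (an assumed shape, not NSQ's exact file): the Reset guarantees that a recycled buffer comes back empty, even though it keeps its allocated capacity.

package main

import (
    "bytes"
    "fmt"
    "sync"
)

var bp = sync.Pool{New: func() interface{} { return &bytes.Buffer{} }}

func bufferPoolGet() *bytes.Buffer { return bp.Get().(*bytes.Buffer) }

func bufferPoolPut(b *bytes.Buffer) {
    b.Reset() // clear leftover data so the next Get starts clean
    bp.Put(b)
}

func main() {
    b := bufferPoolGet()
    b.WriteString("first message")
    bufferPoolPut(b)

    b2 := bufferPoolGet() // possibly the same buffer, but empty again
    fmt.Println(b2.Len()) // 0
}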

Dropping Data Without Mercy

The agent's queues have a length limit (default 1000); anything that accumulates beyond that is discarded by the agent without mercy. This is not inappropriate here: Jaeger is positioned as a tracing/log system, and the data is not so important that reliability is critical. To reduce data loss, you can tune the configuration or add agent nodes. Because Jaeger and NSQ treat data so differently, I will not compare them on this point; NSQ pays much more attention to data reliability.

Submit data

Github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go #104

func (s *ThriftProcessor) processBuffer() {
    for readBuf := range s.server.DataChan() {
        protocol := s.protocolPool.Get().(thrift.TProtocol)
        protocol.Transport().Write(readBuf.GetBytes())
        // This is the step that puts *ReadBuf back into the pool.
        s.server.DataRecd(readBuf) // acknowledge receipt and release the buffer
        // Parse the data from Thrift into a Batch and submit it.
        if ok, _ := s.handler.Process(protocol, protocol); !ok {
            // TODO log the error
            s.metrics.HandlerProcessError.Inc(1)
        }
        s.protocolPool.Put(protocol)
    }
    s.processing.Done()
}
    • Consuming queue data

This is the worker implementation: 150 workers (50 per UDP server x 3 servers) are started with the agent to process queue data. The queue is consumed with for + range rather than select + chan; an introduction to both approaches can be found in "Go channel--range and select". Here the agent is lazy and does not bother with graceful shutdown: if data is still queued when the agent restarts, that queued data is lost.
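The difference between the two consumption styles can be sketched as follows (illustrative only): range ends only when the channel is closed, while select can also watch a stop channel.

package main

import "fmt"

// drainWithRange consumes until the channel is closed (the agent's style);
// there is no way to interrupt it early.
func drainWithRange(items <-chan int) {
    for v := range items {
        fmt.Println("range got", v)
    }
}

// drainWithSelect can also exit early through a stop channel (the style of
// the collector's BoundedQueue, shown later).
func drainWithSelect(items <-chan int, stop <-chan struct{}) {
    for {
        select {
        case v := <-items:
            fmt.Println("select got", v)
        case <-stop:
            return
        }
    }
}

func main() {
    items := make(chan int, 2)
    items <- 1
    items <- 2
    close(items)
    drainWithRange(items)

    stop := make(chan struct{})
    close(stop) // stop immediately; queued items would be abandoned
    drainWithSelect(make(chan int), stop)
}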

    • Converting the data from Thrift to Batch

Github.com/jaegertracing/jaeger/thrift-gen/agent/agent.go #187

func (p *agentProcessorEmitBatch) Process(seqId int32, iprot, oprot thrift.TProtocol) (success bool, err thrift.TException) {
    args := AgentEmitBatchArgs{}
    if err = args.Read(iprot); err != nil {
        iprot.ReadMessageEnd()
        return false, err
    }
    iprot.ReadMessageEnd()
    var err2 error
    if err2 = p.handler.EmitBatch(args.Batch); err2 != nil {
        return true, err2
    }
    return true, nil
}

Parsing Thrift is quite troublesome. The format is meant for machines and has to be decoded step by step according to the specification, so it is not as convenient as JSON; in exchange, Thrift genuinely reduces the space the data occupies.

    • Submit data

Github.com/jaegertracing/jaeger/thrift-gen/jaeger/tchan-jaeger.go #39

func (c *tchanCollectorClient) SubmitBatches(ctx thrift.Context, batches []*Batch) ([]*BatchSubmitResponse, error) {
    var resp CollectorSubmitBatchesResult
    args := CollectorSubmitBatchesArgs{
        Batches: batches,
    }
    success, err := c.client.Call(ctx, c.thriftService, "submitBatches", &args, &resp)
    if err == nil && !success {
        switch {
        default:
            err = fmt.Errorf("received no result or unknown exception for submitBatches")
        }
    }
    return resp.GetSuccess(), err
}

The agent submits data to the collector in batches through TChannel, an RPC framework developed by Uber. This framework provides a useful feature: context propagation. Why does that matter? Here is a problem we once ran into. With an RPC interface, the business side passes whatever parameters each function call needs, and early on this causes no trouble. But as the company grows and versions iterate, one interface commonly has to stay compatible with several client versions. This creates a problem: since the RPC server and the caller run in different processes with different contexts, the server does not know the client's version, which makes compatibility hard. Add another parameter? Add another service interface? Neither approach is friendly. Ideally the problem is solved without any change on the business side, and that is exactly where context propagation shows its value.
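A hedged, in-process sketch of the idea using Go's standard context package (TChannel's thrift.Context plays a comparable role across processes; all names below are made up):

package main

import (
    "context"
    "fmt"
)

type ctxKey string

const clientVersionKey ctxKey = "client-version"

// The server reads the version from the context and can branch for
// compatibility without a new parameter or a new interface.
func handleRequest(ctx context.Context, arg string) {
    v, _ := ctx.Value(clientVersionKey).(string)
    if v < "2.0" { // naive version check, enough for the sketch
        fmt.Println("legacy handling of", arg, "for client", v)
        return
    }
    fmt.Println("current handling of", arg, "for client", v)
}

func main() {
    // The caller attaches its version once; the business-call parameter
    // list never has to change.
    ctx := context.WithValue(context.Background(), clientVersionKey, "1.3")
    handleRequest(ctx, "some business argument")
}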
Dropping data without mercy is ubiquitous in Jaeger. As the code above shows, if the submission fails the data is simply lost: no retries, no re-queueing.

Collector--Three Steps

The collector receives the data and saves it to the database. Although its duties differ from the agent's, its program design is the same; you can tell from the implementations that different developers divided up the work. Below we again break the collector's implementation into three steps.

Initialize Collector

The collector is an RPC server implemented with TChannel. At startup it registers two TCP-based RPC services: one receives Jaeger-format data, the other Zipkin-format data.

Github.com/jaegertracing/jaeger/cmd/collector/main.go #100

......
ch, err := tchannel.NewChannel(serviceName, &tchannel.ChannelOptions{})
if err != nil {
    logger.Fatal("Unable to create new TChannel", zap.Error(err))
}
server := thrift.NewServer(ch)
zipkinSpansHandler, jaegerBatchesHandler := handlerBuilder.BuildHandlers()
server.Register(jc.NewTChanCollectorServer(jaegerBatchesHandler))
server.Register(zc.NewTChanZipkinCollectorServer(zipkinSpansHandler))
portStr := ":" + strconv.Itoa(builderOpts.CollectorPort)
listener, err := net.Listen("tcp", portStr)
if err != nil {
    logger.Fatal("Unable to start listening on channel", zap.Error(err))
}
ch.Serve(listener)
......

Receiving data from the agent

Github.com/jaegertracing/jaeger/cmd/collector/app/span_handler.go #69

func (jbh *jaegerBatchesHandler) SubmitBatches(ctx thrift.Context, batches []*jaeger.Batch) ([]*jaeger.BatchSubmitResponse, error) {
    responses := make([]*jaeger.BatchSubmitResponse, 0, len(batches))
    for _, batch := range batches {
        mSpans := make([]*model.Span, 0, len(batch.Spans))
        for _, span := range batch.Spans {
            mSpan := jConv.ToDomainSpan(span, batch.Process)
            mSpans = append(mSpans, mSpan)
        }
        oks, err := jbh.modelProcessor.ProcessSpans(mSpans, JaegerFormatType)
        if err != nil {
            return nil, err
        }
        ......
    }
    return responses, nil
}

This is where the RPC server receives the data; after conversion, the spans are placed on a queue.

Github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go #76

func (q *BoundedQueue) Produce(item interface{}) bool {
    if atomic.LoadInt32(&q.stopped) != 0 {
        q.onDroppedItem(item)
        return false
    }
    select {
    case q.items <- item:
        atomic.AddInt32(&q.size, 1)
        return true
    default:
        if q.onDroppedItem != nil {
            q.onDroppedItem(item)
        }
        return false
    }
}

Here the collector abstracts the queue operations into a BoundedQueue, which makes the code easier to read. The BoundedQueue is implemented with select + chan and is functionally the same as the agent's queue; gracefully stopping the queue and inspecting its length are built on top of produce and consume. Once the collector's queue backs up to 2000 items, data is again discarded without mercy. Of course, these limits can be adjusted as well:

    • --collector.queue-size (default 2000)

    • --collector.num-workers (default 50)

Save data

Consuming queue data

Github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go #53

func (q *BoundedQueue) StartConsumers(num int, consumer func(item interface{})) {
    var startWG sync.WaitGroup
    for i := 0; i < num; i++ {
        q.stopWG.Add(1)
        startWG.Add(1)
        go func() {
            startWG.Done()
            defer q.stopWG.Done()
            for {
                select {
                case item := <-q.items:
                    atomic.AddInt32(&q.size, -1)
                    consumer(item)
                case <-q.stopCh:
                    return
                }
            }
        }()
    }
    startWG.Wait()
}

One step here is not obvious: why use startWG to confirm that the workers have started? What would go wrong otherwise?

Official reply: to ensure all consumer goroutines are running by the time we return from this function
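A small illustrative sketch (hypothetical code, not Jaeger's) of the guarantee startWG provides: by the time the start function returns, every worker goroutine is really running.

package main

import (
    "fmt"
    "sync"
)

// startConsumers launches num workers and, like Jaeger's BoundedQueue, only
// returns once every goroutine has actually started executing.
func startConsumers(num int, items <-chan int, done *sync.WaitGroup) {
    var startWG sync.WaitGroup
    for i := 0; i < num; i++ {
        startWG.Add(1)
        done.Add(1)
        go func(id int) {
            startWG.Done() // signal "I am running"
            defer done.Done()
            for v := range items {
                fmt.Printf("worker %d got %d\n", id, v)
            }
        }(i)
    }
    startWG.Wait() // without this, we might return before any worker runs
}

func main() {
    items := make(chan int, 4)
    var done sync.WaitGroup
    startConsumers(3, items, &done)
    // From here on the consumers are guaranteed to be live, so shutdown or
    // test logic that follows immediately cannot race their startup.
    for i := 0; i < 4; i++ {
        items <- i
    }
    close(items)
    done.Wait()
}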

The queue itself is shut down gracefully by calling close(q.stopCh).

Saving data to the database--Cassandra

Github.com/jaegertracing/jaeger/plugin/storage/cassandra/spanstore/writer.go #122

func (s *SpanWriter) WriteSpan(span *model.Span) error {
    ds := dbmodel.FromDomain(span)
    mainQuery := s.session.Query(
        insertSpan,
        ds.TraceID,
        ds.SpanID,
        ds.SpanHash,
        ds.ParentID,
        ds.OperationName,
        ds.Flags,
        ds.StartTime,
        ds.Duration,
        ds.Tags,
        ds.Logs,
        ds.Refs,
        ds.Process,
    )
    if err := s.writerMetrics.traces.Exec(mainQuery, s.logger); err != nil {
        return s.logError(ds, err, "Failed to insert span", s.logger)
    }
    if err := s.saveServiceNameAndOperationName(ds.ServiceName, ds.OperationName); err != nil {
        // should this be a soft failure?
        return s.logError(ds, err, "Failed to insert service name and operation name", s.logger)
    }
    ......
    return nil
}

ServiceName and OperationName get special handling when saved to Cassandra: an LRU cache sits in front of the writes. This cache is presumably designed to reduce the write pressure on Cassandra.

Github.com/jaegertracing/jaeger/plugin/storage/cassandra/spanstore/service_names.go #69

func (s *ServiceNamesStorage) Write(serviceName string) error {
    var err error
    query := s.session.Query(s.InsertStmt)
    if inCache := checkWriteCache(serviceName, s.serviceNames, s.writeCacheTTL); !inCache {
        q := query.Bind(serviceName)
        err2 := s.metrics.Exec(q, s.logger)
        if err2 != nil {
            err = err2
        }
    }
    return err
}

The collector works in the order of cache first, database second; the cache is queried as a simple key/value lookup.
Since cache entries expire (default 12h) while Jaeger keeps data for two days by default, could a duplicate insert cause an error? After all, service_name is the primary key.

CREATE TABLE IF NOT EXISTS jaeger_v1_dc.service_names (
    service_name text,
    PRIMARY KEY (service_name)
)

With MySQL this would certainly raise an error, but no such thing happens in Cassandra.

    • Cassandra

cqlsh:jaeger_v1_dc> select * from service_names1;

 service_name
--------------
         test

(1 rows)
cqlsh:jaeger_v1_dc> INSERT INTO service_names1 (service_name) VALUES ('test');
    • Mysql

mysql> select * from service_names1;
+--------------+
| service_name |
+--------------+
| test         |
+--------------+
1 row in set (0.00 sec)

mysql> insert into service_names1 (service_name) values ('test');
ERROR 1062 (23000): Duplicate entry 'test' for key 'PRIMARY'

Are you surprised? There is no error, yet uniqueness is still guaranteed: in Cassandra an INSERT on an existing primary key is effectively an upsert. Interested readers can pick up the basics; the syntax is very similar to MySQL's. We are still finding our own footing with Cassandra, so I will not describe it further.

Golang Usage Conventions

                         NSQ                      Jaeger
Directory name           lowercase/underscore     lowercase/hyphen
Function name            lowerCamelCase           lowerCamelCase
File name                underscore               underscore
Variable                 lowerCamelCase           lowerCamelCase
Constant                 lowerCamelCase           lowerCamelCase
Package name             current directory name   current directory name
Request address          underscore *             lowercase
Request parameters       lowercase *              lowerCamelCase
Return parameters        underscore               lowerCamelCase
Command-line arguments   hyphen                   prefix + dot + hyphen

The "*" is due to not finding enough reference examples.

Conclusion

Jaeger showed me a lot: UDP usage, graceful shutdown, temporary object pools, an LRU implementation, and more. And not only on the Golang side; there is also the program and service design: the agent, collector, and query services each carry a very single responsibility, which presumably comes from microservice thinking. There is still much to digest, and surely plenty I did not notice; I only followed what made me curious, but the harvest was rich. In one sentence: I learned a lot!
