Original address: Jaeger source code analysis -- a look at how a distributed system is implemented
Objective
I analyzed the Jaeger source code mainly for the following reasons:
My company uses Jaeger, and understanding its source code helps us operate the system with more confidence.
To understand the design of distributed systems.
To improve my understanding of Golang.
To improve my English.
The version analyzed is the latest release at the time of writing, 0.10.0 (2017-11-23).
Agent (3 steps)
The agent sits between the jaeger-client and the collector and plays the role of a proxy: it converts the Thrift data sent by the client into batches and submits those batches to the collector over RPC.
Initialize Agent
Github.com/jaegertracing/jaeger/cmd/agent/app/flags.go #35
```go
var defaultProcessors = []struct {
	model    model
	protocol protocol
	hostPort string
}{
	{model: "zipkin", protocol: "compact", hostPort: ":5775"},
	{model: "jaeger", protocol: "compact", hostPort: ":6831"},
	{model: "jaeger", protocol: "binary", hostPort: ":6832"},
}
```
Three UDP services are initialized when the agent starts.
Each service accepts a different data format.
Port 6831 is the officially recommended port for receiving data.
Receive data from Jaeger-client
Github.com/jaegertracing/jaeger/cmd/agent/app/servers/tbuffered_server.go #80
```go
func (s *TBufferedServer) Serve() {
	atomic.StoreUint32(&s.serving, 1)
	for s.IsServing() {
		readBuf := s.readBufPool.Get().(*ReadBuf)
		n, err := s.transport.Read(readBuf.bytes)
		if err == nil {
			readBuf.n = n
			s.metrics.PacketSize.Update(int64(n))
			select {
			case s.dataChan <- readBuf:
				s.metrics.PacketsProcessed.Inc(1)
				s.updateQueueSize(1)
			default:
				// note: if writes outpace processing, the agent drops the excess data
				s.metrics.PacketsDropped.Inc(1)
			}
		} else {
			s.metrics.ReadError.Inc(1)
		}
	}
}
```
Each UDP server has its own queue and workers: each queue (default length 1000) is drained by 50 worker goroutines, and both the queue length and the number of workers can be tuned to match the system load.
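As a rough sketch of this queue-plus-workers layout (my own illustration; the names `startWorkers` and `handlePacket` are made up, not Jaeger's):

```go
package main

import (
	"fmt"
	"sync"
)

// handlePacket stands in for whatever a worker does with one packet.
func handlePacket(pkt []byte) { fmt.Printf("processed %d bytes\n", len(pkt)) }

// startWorkers drains one buffered queue with a pool of worker goroutines,
// mirroring the "each UDP server has its own queue and workers" layout.
func startWorkers(queue chan []byte, numWorkers int) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ { // e.g. 50 workers per server
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pkt := range queue { // exits once the queue is closed and drained
				handlePacket(pkt)
			}
		}()
	}
	return &wg
}

func main() {
	queue := make(chan []byte, 1000) // default queue length in the agent
	wg := startWorkers(queue, 50)
	queue <- []byte("span data")
	close(queue) // let the workers drain and exit
	wg.Wait()
}
```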
Graceful Shutdown
Initializing a service in Go is simple; it can be written as a for {} loop. But once you start one, you also have to think about how to shut it down. Can you just kill it outright? If a request is cut off halfway through processing, dirty data appears, which is obviously not what we want, hence graceful shutdown. The basic idea is that the main service receives a signal and then notifies its child services, which stop processing further work.
Let's look at how NSQ and Jaeger each notify a child service to stop:
Github.com/nsqio/nsq/nsqd/topic.go #215
```go
func (t *Topic) messagePump() {
	......
	for {
		select {
		case msg = <-memoryMsgChan:
			......
		case <-t.exitChan:
			goto exit
		}
		......
	}

exit:
	t.ctx.nsqd.logf("TOPIC(%s): closing ... messagePump", t.name)
}
```
```go
func (s *TBufferedServer) Serve() {
	atomic.StoreUint32(&s.serving, 1)
	for s.IsServing() {
		......
	}
}
```
To tell a child service to stop, both NSQ and Jaeger leave an entry point through which the main service can signal it. The difference is in how the stop itself is carried out:
NSQ uses a channel plus goto: exitChan receives the signal, a goto is executed, and the for loop is exited.
Jaeger uses an atomic operation: s.serving is set to 0 atomically, and the for loop exits on its next condition check.
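A minimal sketch of the two styles as I understand them (my own illustration, not the actual NSQ or Jaeger code):

```go
package main

import (
	"runtime"
	"sync/atomic"
	"time"
)

// NSQ style: the loop selects on a dedicated exit channel and jumps out
// (NSQ reaches a cleanup label via goto; returning expresses the same idea).
func chanStyleLoop(jobs <-chan int, exitChan <-chan struct{}) {
	for {
		select {
		case <-jobs:
			// handle one job
		case <-exitChan:
			return
		}
	}
}

// Jaeger style: the loop condition re-checks an atomic flag on every pass.
type server struct{ serving uint32 }

func (s *server) serve() {
	atomic.StoreUint32(&s.serving, 1)
	for atomic.LoadUint32(&s.serving) == 1 {
		// read one packet / handle one unit of work here
		runtime.Gosched()
	}
}

func (s *server) stop() { atomic.StoreUint32(&s.serving, 0) }

func main() {
	s := &server{}
	done := make(chan struct{})
	go func() { s.serve(); close(done) }()
	time.Sleep(10 * time.Millisecond)
	s.stop() // the loop exits at its next condition check
	<-done
}
```

Both approaches give the loop a well-defined exit point; the channel version can interrupt a blocking select, while the flag version only takes effect on the next pass of the loop.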
Temporary Object Pool
The temporary object pool has already been covered in detail elsewhere online, for example in "Go Concurrent Programming in Practice" -- the section on the temporary object pool. The purpose of a temporary object pool is to hold reusable values and reduce garbage collection.
For the pool to do its job, it must not be empty: getting a value from an empty pool simply allocates a new one, so nothing is actually reused. The usual pattern is therefore Get first, then Put the value back once you are done with it.
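A minimal sync.Pool sketch of that Get-then-Put pattern (my own illustration, not Jaeger's code):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	// New is only called when the pool has nothing to hand out.
	New: func() interface{} { return new(bytes.Buffer) },
}

func main() {
	buf := bufPool.Get().(*bytes.Buffer) // reuse a buffer if one is available
	buf.Reset()                          // pooled values keep their old contents
	buf.WriteString("some packet")
	fmt.Println(buf.Len())
	bufPool.Put(buf) // return it so the next Get can reuse it
}
```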
```go
readBuf := s.readBufPool.Get().(*ReadBuf)
```
In the agent, the Get above pulls a *ReadBuf out of the object pool for reuse, but there is no matching Put nearby -- that step is handled by the worker.
Github.com/uber/jaeger/cmd/agent/app/servers/tbuffered_server.go #124
```go
func (s *TBufferedServer) DataRecd(buf *ReadBuf) {
	s.updateQueueSize(-1)
	s.readBufPool.Put(buf)
}
```
Why not put the readBuf back into the pool as soon as the data has been placed on the queue? The current scenario rules that out. First, readBuf is a pointer; second, it is that pointer which gets placed on the channel. If the channel has a backlog (workers are still processing queued data) and the agent reused the *ReadBuf the moment new client data arrived, the data already sitting in the channel would be overwritten by the new data and everything would get mixed up. So to reuse the value safely, it is only put back into the pool after a worker has consumed it from the queue.
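A small sketch of that hazard (my own illustration, with made-up names): if the producer puts the pointer back right after sending it, a later Get is likely to hand out the same pointer and overwrite data still waiting in the channel.

```go
package main

import (
	"fmt"
	"sync"
)

type readBuf struct{ data string }

var pool = sync.Pool{New: func() interface{} { return new(readBuf) }}

func main() {
	ch := make(chan *readBuf, 2) // pretend the workers are lagging behind

	// Wrong: Put immediately after sending the pointer on the channel.
	b := pool.Get().(*readBuf)
	b.data = "packet-1"
	ch <- b
	pool.Put(b) // the pointer is now both "in the queue" and "free"

	b2 := pool.Get().(*readBuf) // very likely the same pointer as b
	b2.data = "packet-2"        // overwrites the data still sitting in ch
	ch <- b2

	fmt.Println((<-ch).data, (<-ch).data) // may print "packet-2 packet-2"
}
```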
Having seen how the agent uses the object pool, let's look at how NSQ uses one.
Github.com/nsqio/nsq/nsqd/topic.go #197
```go
func (t *Topic) put(m *Message) error {
	select {
	case t.memoryMsgChan <- m:
	default:
		b := bufferPoolGet() // b => bp.Get().(*bytes.Buffer)
		err := writeMessageToBackend(b, m, t.backend)
		bufferPoolPut(b)
		......
	}
	return nil
}
```
The usage here is easier to follow: first get a bytes.Buffer, then process the message m, and finally put the bytes.Buffer back into the pool. Unlike the agent, writeMessageToBackend does not leave data sitting around, so nothing can get scrambled. There is one small detail: when a *bytes.Buffer is put back into the pool and later retrieved, b still holds the previously written data, so NSQ resets it to make sure a clean value is used.
Dropping Data Without Mercy
The agent's queue has a length limit (default 1000); once more than 1000 items pile up, the agent discards the excess without mercy. That is not unreasonable here: Jaeger positions itself as a tracing/logging system, so data reliability is not critical. If you want to reduce data loss, you can adjust the configuration or add more agent nodes. Because Jaeger and NSQ treat their data very differently, I am not comparing this part of their functionality; NSQ pays far more attention to data reliability.
Submit data
Github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go #104
```go
func (s *ThriftProcessor) processBuffer() {
	for readBuf := range s.server.DataChan() {
		protocol := s.protocolPool.Get().(thrift.TProtocol)
		protocol.Transport().Write(readBuf.GetBytes())
		// this is the step that puts the *ReadBuf back into the pool
		s.server.DataRecd(readBuf) // acknowledge receipt and release the buffer

		// parse the data from thrift into a Batch and submit it
		if ok, _ := s.handler.Process(protocol, protocol); !ok {
			// TODO log the error
			s.metrics.HandlerProcessError.Inc(1)
		}
		s.protocolPool.Put(protocol)
	}
	s.processing.Done()
}
```
This is the worker implementation: 150 workers are started to process queue data when the agent boots. The queue is consumed with for + range rather than select + chan; an introduction to both approaches can be found in "Go channel -- range and select". Here the agent is a bit lazy and ignores graceful shutdown: if data has accumulated in the queue and the agent restarts, the queued data is lost.
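A sketch of the two consumption styles (my own illustration; only the overall shape matches the real code):

```go
package main

import "fmt"

// Style 1: for + range, as the agent's worker does. The loop only ends when
// the channel is closed; there is no separate stop signal.
func rangeConsumer(queue <-chan string, done chan<- struct{}) {
	for item := range queue {
		fmt.Println("range consumer got", item)
	}
	close(done)
}

// Style 2: select + chan, as the collector's BoundedQueue does. A stop channel
// lets the worker return even while the data channel stays open.
func selectConsumer(queue <-chan string, stop <-chan struct{}, done chan<- struct{}) {
	for {
		select {
		case item := <-queue:
			fmt.Println("select consumer got", item)
		case <-stop:
			close(done)
			return
		}
	}
}

func main() {
	q1, d1 := make(chan string, 10), make(chan struct{})
	go rangeConsumer(q1, d1)
	q1 <- "a"
	close(q1) // closing the channel is the only way to end a range loop
	<-d1

	q2, stop, d2 := make(chan string, 10), make(chan struct{}), make(chan struct{})
	go selectConsumer(q2, stop, d2)
	q2 <- "b"
	close(stop) // the worker can return even though q2 is still open (and may still hold data)
	<-d2
}
```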
Github.com/jaegertracing/jaeger/thrift-gen/agent/agent.go #187
```go
func (p *agentProcessorEmitBatch) Process(seqId int32, iprot, oprot thrift.TProtocol) (success bool, err thrift.TException) {
	args := AgentEmitBatchArgs{}
	if err = args.Read(iprot); err != nil {
		iprot.ReadMessageEnd()
		return false, err
	}
	iprot.ReadMessageEnd()
	var err2 error
	if err2 = p.handler.EmitBatch(args.Batch); err2 != nil {
		return true, err2
	}
	return true, nil
}
```
Parsing Thrift is fairly troublesome: the format is meant for machines, and the data has to be decoded step by step according to the specification. It is not as convenient as JSON, but Thrift does noticeably reduce the space the data occupies.
Github.com/jaegertracing/jaeger/thrift-gen/jaeger/tchan-jaeger.go #39
```go
func (c *tchanCollectorClient) SubmitBatches(ctx thrift.Context, batches []*Batch) ([]*BatchSubmitResponse, error) {
	var resp CollectorSubmitBatchesResult
	args := CollectorSubmitBatchesArgs{
		Batches: batches,
	}
	success, err := c.client.Call(ctx, c.thriftService, "submitBatches", &args, &resp)
	if err == nil && !success {
		switch {
		default:
			err = fmt.Errorf("received no result or unknown exception for submitBatches")
		}
	}
	return resp.GetSuccess(), err
}
```
The agent submits data to the collector in batches through the RPC framework TChannel, which was developed by Uber. This framework offers a useful feature: context propagation. Why does that matter? Here is a problem we have run into ourselves. With an RPC interface, the business side passes the parameters the call needs, and early on that causes no trouble. But as the company grows and versions iterate, it becomes very common for one interface to have to stay compatible with several client versions. Now there is a problem: the RPC server and the business caller run in different processes, their contexts are not shared, and the RPC server has no idea which client version is calling, so compatibility is hard to achieve. Add another parameter? Add another service interface? Neither is friendly enough; ideally the problem should be solved without any change on the business side, and that is exactly where context propagation shows its value.
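I won't reproduce TChannel's own API here; as a plain-Go illustration of the idea (the key name and version values below are made up for the example), metadata such as the client version can travel in a context instead of being bolted onto every function signature:

```go
package main

import (
	"context"
	"fmt"
)

type ctxKey string

const clientVersionKey ctxKey = "client-version" // hypothetical metadata key

// handleRequest never takes the version as an argument; it reads it from the
// context, so adding new metadata later does not change any signatures.
func handleRequest(ctx context.Context, payload string) {
	version, _ := ctx.Value(clientVersionKey).(string)
	if version < "2.0.0" {
		fmt.Println("serving legacy response for", payload)
		return
	}
	fmt.Println("serving current response for", payload)
}

func main() {
	ctx := context.WithValue(context.Background(), clientVersionKey, "1.4.2")
	handleRequest(ctx, "submitBatches")
}
```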
This merciless dropping of data is everywhere in Jaeger: as the code above shows, if the submission fails the data is simply lost -- no retries, no re-queueing.
Collector (3 steps)
The collector receives the data and saves it to the database. Although its duties differ from the agent's, its program design is the same; you can tell from the two implementations that they were built by different developers dividing up the work. Below we again break the collector down into 3 steps.
Initialize Collector
The collector is an RPC server implemented with TChannel. At startup it registers two TCP-based RPC services: one that receives Jaeger-format data and one that receives Zipkin-format data.
Github.com/jaegertracing/jaeger/cmd/collector/main.go # 100
```go
......
ch, err := tchannel.NewChannel(serviceName, &tchannel.ChannelOptions{})
if err != nil {
	logger.Fatal("Unable to create new TChannel", zap.Error(err))
}
server := thrift.NewServer(ch)
zipkinSpansHandler, jaegerBatchesHandler := handlerBuilder.BuildHandlers()
server.Register(jc.NewTChanCollectorServer(jaegerBatchesHandler))
server.Register(zc.NewTChanZipkinCollectorServer(zipkinSpansHandler))
portStr := ":" + strconv.Itoa(builderOpts.CollectorPort)
listener, err := net.Listen("tcp", portStr)
if err != nil {
	logger.Fatal("Unable to start listening on channel", zap.Error(err))
}
ch.Serve(listener)
......
```
Receiving data from the agent
Github.com/jaegertracing/jaeger/cmd/collector/app/span_handler.go #69
```go
func (jbh *jaegerBatchesHandler) SubmitBatches(ctx thrift.Context, batches []*jaeger.Batch) ([]*jaeger.BatchSubmitResponse, error) {
	responses := make([]*jaeger.BatchSubmitResponse, 0, len(batches))
	for _, batch := range batches {
		mSpans := make([]*model.Span, 0, len(batch.Spans))
		for _, span := range batch.Spans {
			mSpan := jConv.ToDomainSpan(span, batch.Process)
			mSpans = append(mSpans, mSpan)
		}
		oks, err := jbh.modelProcessor.ProcessSpans(mSpans, JaegerFormatType)
		if err != nil {
			return nil, err
		}
		......
	}
	return responses, nil
}
```
This is where the RPC server receives the data, and the processed data is placed in the queue.
Github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go #76
```go
func (q *BoundedQueue) Produce(item interface{}) bool {
	if atomic.LoadInt32(&q.stopped) != 0 {
		q.onDroppedItem(item)
		return false
	}
	select {
	case q.items <- item:
		atomic.AddInt32(&q.size, 1)
		return true
	default:
		if q.onDroppedItem != nil {
			q.onDroppedItem(item)
		}
		return false
	}
}
```
Here the collector abstracts its queue handling into a BoundedQueue, which makes the code easier to read. The BoundedQueue is implemented with select + chan and serves the same purpose as the agent's queue; on top of produce and consume it also supports stopping the queue gracefully and reporting the queue length. Once the collector's queue has 2000 items stacked up, further data is discarded without mercy. Of course, this limit can be adjusted as well.
Save data
Consuming queue data
Github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go #53
```go
func (q *BoundedQueue) StartConsumers(num int, consumer func(item interface{})) {
	var startWG sync.WaitGroup
	for i := 0; i < num; i++ {
		q.stopWG.Add(1)
		startWG.Add(1)
		go func() {
			startWG.Done()
			defer q.stopWG.Done()
			for {
				select {
				case item := <-q.items:
					atomic.AddInt32(&q.size, -1)
					consumer(item)
				case <-q.stopCh:
					return
				}
			}
		}()
	}
	startWG.Wait()
}
```
One step here was not obvious to me: why use startWG to confirm that the workers have finished starting? What could go wrong without it?
Official reply: to ensure all consumer goroutines are running by the time we return from this function
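A stripped-down sketch of why that matters (my own illustration): without the extra WaitGroup, StartConsumers could return before the scheduler has actually run a single consumer goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

func startConsumers(num int, consume func(int), items <-chan int, stop <-chan struct{}) {
	var startWG sync.WaitGroup
	for i := 0; i < num; i++ {
		startWG.Add(1)
		go func() {
			startWG.Done() // signal "this goroutine is now running"
			for {
				select {
				case item := <-items:
					consume(item)
				case <-stop:
					return
				}
			}
		}()
	}
	startWG.Wait() // only return once every consumer goroutine has started
}

func main() {
	items := make(chan int, 10)
	stop := make(chan struct{})
	startConsumers(3, func(i int) { fmt.Println("consumed", i) }, items, stop)
	// By this point all 3 consumers are guaranteed to be running.
	items <- 1
	close(stop)
}
```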
The queue is shut down gracefully by calling close(q.stopCh).
Saving data to the database -- Cassandra
Github.com/jaegertracing/jaeger/plugin/storage/cassandra/spanstore/writer.go #122
```go
func (s *SpanWriter) WriteSpan(span *model.Span) error {
	ds := dbmodel.FromDomain(span)
	mainQuery := s.session.Query(
		insertSpan,
		ds.TraceID,
		ds.SpanID,
		ds.SpanHash,
		ds.ParentID,
		ds.OperationName,
		ds.Flags,
		ds.StartTime,
		ds.Duration,
		ds.Tags,
		ds.Logs,
		ds.Refs,
		ds.Process,
	)
	if err := s.writerMetrics.traces.Exec(mainQuery, s.logger); err != nil {
		return s.logError(ds, err, "Failed to insert span", s.logger)
	}
	if err := s.saveServiceNameAndOperationName(ds.ServiceName, ds.OperationName); err != nil {
		// should this be a soft failure?
		return s.logError(ds, err, "Failed to insert service name and operation name", s.logger)
	}
	......
	return nil
}
```
The serviceName and operationName are saved to Cassandra through a special path that caches them with an LRU algorithm. This cache appears to be designed to reduce the write pressure on Cassandra.
Github.com/jaegertracing/jaeger/plugin/storage/cassandra/spanstore/service_names.go #69
```go
func (s *ServiceNamesStorage) Write(serviceName string) error {
	var err error
	query := s.session.Query(s.InsertStmt)
	if inCache := checkWriteCache(serviceName, s.serviceNames, s.writeCacheTTL); !inCache {
		q := query.Bind(serviceName)
		err2 := s.metrics.Exec(q, s.logger)
		if err2 != nil {
			err = err2
		}
	}
	return err
}
```
The collector follows the order of checking and filling the cache first, then writing to the database; the cache is looked up as key/value.
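I did not dig into checkWriteCache itself; as a rough sketch of how such a write cache could work (the names, the mutex-plus-map design, and the TTL handling below are my assumptions, and the real code additionally bounds the cache with an LRU policy, which this sketch omits):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// writeCache remembers when each service name was last written to the store.
type writeCache struct {
	mu   sync.Mutex
	seen map[string]time.Time
	ttl  time.Duration
}

// shouldWrite reports whether the name still needs to be written, i.e. it is
// missing from the cache or its cached entry is older than the TTL.
func (c *writeCache) shouldWrite(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.seen[name]; ok && time.Since(t) < c.ttl {
		return false // written recently, skip the database round trip
	}
	c.seen[name] = time.Now()
	return true
}

func main() {
	cache := &writeCache{seen: map[string]time.Time{}, ttl: 12 * time.Hour}
	for _, name := range []string{"frontend", "frontend", "billing"} {
		if cache.shouldWrite(name) {
			fmt.Println("INSERT INTO service_names:", name) // stand-in for the real CQL insert
		}
	}
}
```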
The cache has an expiration time (default 12h), while Jaeger keeps data for 2 days by default, so could a duplicate insert cause an error? After all, service_name is the primary key.
```sql
CREATE TABLE IF NOT EXISTS jaeger_v1_dc.service_names (
    service_name text,
    PRIMARY KEY (service_name)
)
```
In MySQL this would certainly raise an error, but in Cassandra no such thing happens.
```
cqlsh:jaeger_v1_dc> select * from service_names1;

 service_name
--------------
         test

(1 rows)
cqlsh:jaeger_v1_dc> INSERT INTO service_names1 (service_name) VALUES ('test');
```
```
mysql> select * from service_names1;
+--------------+
| service_name |
+--------------+
| test         |
+--------------+
1 row in set (0.00 sec)

mysql> insert into service_names1 (service_name) values ('test');
ERROR 1062 (23000): Duplicate entry 'test' for key 'PRIMARY'
```
Surprised? Although no error is raised, uniqueness is still guaranteed -- in Cassandra an INSERT is effectively an upsert. Interested readers can pick up the basic usage; the syntax is very similar to MySQL's. We are still finding our own way with Cassandra, so I won't go into more detail here.
Golang Coding Conventions
|  | NSQ | Jaeger |
| --- | --- | --- |
| Directory name | lowercase / underscore | lowercase / hyphen |
| Function name | lowerCamelCase | lowerCamelCase |
| File name | underscore | underscore |
| Variable | lowerCamelCase | lowerCamelCase |
| Constant | lowerCamelCase | lowerCamelCase |
| Package name | current directory name | current directory name |
| Request address | underscore | * lowercase |
| Request parameters | * lowercase | lowerCamelCase |
| Return parameters | underscore | lowerCamelCase |
| Command-line arguments | hyphen | prefix + dot + hyphen |
The "*" is due to not finding enough reference examples.
Conclusion
Jaeger showed me many things: UDP usage, graceful shutdown, temporary object pools, an LRU implementation, and more. Beyond the Golang side there is also program and service design: the agent, collector, and query services each have a very single responsibility, which should be the microservice way of dividing things up. There is still a lot to digest, and plenty I did not notice -- I only dug into the parts I was curious about -- but the harvest was large. In one sentence: knowledge gained!!