The previous article, "Distributed Tracing System (1): Zipkin background and design", introduced Zipkin's design and data model. This article examines Zipkin's span model in detail, and then an "alternative" span model design.
A quick aside: the more precise name is distributed tracing system. "Tracking" better fits scenarios involving people, such as someone being tracked, while "tracing" is the established term in computing. Digression aside, this article will continue to use "trace".
Zipkin's span model almost completely mimics the span model in the Dapper paper. As we know, a span describes one RPC call, so one RPC call should be associated with only one spanId (not counting the parent spanId). A span in Zipkin consists of three parts of data:

Base data: used to associate the tracking nodes in the tree and for interface display, including traceId, spanId, parentId, name, timestamp, and duration. A span whose parentId is null is the root node of the trace tree and the starting point of the call chain; to save the overhead of creating a spanId and make the top-level span easy to identify, the top-level span's spanId is made the same as the traceId. timestamp records the start time of the call and duration its total elapsed time, so timestamp + duration marks the end of the call, and duration determines the length of the span's time bar in the trace tree. Note that name is what is displayed on the tracking tree node's time bar.

Annotation data: used to record critical events, of which there are only four: cs (Client Send), sr (Server Receive), ss (Server Send), and cr (Client Receive), so in the span model the annotations are a list of at most four entries. Each critical event contains a value, a timestamp, and an endpoint: value is one of cs, sr, ss, and cr; timestamp is the time the event occurred; and endpoint records the machine (IP) and service name (serviceName) on which it occurred. Naturally, cs and cr share a machine name, as do sr and ss; for simplicity, cs and cr can also share a service name, as can sr and ss. Annotation data is mainly displayed as the span's details when the user clicks a span node.
BinaryAnnotation data: we are not satisfied with having only call-chain timing information in the tracking tree. If you need to bind business data (logs) to a trace, you can write it into the binaryAnnotations, whose structure is exactly the same as the annotation data and which is likewise a list inside the span, so it is not elaborated further here. Note, however, that it is inadvisable to put too much data into binaryAnnotations, as that degrades both performance and user experience.
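The three parts described above can be sketched as a data structure. This is an illustrative model following the article's field names, not Zipkin's actual Thrift/Java definitions:

```python
# Minimal sketch of the span model described above (illustrative only).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Endpoint:
    ipv4: str           # machine (IP) where the event occurred
    service_name: str   # logical service name

@dataclass
class Annotation:
    value: str          # one of "cs", "sr", "ss", "cr"
    timestamp: int      # event time in microseconds
    endpoint: Endpoint

@dataclass
class BinaryAnnotation:
    key: str            # business data (log) key
    value: str
    endpoint: Endpoint

@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]   # None => root node of the trace tree
    name: str                  # shown on the tracking tree's time bar
    timestamp: int             # call start time (microseconds)
    duration: int              # total elapsed time; end = timestamp + duration
    annotations: List[Annotation] = field(default_factory=list)          # at most 4
    binary_annotations: List[BinaryAnnotation] = field(default_factory=list)

# A top-level span reuses the traceId as its spanId:
root = Span(trace_id=1000, span_id=1000, parent_id=None,
            name="gateway", timestamp=0, duration=96_200)
```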
Now we have seen the internal structure of a span, but note that this is its final form, that is, the form Zipkin ultimately collects and presents to the user. When spans are first generated they are "incomplete": as mentioned above, the Zipkin server must assemble the span logs sharing the same traceId and spanId into the final span. Perhaps this is not very intuitive, so the following figure illustrates it:
Zipkin data collection (Figure 1)
The figure above already appeared in my first Zipkin post, so it is not elaborated here again; let us look directly at the span details inside it:
Span data flow (Figure 2)
Note that the illustration above does not show all of a span's details (such as name and binaryAnnotations), but this does not affect our analysis. Steps ① and ⑥ in the figure form one complete RPC call, which takes place between server 0 and server 1. Clearly, the span describing this RPC call has spanId 1000, so these are one and the same span, only its data comes from two different servers (applications): server 0 and server 1. At a lower level, this span is represented by two trace logs, one generated on server 0 and one on server 1, with identical traceId, spanId, and parentSpanId; and this span is the top node of the trace tree because its parentSpanId is null. For step ①, the sr on server 1 minus the cs on server 0 approximately equals the network time (ignoring clock differences between the servers), and likewise for the other steps: sr - cs and cr - ss yield the network overheads. Now look at request steps ② and ④: at the trace tree level they are sub-calls under ①, so their parentSpanId is ①'s spanId, 1000. Steps ② and ④ each produce their own spanId (1001 and 1002 above), so as shown, a seemingly simple RPC process actually produces six span logs, which are assembled into three spans on the Zipkin server.
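The assembly step above can be sketched as follows: logs that share the same (traceId, spanId) pair are merged into one span. This is an assumed simplification of what a Zipkin-like collector does, keeping only the annotation values:

```python
# Sketch: assemble the six span logs of Figure 2 into three spans by
# merging logs that share (trace_id, span_id).
from collections import defaultdict

def assemble(span_logs):
    merged = defaultdict(list)
    for log in span_logs:
        key = (log["trace_id"], log["span_id"])
        merged[key].extend(log["annotations"])
    return merged

# Six half-span logs: a client half and a server half per RPC call.
logs = [
    {"trace_id": 1000, "span_id": 1000, "annotations": ["cs", "cr"]},  # server 0
    {"trace_id": 1000, "span_id": 1000, "annotations": ["sr", "ss"]},  # server 1
    {"trace_id": 1000, "span_id": 1001, "annotations": ["cs", "cr"]},  # server 1
    {"trace_id": 1000, "span_id": 1001, "annotations": ["sr", "ss"]},  # server 2
    {"trace_id": 1000, "span_id": 1002, "annotations": ["cs", "cr"]},  # server 1
    {"trace_id": 1000, "span_id": 1002, "annotations": ["sr", "ss"]},  # server 3
]
spans = assemble(logs)
print(len(spans))  # 3 assembled spans from 6 logs
```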
So here is the problem: this call involves three spanIds on server 1 (1000, 1001, and 1002). If I want to record business data associated with this call on server 1 (recorded via binaryAnnotations), which span should the data be bound to? Given the choice, we would certainly pick 1000, because the downstream services of this request on server 1 are indeterminate (the figure shows only server 2 and server 3, but it might call a dozen downstream services, producing a dozen spanIds); it seems more reasonable to bind the business data to the parent span (1000) of those spans. Moreover, when the business log is generated, the downstream calls may not have started yet, so it can only be bound to 1000.
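The timing argument above can be made concrete. In this sketch (all names are assumptions for illustration), server 1 writes a business log before any downstream call exists, so the only spanId available to bind to is the one it received, 1000:

```python
# Illustration: on server 1, business data is bound to the received
# span id (1000), because the child span ids do not exist yet when
# the log is written.
received_span_id = 1000          # span id passed in by the gateway
binary_annotations = []

def record_business_log(key, value):
    # Binds to the current (parent) span; no child span id exists yet.
    binary_annotations.append(
        {"span_id": received_span_id, "key": key, "value": value})

record_business_log("order.id", "42")   # logged before any downstream call
child_span_ids = [1001, 1002]           # only created afterwards
print(binary_annotations[0]["span_id"]) # 1000
```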
Let's take a look at what the spans in Figure 2 might look like in Zipkin's tracking tree, as shown in the following figure:
Zipkin Tracking Tree (Figure 3)
Of course, some of the data will differ from Figure 2 (such as timestamp and duration), but this does not affect how we analyze the problem. As you can see, Zipkin's smallest time unit is the microsecond (one thousandth of a millisecond), so the total duration of the RPC call shown in Figure 3 is 96.2 ms. Some readers may start to wonder why an RPC call passing through four servers produces only three nodes in the trace tree. This is because, in the tracking tree, one span (more precisely, one spanId) appears as exactly one tree node: for example, tree node Service1 represents the process of Gateway (server 0) calling Service1 (server 1), and tree node Service2 represents the process of Service1 (server 1) calling Service2 (server 2). One might then ask: for tree node Service1 we recorded four timestamps, cs, sr, ss, and cr, but the time bar only uses cs and cr (duration = cr - cs), so where are sr and ss (don't forget we can compute the network time via sr - cs and cr - ss)? We can click the Service1 node and open the span's details (its annotation and binaryAnnotation data), as shown in the following figure:
Span details (Figure 4)
Relative Time is how long after the starting point each event (cs, sr, ss, cr) occurred; because Service1 is the top-level node, the first row's Relative Time is empty. The request's network cost (Gateway calling Service1) is therefore 10 ms, and the response's network cost (Service1 answering Gateway) is 96.3 - 94.3 = 2 ms. So with Zipkin's current page design, network time can only be obtained by opening a tree node's details page and doing a little arithmetic, which is not intuitive. Taobao's EagleEye system instead splits the time bar into two colors, making use of all four timestamps cs, sr, ss, and cr, which is more intuitive.
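The arithmetic above is simple but worth spelling out once, using the relative times (in milliseconds) read off Figure 4:

```python
# Network time from the four timestamps on the Service1 node
# (relative times in ms, as in Figure 4).
cs, sr, ss, cr = 0.0, 10.0, 94.3, 96.3

request_network = sr - cs    # Gateway -> Service1 network cost
response_network = cr - ss   # Service1 -> Gateway network cost
duration = cr - cs           # what the time bar actually shows

print(request_network, round(response_network, 1), duration)  # 10.0 2.0 96.3
```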
Most people may find it awkward that an RPC call spanning four systems shows only three nodes. For the call in Figure 1, we would prefer to have a Service1 node under the Gateway node, meaning that Gateway calls Service1, and Service2 and Service3 nodes under the Service1 node, meaning that Service1 calls Service2 and Service3; this is easier to understand. So we consider having each node (server application) that the RPC link passes through generate one spanId. The RPC in the figure passes through Gateway, Service1, Service2, and Service3, giving four spanIds in total (Zipkin's design in Figure 2 produces only three), so the number of spanIds equals the number of nodes (provided the RPC link passes through each node only once, i.e. there are no interdependencies between nodes). Under this idea, the flow of span data is designed as follows:
Modified span data flow (Figure 5)
As Figure 5 shows, in the six span logs each server node now produces its own spanId (1000, 1001, 1002, and 1003), four in total rather than the three of the original Figure 2. This also has the advantage that an RPC call only needs to pass along the traceId and spanId, rather than the traceId, spanId, and parentSpanId as in Zipkin's design. But we immediately spot the problem: on the server 1 node in Figure 5, spanId 1001 records two sets of cs and cr, making it impossible to tell which pair corresponds to the call to server 2 and which to the call to server 3, so this design was rejected outright.
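The ambiguity can be seen directly in the data. Under the Figure 5 scheme, server 1's calls to server 2 and server 3 both log their cs/cr events under spanId 1001 (timestamps here are made up for illustration), and nothing in the records says which cs pairs with which cr:

```python
# Why the Figure 5 scheme fails: two client calls logged under one span id.
annotations_1001 = [
    ("cs", 20),  # start of a call... to server 2? or server 3?
    ("cs", 21),
    ("cr", 60),  # end of a call... but which one?
    ("cr", 80),
]
cs_events = [t for v, t in annotations_1001 if v == "cs"]
cr_events = [t for v, t in annotations_1001 if v == "cr"]
# Two cs and two cr events, with no field linking a cs to its cr:
# the pairing (20,60)/(21,80) is as plausible as (20,80)/(21,60).
print(len(cs_events), len(cr_events))  # 2 2
```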
So we change our thinking: instead of spanId and parentSpanId, we use spanId and childSpanId, where the childSpanId is generated by the parent node and passed down to the child node, as in the following figure:
New span data flow (Figure 6)
Figure 6 shows the obvious change: there is no longer a parentSpanId; a childSpanId is used instead, so what travels between RPCs is the traceId and childSpanId, which directly solves the problem in Figure 5. Although the designs of Figures 5 and 6 violate the design principle that one RPC call is described by the data of a single spanId, they do make the tracking tree's interface (tree nodes corresponding to server nodes) easier to accept and understand, and they reduce the data transferred between RPCs.
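The childSpanId scheme can be sketched as follows (all names are assumptions for illustration): the parent generates a distinct childSpanId per downstream call and sends only (traceId, childSpanId); the child adopts that id as its own spanId, so each call is distinguishable and no parentSpanId travels over the wire:

```python
# Sketch of the Figure 6 scheme: the parent generates the child's span id.
import itertools

_next_id = itertools.count(1001)  # id generator on the parent node

def call_downstream(trace_id, my_span_id):
    child_span_id = next(_next_id)   # generated by the parent, per call
    # ...the parent records cs/cr under child_span_id, then the RPC
    # carries only these two fields to the child...
    return {"trace_id": trace_id, "span_id": child_span_id}

ctx2 = call_downstream(1000, 1000)   # server 1 -> server 2
ctx3 = call_downstream(1000, 1000)   # server 1 -> server 3
print(ctx2["span_id"], ctx3["span_id"])  # 1001 1002: the two calls are distinct
```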