A brief analysis of HTB, the Linux network flow control algorithm

Source: Internet
Author: User

While using tc for flow control in a project, I studied the implementation of the HTB (Hierarchical Token Bucket) algorithm. I think the same idea also applies to similar producer-consumer scenarios in which consumers have priorities.
The algorithm is quite complex and I am not good at explaining things, hence the "brief analysis" in the title: this article only introduces the core idea and the key code.

An example:

tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mibps
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 30mibps ceil 80mibps prio 0 burst 10kbit cburst 20kbit quantum 30000
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 20mibps ceil 50mibps prio 1 burst 10kbit cburst 20kbit quantum 20000
tc class add dev eth0 parent 1:10 classid 1:101 htb rate 10mibps ceil 80mibps prio 1 burst 10kbit cburst 20kbit quantum 10000
tc class add dev eth0 parent 1:10 classid 1:102 htb rate 5mibps ceil 40mibps prio 0 burst 10kbit cburst 20kbit quantum ...

Figure 1


First, create an HTB qdisc and five classes inside it. The relationship between them can be drawn as the tree in Figure 1. Some key parameters are marked on the tree and are explained later.


Next, iptables is used to classify traffic by destination IP into the three class queues 1:20, 1:101 and 1:102. Two points to note:
1. There are many ways to classify traffic, such as cgroups, iptables and tc filter. However, this article focuses on how traffic is sent out, so how packets get into the corresponding class queue is omitted.
2. Traffic can only be queued in the leaf nodes of the tree (leaf classes); the other nodes (inner classes) cannot queue traffic. Inner classes nevertheless play an important role in letting their child classes share bandwidth.

iptables -t mangle -A OUTPUT -d 192.168.1.2 -j CLASSIFY --set-class 1:20
iptables -t mangle -A OUTPUT -d 192.168.1.3 -j CLASSIFY --set-class 1:101
iptables -t mangle -A OUTPUT -d 192.168.1.4 -j CLASSIFY --set-class 1:102


Although HTB sets a rate for each class, this does not mean a class can only send at its configured rate. When the NIC is relatively idle, a leaf class can send faster than its rate, but never faster than its ceil. In one sentence: share when idle, and allocate bandwidth proportionally when busy (the proportion is determined jointly by rate and quantum). Packets can only enter and leave leaf classes; inner classes let their children share bandwidth.
Figure 1 also marks a priority for each leaf class. HTB supports eight priority levels, 0 through 7, with 0 the highest. Higher-priority classes get to send first.

Principle Introduction
At any moment each class is in one of three states:

  CAN_SEND (tokens are sufficient; the class is sending below its rate; shown in green in the examples)

  MAY_BORROW (the class has no tokens of its own but may borrow; it is sending above its rate but below its ceil; shown in yellow in the examples)

  CANT_SEND (the class has no tokens and may not borrow; it is sending above its ceil; shown in red in the examples)
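
The three states can be sketched as a function of two token counters, one for rate and one for ceil. This is a simplified illustration and not the kernel's code: the struct, the names and the zero thresholds are assumptions made here, and the real implementation uses additional thresholds when switching states.

```c
/* Hypothetical, simplified model of a class's two token buckets. */
enum mode { CANT_SEND, MAY_BORROW, CAN_SEND };

struct bucket {
    long tokens;   /* rate bucket: refilled at `rate`, drained by sent bytes */
    long ctokens;  /* ceil bucket: refilled at `ceil`, drained by sent bytes */
};

/* Derive the state from the two buckets (thresholds assumed to be 0). */
static enum mode classify(const struct bucket *b)
{
    if (b->ctokens < 0)
        return CANT_SEND;   /* above ceil: may neither send nor borrow */
    if (b->tokens >= 0)
        return CAN_SEND;    /* below rate: may send on its own tokens */
    return MAY_BORROW;      /* between rate and ceil: needs a lender */
}
```

A class whose rate bucket is empty but whose ceil bucket still holds tokens is classified as MAY_BORROW, which matches the yellow state in the examples.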

How does HTB decide which class gets to send?
1. HTB searches the tree from the bottom level upward for classes in the CAN_SEND state, and stops at the first level where it finds one.


2. If more than one class at that level is in the CAN_SEND state, the class with the highest priority (the smallest prio value) is chosen. If several classes share the highest priority, they are served in turn: each class sends its own quantum bytes, then the next class sends.

3. As mentioned above, only leaf classes can queue packets; an inner class holds no packets. What if steps 1 and 2 select an inner class? Since it is an inner class, it must have children. The inner class walks down the tree to find a descendant leaf class that is in the MAY_BORROW state and lends its spare tokens to that leaf class so that the leaf can send. There may be several descendant leaf classes in the MAY_BORROW state; they are handled in the same way as in step 2.
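
Steps 1 and 2 can be sketched as a search over a flat list of classes. This is only an illustration under assumed names (struct cls, pick_class, MAX_LEVEL); the kernel keeps per-level, per-priority trees instead of scanning an array, and the round-robin among equal priorities is left out here (the first match wins).

```c
#include <stddef.h>

#define MAX_LEVEL 8  /* assumed depth limit for this sketch */

/* Hypothetical flattened view of one class in the tree. */
struct cls {
    int level;     /* 0 = leaf level, larger = closer to the root */
    int prio;      /* 0 = highest priority */
    int can_send;  /* nonzero if the class is in the CAN_SEND state */
};

/* Return the index of the class to serve, or -1 if none can send. */
static int pick_class(const struct cls *c, size_t n)
{
    for (int level = 0; level < MAX_LEVEL; level++) {  /* bottom-up */
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (c[i].level != level || !c[i].can_send)
                continue;
            if (best < 0 || c[i].prio < c[best].prio)
                best = (int)i;  /* smaller prio value wins */
        }
        if (best >= 0)
            return best;  /* stop at the first level with a sender */
    }
    return -1;
}
```

If the returned class is an inner class, step 3 applies: it lends to a descendant leaf in the MAY_BORROW state.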

This is also where several child classes end up sharing the parent's bandwidth. Suppose the parent has 10MB to spare, child 1 has a quantum of 30000 and child 2 has a quantum of 20000. The parent sends 30000 bytes through child 1, then 20000 bytes through child 2, and so on. The net effect is that child 1 borrows 6MB and child 2 borrows 4MB. So rate and quantum together determine how bandwidth is divided between child classes: rate determines how many bytes a class can send while in the CAN_SEND state, and quantum determines how many bytes it can borrow per turn while in the MAY_BORROW state.
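
The 6MB / 4MB split can be checked with a small simulation. This is a sketch under stated assumptions: the function name share_spare is made up here, 10MB is taken as 10,000,000 bytes, every child always has data queued, and whole quanta are handed out rather than individual packets.

```c
#include <stddef.h>

/*
 * Hand out `spare` bytes to n always-backlogged children in turn,
 * one quantum per turn. Requires n > 0 and every quantum > 0.
 */
static void share_spare(long spare, const long *quantum,
                        long *borrowed, size_t n)
{
    size_t turn = 0;

    for (size_t i = 0; i < n; i++)
        borrowed[i] = 0;

    while (spare > 0) {
        /* The last turn may get less than a full quantum. */
        long chunk = quantum[turn] < spare ? quantum[turn] : spare;

        borrowed[turn] += chunk;
        spare -= chunk;
        turn = (turn + 1) % n;  /* next child's turn */
    }
}
```

With quanta of 30000 and 20000, each full round hands out 50000 bytes in a 3:2 ratio, so 10,000,000 spare bytes end up as 6,000,000 and 4,000,000.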

Examples of Scenarios
1. Suppose that at some moment 1:101 and 1:102 both have packets queued and both are in the CAN_SEND state. 1:20 has no traffic, so it can be treated as if it did not exist. See Figure 2:
Figure 2


As described above, 1:101 and 1:102 are at the same level and have the same priority, so HTB serves them in turn. 1:101 sends 20000 bytes per round and 1:102 sends 5000 bytes per round.
At some point the traffic sent by 1:101 exceeds its rate (10MB) but not its ceil (20MB), so 1:101 switches to the MAY_BORROW state. See Figure 3:

Figure 3


Now 1:102 is the only class in the CAN_SEND state, so HTB sends only its packets. When 1:102 has sent more than its rate (5MB), it also becomes MAY_BORROW. See Figure 4:
Figure 4



Now no class at the bottom level is in the CAN_SEND state. Searching upward, HTB finds 1:10. Both children of 1:10 are in the MAY_BORROW state, so 1:10 sends the packets of 1:101 and 1:102 in turn, each turn sending the corresponding quantum. Soon the traffic sent by 1:101 reaches its ceil (20MB), and 1:101 becomes CANT_SEND. See Figure 5:

Figure 5


Now only 1:102 can send, and it sends at full speed until it reaches its ceil (30MB), at which point 1:102 becomes CANT_SEND. Together 1:101 and 1:102 have now sent 50MB, which reaches the rate of their parent 1:10, so 1:10 becomes MAY_BORROW. See Figure 6:
Figure 6



Core code
The core data structures are not reproduced here:
struct Qdisc: a tc queue
struct htb_class: an HTB class
Before a packet is handed from the protocol stack to the NIC, it goes through an enqueue and a dequeue operation. This is where tc hooks in.
When HTB is in use, the dequeue callback is htb_dequeue, which returns an skb to be sent to the NIC.

static struct sk_buff *htb_dequeue(struct Qdisc *sch)
{
    ...
    /* Search level by level. Levels are numbered from the bottom: level 0 is the leaf level. */
    for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
        /* common case optimization - skip event handler quickly */
        int m;
        ...
        m = ~q->row_mask[level];
        while (m != (int)(-1)) {    /* within one level, highest priority first */
            int prio = ffz(m);

            m |= 1 << prio;
            skb = htb_dequeue_tree(q, prio, level);    /* dequeue a packet */
            if (likely(skb != NULL)) {
                sch->q.qlen--;
                sch->flags &= ~TCQ_F_THROTTLED;
                goto fin;
            }
        }
    }
    sch->qstats.overlimits++;
    ...
fin:
    return skb;
}
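
The inner loop above walks the priorities that have active classes by inverting a bitmask and repeatedly taking the first zero bit. The stand-alone sketch below reproduces that iteration pattern outside the kernel. The helper names ffz_u and prios_in_order are made up for this illustration, and unsigned arithmetic is used here instead of the kernel's int.

```c
/* Index of the lowest zero bit in m. The caller must ensure m != ~0u. */
static int ffz_u(unsigned int m)
{
    int i = 0;

    while (m & 1u) {
        m >>= 1;
        i++;
    }
    return i;
}

/*
 * Visit the set bits of row_mask in ascending order (lowest prio value,
 * i.e. highest priority, first). Writes the visited priorities to `out`
 * (room for 32 entries) and returns how many were written.
 */
static int prios_in_order(unsigned int row_mask, int *out)
{
    unsigned int m = ~row_mask;  /* active priorities become zero bits */
    int n = 0;

    while (m != ~0u) {           /* some priority is still unvisited */
        int prio = ffz_u(m);

        m |= 1u << prio;         /* mark this priority as visited */
        out[n++] = prio;
    }
    return n;
}
```

For a mask with bits 0, 2 and 5 set, the loop visits priorities 0, 2 and 5 in that order and then stops.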

Once htb_dequeue has found the level and priority, it calls htb_dequeue_tree. Only the core code of htb_dequeue_tree is listed:

static struct sk_buff *htb_dequeue_tree(struct htb_sched *q, int prio, int level)
{
    struct sk_buff *skb = NULL;
    struct htb_class *cl, *start;
    ...
    /* find a leaf class under this level and priority */
    cl = htb_lookup_leaf(q->row[level] + prio, prio,
                         q->ptr[level] + prio,
                         q->last_ptr_id[level] + prio);
    skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);    /* dequeue a packet */
    if (likely(skb != NULL)) {
        /* deduct the packet's byte count from deficit[level] */
        cl->un.leaf.deficit[level] -= qdisc_pkt_len(skb);
        if (cl->un.leaf.deficit[level] < 0) {
            /* deficit[level] < 0: the class has sent its quantum; the next class gets its turn */
            cl->un.leaf.deficit[level] += cl->quantum;
            htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->ptr[0]) + prio);
        }
        htb_charge_class(q, cl, level, skb);    /* update the tokens */
    }
    return skb;
}

Because the level passed in is not necessarily the lowest one, htb_lookup_leaf is called to guarantee that the class obtained is a leaf class. The parameter (q->ptr[level] + prio) records which leaf class is currently due to send at this level and priority.
skb = cl->un.leaf.q->dequeue(cl->un.leaf.q) dequeues the packet.

Each time a packet is dequeued, the class's deficit[level] is reduced by the packet's length in bytes. When deficit[level] < 0, the class has used up its quantum. The quantum is then added back to deficit[level],
but htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->ptr[0]) + prio) has already moved the pointer for this level and priority to the next class, so the next packet is dequeued from another class.
Compare this with the way 1:101 and 1:102 take turns sending in the earlier example.
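
The deficit bookkeeping can be isolated into a few lines. The following is a minimal sketch, not kernel code: struct leaf and account_packet are names invented here, and the return value stands in for the call to htb_next_rb_node.

```c
/* Hypothetical per-leaf round-robin state for one level. */
struct leaf {
    long deficit;  /* bytes this class may still send in its turn */
    long quantum;  /* bytes added back when the turn ends */
};

/*
 * Account for one dequeued packet of pkt_len bytes.
 * Returns 1 if the turn passes to the next class, 0 otherwise.
 */
static int account_packet(struct leaf *l, long pkt_len)
{
    l->deficit -= pkt_len;
    if (l->deficit < 0) {
        l->deficit += l->quantum;  /* refill for this class's next turn */
        return 1;                  /* quantum used up: advance the pointer */
    }
    return 0;
}
```

With a deficit and quantum of 3000 and 1500-byte packets, the first two packets keep the turn and the third one ends it, leaving a deficit of 1500 for the next turn.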

Finally, htb_charge_class updates the tokens. Here is htb_charge_class:

static void htb_charge_class(struct htb_sched *q, struct htb_class *cl,
                             int level, struct sk_buff *skb)
{
    int bytes = qdisc_pkt_len(skb);
    enum htb_cmode old_mode;
    long diff;

    /* Loop: when a child class sends a packet, tokens are charged to the child and to its ancestors. */
    while (cl) {
        diff = psched_tdiff_bounded(q->now, cl->t_c, cl->mbuffer);
        if (cl->level >= level) {
            if (cl->level == level)
                cl->xstats.lends++;
            /* newly generated tokens are added and the sent bytes are deducted here */
            htb_accnt_tokens(cl, bytes, diff);
        } else {
            cl->xstats.borrows++;
            cl->tokens += diff;    /* we moved t_c; update tokens */
        }
        htb_accnt_ctokens(cl, bytes, diff);
        cl->t_c = q->now;

        old_mode = cl->cmode;
        diff = 0;
        htb_change_class_mode(q, cl, &diff);    /* decide whether to switch state */
        if (old_mode != cl->cmode) {
            if (old_mode != HTB_CAN_SEND)
                htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level);
            if (cl->cmode != HTB_CAN_SEND)
                htb_add_to_wait_tree(q, cl, diff);
        }
        cl = cl->parent;
    }
}

That a child sending a packet also costs its parent tokens is easy to understand: in essence the child is using the parent's bandwidth. However, when a class lends tokens to a descendant, tokens are deducted only from the lending class upward through its ancestors. The borrowing child adds the tokens generated during this period but does not deduct any, because it sent on borrowed tokens.
After the tokens have been computed, the class's state is updated if necessary. That is the general idea; this article does not go deeper.
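
The charging rule can be condensed into a walk from the sending leaf to the root. The sketch below is a simplification under stated assumptions: struct node and charge are illustrative names, token refill over time (the diff term in the kernel code) is left out, and tokens are counted directly in bytes.

```c
#include <stddef.h>

/* Hypothetical tree node with the two token counters. */
struct node {
    struct node *parent;
    int level;     /* 0 = leaf level */
    long tokens;   /* rate bucket */
    long ctokens;  /* ceil bucket */
};

/*
 * Charge `bytes` sent by leaf `cl`, where `level` is the level of the
 * class that supplied the tokens (0 if the leaf sent on its own tokens).
 */
static void charge(struct node *cl, int level, long bytes)
{
    for (; cl != NULL; cl = cl->parent) {
        if (cl->level >= level)
            cl->tokens -= bytes;  /* lender and its ancestors pay */
        /* classes below the lending level borrowed: rate bucket untouched */
        cl->ctokens -= bytes;     /* every class on the path counts toward ceil */
    }
}
```

When the leaf sends on tokens lent by its parent (level 1), the leaf's rate bucket is left alone while the parent and root are charged; the ceil buckets of all three are charged in either case.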

For ease of description, some details in this article differ from the actual code.
