Thrift TCompactProtocol Compact Binary Protocol Analysis


Analysis of Thrift's compact protocol:

The sections below describe how each data type is represented in Thrift's TCompactProtocol.

Encoding format by type:

bool type:

A single byte.

If the bool is a field of a struct or message and therefore has a field ID, the value is folded into the field header. The high 4 bits hold the field-ID delta, which is the difference between this field's ID and the previous field's ID (1 for consecutive fields). The low 4 bits hold the value expressed as a type ID (0001: true, 0010: false). No separate value byte follows.

If the bool stands alone, for example as a list or set element, one byte carries the value: 1 for true, 2 for false.
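Here is a minimal Go sketch of the two bool encodings described above. The helper names boolFieldHeader and boolValue are hypothetical and are not functions of the Thrift Go library. Only the type IDs 1 (true) and 2 (false) come from the protocol.

```go
package main

import "fmt"

// Compact-protocol type IDs that double as bool values.
const (
	ctBooleanTrue  byte = 0x01
	ctBooleanFalse byte = 0x02
)

// boolFieldHeader folds a bool field into a single byte: the field-ID delta
// (1..15) in the high nibble and the value, as a type ID, in the low nibble.
// No value byte follows.
func boolFieldHeader(delta uint8, v bool) byte {
	t := ctBooleanFalse
	if v {
		t = ctBooleanTrue
	}
	return delta<<4 | t
}

// boolValue encodes a standalone bool, e.g. a list element, as one byte.
func boolValue(v bool) byte {
	if v {
		return ctBooleanTrue
	}
	return ctBooleanFalse
}

func main() {
	fmt.Printf("%02x %02x\n", boolFieldHeader(1, true), boolFieldHeader(1, false)) // 11 12
	fmt.Printf("%02x %02x\n", boolValue(true), boolValue(false))                   // 01 02
}
```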

byte type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), followed by one value byte.
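Most field types share this header byte. Below, in Go, it is followed by a byte value. The sketch assumes the delta is between 1 and 15 (the short form), and fieldHeader and appendByteField are hypothetical names. With delta 1 and value 53 it produces 13 35, which are the bytes the capture later shows for argByte.

```go
package main

import "fmt"

// Compact type ID of an 8-bit byte field.
const ctByte byte = 0x03

// fieldHeader packs a field-ID delta (1..15) into the high nibble and a
// compact type ID into the low nibble.
func fieldHeader(delta, compactType uint8) byte {
	return delta<<4 | compactType&0x0f
}

// appendByteField appends a byte field: header byte, then the raw value byte.
func appendByteField(buf []byte, delta uint8, v int8) []byte {
	return append(buf, fieldHeader(delta, ctByte), byte(v))
}

func main() {
	fmt.Printf("% x\n", appendByteField(nil, 1, 53)) // 13 35
}
```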

i16 type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), followed by the value in one to three bytes.

i32 type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), followed by the value in one to five bytes.

i64 type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), followed by the value in one to ten bytes.
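Where do the limits of three, five and ten bytes come from? The values are varint-encoded, as the varint section below explains, and each varint byte carries 7 payload bits. A b-bit value therefore needs at most ceil(b / 7) bytes. Here is a quick Go check of that arithmetic, with maxVarintBytes as a hypothetical helper:

```go
package main

import "fmt"

// maxVarintBytes returns ceil(bits/7): every varint byte holds 7 payload bits.
func maxVarintBytes(bits int) int {
	return (bits + 6) / 7
}

func main() {
	for _, bits := range []int{16, 32, 64} {
		fmt.Printf("i%d: up to %d bytes\n", bits, maxVarintBytes(bits))
	}
	// i16: up to 3 bytes
	// i32: up to 5 bytes
	// i64: up to 10 bytes
}
```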

double type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), followed by an eight-byte value.

Note: the double is converted to its eight-byte IEEE 754 representation and sent in little-endian byte order.
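In Go the conversion is a bit cast followed by a little-endian write. The sketch below uses only the standard library, and appendDouble and readDouble are hypothetical helper names. For 11.22, the double used in the example later in this article, it produces 71 3d 0a d7 a3 70 26 40.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// appendDouble appends the 8-byte IEEE 754 bit pattern of v in
// little-endian byte order.
func appendDouble(buf []byte, v float64) []byte {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], math.Float64bits(v))
	return append(buf, b[:]...)
}

// readDouble reverses appendDouble; b must hold at least 8 bytes.
func readDouble(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b[:8]))
}

func main() {
	enc := appendDouble(nil, 11.22)
	fmt.Printf("% x\n", enc)     // 71 3d 0a d7 a3 70 26 40
	fmt.Println(readDouble(enc)) // 11.22
}
```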

string type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), the payload length in one to five bytes, and the payload bytes.
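Here is a Go sketch of a complete string field, assuming a short-form header. appendVarint32 and appendStringField are hypothetical helpers. The length is written as a plain varint (covered in the varint section below). String lengths are never negative, so no zigzag step is applied.

```go
package main

import "fmt"

// Compact type ID of string/binary.
const ctBinary byte = 0x08

// appendVarint32 writes n in 7-bit groups, low group first, with the
// continuation bit (0x80) set on every byte except the last.
func appendVarint32(buf []byte, n uint32) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

// appendStringField appends a string field with a field-ID delta of 1..15:
// header byte, varint length, then the raw bytes.
func appendStringField(buf []byte, delta uint8, s string) []byte {
	buf = append(buf, delta<<4|ctBinary)
	buf = appendVarint32(buf, uint32(len(s)))
	return append(buf, s...)
}

func main() {
	fmt.Printf("% x\n", appendStringField(nil, 1, "login")) // 18 05 6c 6f 67 69 6e
}
```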

struct type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), the struct's fields, and a one-byte end tag (0x00).
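Field headers carry deltas, so an encoder has to remember the last field ID of the enclosing struct while it writes a nested one. Below is a simplified Go sketch of that bookkeeping. It assumes only short-form headers, and the writer type and its methods are hypothetical. They loosely mirror how Thrift's compact protocol saves the last field ID when a struct begins and restores it when the struct ends.

```go
package main

import "fmt"

// Compact type IDs used below.
const (
	ctStop   byte = 0x00
	ctByte   byte = 0x03
	ctStruct byte = 0x0c
)

// writer tracks the last field ID per nesting level so that field headers
// can carry a delta instead of the absolute ID.
type writer struct {
	buf     []byte
	lastID  int16
	idStack []int16
}

// fieldBegin writes a short-form header; it assumes 0 < id-lastID <= 15.
func (w *writer) fieldBegin(id int16, compactType byte) {
	w.buf = append(w.buf, byte(id-w.lastID)<<4|compactType)
	w.lastID = id
}

// structBegin saves the outer field ID; deltas restart from 0 inside.
func (w *writer) structBegin() {
	w.idStack = append(w.idStack, w.lastID)
	w.lastID = 0
}

// structEnd writes the 0x00 end tag and restores the outer field ID.
func (w *writer) structEnd() {
	w.buf = append(w.buf, ctStop)
	w.lastID = w.idStack[len(w.idStack)-1]
	w.idStack = w.idStack[:len(w.idStack)-1]
}

func main() {
	var w writer
	w.fieldBegin(1, ctStruct) // outer field 1: a struct
	w.structBegin()
	w.fieldBegin(1, ctByte) // inner field 1: byte 53
	w.buf = append(w.buf, 53)
	w.structEnd()
	w.fieldBegin(2, ctByte) // outer field 2: delta 1 from field 1
	w.buf = append(w.buf, 53)
	fmt.Printf("% x\n", w.buf) // 1c 13 35 00 13 35
}
```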

map type:

One byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), the number of map entries in one to five bytes, one byte combining the key and value types (high 4 bits: key type, low 4 bits: value type), and the map payload.
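Here is a Go sketch of the map header (appendMapHeader is a hypothetical name). One detail is not covered above: an empty map is written as the single byte 0x00, without the key/value type byte. The sketch follows that rule, as Thrift's implementations do.

```go
package main

import "fmt"

// Compact type IDs used below.
const (
	ctI32    byte = 0x05
	ctBinary byte = 0x08
)

// appendVarint32 writes n in 7-bit groups, low group first.
func appendVarint32(buf []byte, n uint32) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

// appendMapHeader writes the entry count as a varint, then one byte with the
// key type in the high nibble and the value type in the low nibble.
func appendMapHeader(buf []byte, size uint32, keyType, valType byte) []byte {
	if size == 0 {
		return append(buf, 0x00) // empty map: no type byte
	}
	buf = appendVarint32(buf, size)
	return append(buf, keyType<<4|valType)
}

func main() {
	fmt.Printf("% x\n", appendMapHeader(nil, 2, ctBinary, ctBinary)) // 02 88
	fmt.Printf("% x\n", appendMapHeader(nil, 2, ctI32, ctBinary))    // 02 58
	fmt.Printf("% x\n", appendMapHeader(nil, 0, ctI32, ctBinary))    // 00
}
```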

set type:

Form one: one byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), one byte combining element count and element type (high 4 bits: element count, low 4 bits: element type), and the set payload.

Used when the set has 14 or fewer elements.

Form two: one byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), one byte whose high 4 bits are all 1 and whose low 4 bits are the element type, the element count in one to five bytes, and the set payload.

Used when the set has more than 14 elements.

list type:

Form one: one byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), one byte combining element count and element type (high 4 bits: element count, low 4 bits: element type), and the list payload.

Used when the list has 14 or fewer elements.

Form two: one byte combining field ID and type (high 4 bits: field-ID delta, low 4 bits: type), one byte whose high 4 bits are all 1 and whose low 4 bits are the element type, the element count in one to five bytes, and the list payload.

Used when the list has more than 14 elements.
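Sets and lists share this header, so one Go sketch covers both (appendCollectionHeader is a hypothetical name). Three strings give the single byte 38. Twenty i64 values need the long form: f6 followed by the varint 14 (decimal 20).

```go
package main

import "fmt"

// Compact type IDs used below.
const (
	ctI64    byte = 0x06
	ctBinary byte = 0x08
)

// appendVarint32 writes n in 7-bit groups, low group first.
func appendVarint32(buf []byte, n uint32) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

// appendCollectionHeader writes a list or set header. Up to 14 elements: one
// byte, count in the high nibble and element type in the low nibble. More
// elements: 0xF in the high nibble, then the count as a varint.
func appendCollectionHeader(buf []byte, size uint32, elemType byte) []byte {
	if size <= 14 {
		return append(buf, byte(size)<<4|elemType)
	}
	buf = append(buf, 0xf0|elemType)
	return appendVarint32(buf, size)
}

func main() {
	fmt.Printf("% x\n", appendCollectionHeader(nil, 3, ctBinary)) // 38
	fmt.Printf("% x\n", appendCollectionHeader(nil, 20, ctI64))   // f6 14
}
```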

Message (function call) type:

One byte protocol ID (0x82), one byte combining version and message type (call: 0x21, reply: 0x41, exception: 0x61, oneway: 0x81), the sequence ID in one to five bytes, the message name length in one to five bytes, the message name, the parameter payload, and a one-byte end tag.
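Here is a Go sketch of the message header. The lowercase constants are stand-ins whose values match the COMPACT_* constants of the Thrift Go library, and appendMessageBegin is a hypothetical name. For a call with sequence ID 1 and a seven-character method name, it yields 82 21 01 07 followed by the name bytes. This matches the start of the request analyzed below.

```go
package main

import "fmt"

// Stand-ins for COMPACT_PROTOCOL_ID, COMPACT_VERSION, COMPACT_VERSION_MASK,
// COMPACT_TYPE_MASK and COMPACT_TYPE_SHIFT_AMOUNT.
const (
	protocolID      byte = 0x82
	version         byte = 1
	versionMask     byte = 0x1f
	typeMask        byte = 0xe0
	typeShiftAmount      = 5
)

// Message type IDs.
const (
	msgCall   byte = 1
	msgReply  byte = 2
	msgOneway byte = 4
)

// appendVarint32 writes n in 7-bit groups, low group first.
func appendVarint32(buf []byte, n uint32) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

// appendMessageBegin writes protocol ID, version+type byte, sequence ID
// (plain varint, no zigzag), name length and name bytes.
func appendMessageBegin(buf []byte, name string, msgType byte, seqID int32) []byte {
	buf = append(buf, protocolID, (version&versionMask)|((msgType<<typeShiftAmount)&typeMask))
	buf = appendVarint32(buf, uint32(seqID))
	buf = appendVarint32(buf, uint32(len(name)))
	return append(buf, name...)
}

func main() {
	fmt.Printf("% x\n", appendMessageBegin(nil, "funCall", msgCall, 1))
	// 82 21 01 07 66 75 6e 43 61 6c 6c
}
```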

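The descriptions above assume that each field ID is 1 to 15 larger than the previous field ID, so that the delta fits into the high 4 bits.

Otherwise, the type and field ID are written separately. This happens when the jump is larger than 15, or when the field ID is not larger than the previous one. In that case one byte holds the type (high 4 bits zero), followed by the full field ID (not a delta), encoded like an i16 in one to three bytes.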
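The choice between the two header forms can be sketched in Go as follows (appendFieldHeader is a hypothetical name). In the long form, the field ID goes through the zigzag and varint steps explained below. Field 0 after a previous ID of 0 gives 09 00 for a list, which is the pattern the response analysis shows for the return value.

```go
package main

import "fmt"

// Compact type IDs used below.
const (
	ctI32  byte = 0x05
	ctList byte = 0x09
)

// appendVarint32 writes n in 7-bit groups, low group first.
func appendVarint32(buf []byte, n uint32) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

// appendFieldHeader uses the one-byte short form when the field ID grows by
// 1..15; otherwise it writes the type byte (high nibble 0) followed by the
// absolute field ID encoded like an i16: zigzag, then varint.
func appendFieldHeader(buf []byte, lastID, id int16, compactType byte) []byte {
	if id > lastID && id-lastID <= 15 {
		return append(buf, byte(id-lastID)<<4|compactType)
	}
	buf = append(buf, compactType)
	zz := uint32(int32(id)<<1) ^ uint32(int32(id)>>31)
	return appendVarint32(buf, zz)
}

func main() {
	fmt.Printf("% x\n", appendFieldHeader(nil, 0, 1, ctI32))  // 15
	fmt.Printf("% x\n", appendFieldHeader(nil, 1, 20, ctI32)) // 05 28
	fmt.Printf("% x\n", appendFieldHeader(nil, 0, 0, ctList)) // 09 00
}
```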

At this point you may wonder why numeric values take "one to five bytes".

The reason is that numeric values are compressed with the varint algorithm, briefly explained below.

Varint numeric compression

An integer is typically stored as a 32-bit value, which takes 4 bytes.

If the value is less than 256, one byte would suffice, leaving the other 3 bytes idle.

If the value is between 256 and 65535, two bytes would suffice, leaving 2 bytes idle.

If the value is between 65536 and 16777215, three bytes would suffice, leaving 1 byte idle.

Only values between 16777216 and 4294967295 need all four bytes.

Varint, the integer encoding used by Google's Protocol Buffers, exploits this idle space when serializing integers.

It is a compact way of representing numbers: each number takes one or more bytes, and the smaller the number, the fewer bytes it needs.

Varint splits the number into 7-bit groups and stores one group per byte, least significant group first.

The highest bit of each byte has a special meaning: if it is 1, the next byte is also part of the number; if it is 0, the number ends here.

The other 7 bits carry the number itself. Therefore a number less than 128 fits in one byte, and numbers from 128 to 16383 take two bytes.

This is how the numeric compression is achieved.

With varint, a small int32 value can be represented in 1 byte. The trade-off is that large numbers need 5 bytes, one more than a plain 32-bit integer.

Statistically, most numbers in a message are not large, so in most cases varint represents numeric data in fewer bytes.
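The procedure can be traced on the value 300 with a small Go sketch. appendUvarint is a hypothetical helper that mirrors the C++ writer shown next. Go's standard encoding/binary.PutUvarint produces the same byte layout.

```go
package main

import "fmt"

// appendUvarint encodes n in 7-bit groups, least significant group first;
// every byte except the last has its high (continuation) bit set.
func appendUvarint(buf []byte, n uint64) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n&0x7f)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

func main() {
	// 300 = binary 10 0101100
	//   low 7 bits 0101100 = 0x2c, more bits follow -> 0x2c | 0x80 = 0xac
	//   remaining bits 10 = 0x02, last byte          -> 0x02
	fmt.Printf("% x\n", appendUvarint(nil, 300)) // ac 02
	fmt.Printf("% x\n", appendUvarint(nil, 127)) // 7f
	fmt.Printf("% x\n", appendUvarint(nil, 128)) // 80 01
}
```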

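The Varint32 writer from Thrift's C++ TCompactProtocolT:

template <class Transport_>
uint32_t TCompactProtocolT<Transport_>::writeVarint32(uint32_t n) {
  uint8_t buf[5];  // a 32-bit value needs at most 5 varint bytes
  uint32_t wsize = 0;

  while (true) {
    if ((n & ~0x7F) == 0) {
      // The remaining value fits in 7 bits: last byte, high bit clear.
      buf[wsize++] = (int8_t)n;
      break;
    } else {
      // Emit the low 7 bits with the continuation bit set.
      buf[wsize++] = (int8_t)((n & 0x7F) | 0x80);
      n >>= 7;
    }
  }
  trans_->write(buf, wsize);
  return wsize;
}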

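The Varint64 writer works the same way:

template <class Transport_>
uint32_t TCompactProtocolT<Transport_>::writeVarint64(uint64_t n) {
  uint8_t buf[10];  // a 64-bit value needs at most 10 varint bytes
  uint32_t wsize = 0;

  while (true) {
    if ((n & ~0x7FULL) == 0) {
      // The remaining value fits in 7 bits: last byte, high bit clear.
      buf[wsize++] = (int8_t)n;
      break;
    } else {
      // Emit the low 7 bits with the continuation bit set.
      buf[wsize++] = (int8_t)((n & 0x7F) | 0x80);
      n >>= 7;
    }
  }
  trans_->write(buf, wsize);
  return wsize;
}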
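Reading a varint is the reverse: collect 7 bits per byte until a byte with the high bit clear appears. Here is a Go sketch of a reader. readVarint64 is a hypothetical name, and Go's standard encoding/binary.Uvarint decodes the same layout.

```go
package main

import (
	"errors"
	"fmt"
)

// readVarint64 decodes one varint from the start of b and returns the value
// and the number of bytes consumed. At most 10 bytes are read (64 bits).
func readVarint64(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		v |= uint64(b[i]&0x7f) << (7 * uint(i)) // low groups come first
		if b[i]&0x80 == 0 {                     // high bit clear: last byte
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("truncated or overlong varint")
}

func main() {
	v, n, err := readVarint64([]byte{0xac, 0x02})
	fmt.Println(v, n, err) // 300 2 <nil>
}
```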

You may now ask how negative numbers are compressed. Their highest (sign) bit is 1, so once cast to unsigned, a negative int32 always needs the full 5 varint bytes (and a negative int64 needs 10).

Varint compresses small positive numbers well. Can negative numbers be mapped to positive numbers first and then compressed with varint?

The answer is yes.

How negative numbers are turned into positive numbers:

With an algorithm called zigzag.

Zigzag algorithm

Non-negative numbers: multiply by 2, zigzag(x) = x * 2

Negative numbers: multiply by -2 and subtract 1, zigzag(x) = x * (-2) - 1

As a result, 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ..., so numbers of small magnitude stay small regardless of sign.

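In code, the same mapping is computed with shifts: (n << 1) ^ (n >> 31) for 32-bit values. The arithmetic right shift n >> 31 is all zeros for non-negative n, which leaves n * 2. For negative n it is all ones, which flips every bit of n * 2 and yields -2n - 1.

/**
 * Convert l into a zigzag long. This allows negative numbers to be
 * represented compactly as a varint.
 */
template <class Transport_>
uint64_t TCompactProtocolT<Transport_>::i64ToZigzag(const int64_t l) {
  // Shift as unsigned to avoid undefined behavior on negative values.
  return (static_cast<uint64_t>(l) << 1) ^ (l >> 63);
}

/**
 * Convert n into a zigzag int. This allows negative numbers to be
 * represented compactly as a varint.
 */
template <class Transport_>
uint32_t TCompactProtocolT<Transport_>::i32ToZigzag(const int32_t n) {
  return (static_cast<uint32_t>(n) << 1) ^ (n >> 31);
}

To send an i16, i32 or i64 value, Thrift first applies zigzag and then varint-compresses the result. Lengths, collection sizes and the message sequence ID are never negative, so they are varint-encoded without zigzag.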
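Here is the whole path for an i16 or i32 value (zigzag, then varint) plus the reverse mapping, in a short Go sketch. zigzag32, unzigzag32, appendVarint32 and appendI32 are hypothetical names. Encoding 54 gives the single byte 6c, which appears for argI16 in the capture analyzed below.

```go
package main

import "fmt"

// zigzag32 maps signed to unsigned: 0->0, -1->1, 1->2, -2->3, ...
func zigzag32(n int32) uint32 {
	return uint32(n<<1) ^ uint32(n>>31) // n>>31 is all ones for negative n
}

// unzigzag32 reverses zigzag32.
func unzigzag32(u uint32) int32 {
	return int32(u>>1) ^ -int32(u&1)
}

// appendVarint32 writes n in 7-bit groups, low group first.
func appendVarint32(buf []byte, n uint32) []byte {
	for n >= 0x80 {
		buf = append(buf, byte(n)|0x80)
		n >>= 7
	}
	return append(buf, byte(n))
}

// appendI32 encodes an i16 or i32 value: zigzag first, then varint.
func appendI32(buf []byte, n int32) []byte {
	return appendVarint32(buf, zigzag32(n))
}

func main() {
	fmt.Printf("% x\n", appendI32(nil, 54))   // 6c
	fmt.Printf("% x\n", appendI32(nil, -1))   // 01
	fmt.Printf("% x\n", appendI32(nil, -300)) // d7 04
	fmt.Println(unzigzag32(0x6c))             // 54
}
```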

Here is an example that illustrates the TCompactProtocol encoding.

Create an IDL file, rpc.thrift:

namespace go demo.rpc
namespace cpp demo.rpc

struct ArgStruct {
    1: byte argByte,
    2: string argString,
    3: i16 argI16,
    4: i32 argI32,
    5: i64 argI64,
    6: double argDouble,
}

service RpcService {
    list<string> funCall(
        1: ArgStruct argStruct,
        2: byte argByte,
        3: i16 argI16,
        4: i32 argI32,
        5: i64 argI64,
        6: double argDouble,
        7: string argString,
        8: map<string, string> paramMapStrStr,
        9: map<i32, string> paramMapI32Str,
        10: set<string> paramSetStr,
        11: set<i64> paramSetI64,
        12: list<string> paramListStr,
    ),
}

Generate the Go code with this command:

thrift --gen go -o src rpc.thrift

Write a Go Thrift client:

package main

import (
	"demo/rpc"
	"fmt"
	"git.apache.org/thrift.git/lib/go/thrift"
	"net"
	"os"
	"time"
)

func main() {
	startTime := currentTimeMillis()

	// Plain (unframed) transport with the compact protocol; alternatives are commented out.
	//transportFactory := thrift.NewTFramedTransportFactory(thrift.NewTTransportFactory())
	transportFactory := thrift.NewTTransportFactory()
	//protocolFactory := thrift.NewTBinaryProtocolFactoryDefault()
	//protocolFactory := thrift.NewTJSONProtocolFactory()
	//protocolFactory := thrift.NewTSimpleJSONProtocolFactory()
	protocolFactory := thrift.NewTCompactProtocolFactory()

	transport, err := thrift.NewTSocket(net.JoinHostPort("127.0.0.1", "8090"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error resolving address:", err)
		os.Exit(1)
	}

	useTransport := transportFactory.GetTransport(transport)
	client := rpc.NewRpcServiceClientFactory(useTransport, protocolFactory)
	if err := transport.Open(); err != nil {
		fmt.Fprintln(os.Stderr, "Error opening socket to 127.0.0.1:8090", " ", err)
		os.Exit(1)
	}
	defer transport.Close()

	argStruct := &rpc.ArgStruct{}
	argStruct.ArgByte = 53
	argStruct.ArgString = "str value"
	argStruct.ArgI16 = 54
	argStruct.ArgI32 = 12
	argStruct.ArgI64 = 43
	argStruct.ArgDouble = 11.22

	paramMap := make(map[string]string)
	paramMap["name"] = "namess"
	paramMap["pass"] = "vpass"

	paramMapI32Str := make(map[int32]string)
	paramMapI32Str[10] = "val10"
	paramMapI32Str[20] = "val20"

	// In this version of the generated Go code, Thrift sets are map[T]bool.
	paramSetStr := make(map[string]bool)
	paramSetStr["ele1"] = true
	paramSetStr["ele2"] = true
	paramSetStr["ele3"] = true

	paramSetI64 := make(map[int64]bool)
	paramSetI64[11] = true
	paramSetI64[22] = true
	paramSetI64[33] = true

	paramListStr := []string{"l1.", "l2."}

	r1, e1 := client.FunCall(argStruct, 53, 54, 12, 34, 11.22, "login",
		paramMap, paramMapI32Str, paramSetStr, paramSetI64, paramListStr)
	fmt.Println("call->", r1, e1)

	endTime := currentTimeMillis()
	fmt.Println("Program exit. time->", endTime, startTime, endTime-startTime)
}

// currentTimeMillis returns the current Unix time in milliseconds.
func currentTimeMillis() int64 {
	return time.Now().UnixNano() / 1000000
}

Write a simple Go test server:

package main

import (
	"demo/rpc"
	"fmt"
	"git.apache.org/thrift.git/lib/go/thrift"
	"os"
)

const (
	NetworkAddr = ":8090"
)

// RpcServiceImpl implements the generated RpcService interface.
type RpcServiceImpl struct {
}

// FunCall prints the received struct and returns two strings.
func (h *RpcServiceImpl) FunCall(argStruct *rpc.ArgStruct, argByte int8, argI16 int16, argI32 int32,
	argI64 int64, argDouble float64, argString string, paramMapStrStr map[string]string,
	paramMapI32Str map[int32]string, paramSetStr map[string]bool, paramSetI64 map[int64]bool,
	paramListStr []string) (r []string, err error) {
	fmt.Println("-->FunCall:", argStruct)
	r = append(r, "return 1 by FunCall.")
	r = append(r, "return 2 by FunCall.")
	return
}

func main() {
	// Must match the client: plain transport, compact protocol.
	//transportFactory := thrift.NewTFramedTransportFactory(thrift.NewTTransportFactory())
	transportFactory := thrift.NewTTransportFactory()
	//protocolFactory := thrift.NewTBinaryProtocolFactoryDefault()
	protocolFactory := thrift.NewTCompactProtocolFactory()
	//protocolFactory := thrift.NewTJSONProtocolFactory()
	//protocolFactory := thrift.NewTSimpleJSONProtocolFactory()

	serverTransport, err := thrift.NewTServerSocket(NetworkAddr)
	if err != nil {
		fmt.Println("Error!", err)
		os.Exit(1)
	}

	handler := &RpcServiceImpl{}
	processor := rpc.NewRpcServiceProcessor(handler)

	server := thrift.NewTSimpleServer4(processor, serverTransport, transportFactory, protocolFactory)
	fmt.Println("thrift server in", NetworkAddr)
	server.Serve()
}

Run go build rpcclient.go, then execute the generated rpcclient binary.

Start a packet capture before running it. The captured bytes are analyzed below.

[Hex dump of the captured request and response packets]

Analysis of the captured request data:

Message header analysis:

The first byte, 82, is the compact protocol ID.

COMPACT_PROTOCOL_ID       = 0x082

The second byte, 21, marks a call (request) message. How is 21 computed?

COMPACT_VERSION           = 1
COMPACT_VERSION_MASK      = 0x1f
COMPACT_TYPE_MASK         = 0x0e0
COMPACT_TYPE_BITS         = 0x07
COMPACT_TYPE_SHIFT_AMOUNT = 5

(COMPACT_VERSION & COMPACT_VERSION_MASK) | ((byte(typeId) << COMPACT_TYPE_SHIFT_AMOUNT) & COMPACT_TYPE_MASK)

The typeId of a call message is 1. Substituting it:

(0x01 & 0x1f) | ((0x01 << 5) & 0xe0) = 0x01 | (0x20 & 0xe0) = 0x01 | 0x20 = 0x21
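Here is the same formula evaluated in Go for all four message types. It reproduces the type bytes listed earlier (0x21, 0x41, 0x61, 0x81), and masking with COMPACT_TYPE_MASK recovers the type on the receiving side. The lowercase constants are stand-ins for the COMPACT_* constants above, and versionAndType and messageType are hypothetical helpers.

```go
package main

import "fmt"

// Stand-ins for COMPACT_VERSION, COMPACT_VERSION_MASK, COMPACT_TYPE_MASK
// and COMPACT_TYPE_SHIFT_AMOUNT.
const (
	compactVersion         byte = 1
	compactVersionMask     byte = 0x1f
	compactTypeMask        byte = 0xe0
	compactTypeShiftAmount      = 5
)

// versionAndType builds the second header byte for a message type ID.
func versionAndType(typeID byte) byte {
	return (compactVersion & compactVersionMask) | ((typeID << compactTypeShiftAmount) & compactTypeMask)
}

// messageType recovers the type ID from that byte.
func messageType(b byte) byte {
	return (b & compactTypeMask) >> compactTypeShiftAmount
}

func main() {
	names := []string{"CALL", "REPLY", "EXCEPTION", "ONEWAY"}
	for i, name := range names {
		fmt.Printf("%s=%#x\n", name, versionAndType(byte(i+1)))
	}
	// CALL=0x21
	// REPLY=0x41
	// EXCEPTION=0x61
	// ONEWAY=0x81
}
```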

The third byte, 01, is the sequence ID 1 as a varint.

The fourth byte, 07, is the length of the message name, 7, as a varint.

Bytes 66 75 6e 43 61 6c 6c are the message name string "funCall".

Next come the parameters:

The first parameter of funCall:

1: ArgStruct argStruct,

Byte 1c is the field header. The high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, c, indicate type 0x0c, struct.

The field ID 1 is saved so that the delta of the next parameter can be computed from it. Inside the struct, the deltas restart from 0.

The client filled the struct as follows:

    argStruct.ArgByte = 53
    argStruct.ArgString = "str value"
    argStruct.ArgI16 = 54
    argStruct.ArgI32 = 12
    argStruct.ArgI64 = 43
    argStruct.ArgDouble = 11.22

First member of the struct:

Bytes 13 35 are the first struct member, argByte:

the high 4 bits, 1, are a field-ID delta of 1; the low 4 bits, 3, indicate type 0x03, byte; and the value 0x35 is decimal 53.

Second member of the struct:

Bytes 18 09 73 74 72 20 76 61 6c 75 65 are the second struct member, argString:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 8, indicate type 0x08, binary (string);

09 is the string length 9 as a varint, and 73 74 72 20 76 61 6c 75 65 is the string "str value".

Third member of the struct:

Bytes 14 6c are the third struct member, argI16:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 4, indicate type 0x04, i16. The value 6c is binary 110 1100. Its high bit is clear, so it is a one-byte varint. Its lowest bit is 0 (a non-negative number), so zigzag decoding is a right shift by one bit, giving 11 0110, decimal 54.

Fourth member of the struct:

Bytes 15 18 are the fourth struct member, argI32:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 5, indicate type 0x05, i32. The value 18 is binary 1 1000; zigzag decoding (a right shift by one bit) gives 1100, decimal 12.

Fifth member of the struct:

Bytes 16 56 are the fifth struct member, argI64:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 6, indicate type 0x06, i64. The value 56 is binary 101 0110; zigzag decoding (a right shift by one bit) gives 10 1011, decimal 43.

Sixth member of the struct:

Bytes 17 71 3d 0a d7 a3 70 26 40 are the sixth struct member, argDouble:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 7, indicate type 0x07, double. The eight bytes 71 3d 0a d7 a3 70 26 40 are the little-endian IEEE 754 encoding of 11.22.

End tag of the struct:

Byte 00 marks the end of the struct.

The second parameter of funCall:

2: byte argByte,

Bytes 13 35 are argByte:

the high 4 bits, 1, are a field-ID delta of 1 (from field 1 to field 2); the low 4 bits, 3, indicate type 0x03, byte; and the value 0x35 is decimal 53.

The third parameter of funCall:
3: i16 argI16,

Bytes 14 6c are argI16:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 4, indicate type 0x04, i16. The value 6c is binary 110 1100; zigzag decoding (a right shift by one bit) gives 11 0110, decimal 54.

The fourth parameter of funCall:
4: i32 argI32,

Bytes 15 18 are argI32:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 5, indicate type 0x05, i32. The value 18 is binary 1 1000; zigzag decoding (a right shift by one bit) gives 1100, decimal 12.

The fifth parameter of funCall:
5: i64 argI64,

Bytes 16 44 are argI64:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 6, indicate type 0x06, i64. The value 44 is binary 100 0100; zigzag decoding (a right shift by one bit) gives 10 0010, decimal 34.

The sixth parameter of funCall:
6: double argDouble,

Bytes 17 71 3d 0a d7 a3 70 26 40 are argDouble:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 7, indicate type 0x07, double. The eight bytes 71 3d 0a d7 a3 70 26 40 are the little-endian IEEE 754 encoding of 11.22.

The seventh parameter of funCall:
7: string argString,

Bytes 18 05 6c 6f 67 69 6e are argString:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 8, indicate type 0x08, binary (string);

05 is the string length 5 as a varint, and 6c 6f 67 69 6e is the string "login".

The eighth parameter of funCall:
8: map<string, string> paramMapStrStr,

Bytes 1b 02 88 04 6e 61 6d 65 06 6e 61 6d 65 73 73 04 70 61 73 73 05 76 70 61 73 73 are paramMapStrStr:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, b, indicate type 0x0b, map.

02 is the number of map entries, 2, as a varint.

88 gives the key and value types. The high 4 bits, 8, mean the key type is 0x08 (binary string), and the low 4 bits, 8, mean the value type is 0x08 (binary string).

The first key of the map: 04 6e 61 6d 65, a string of length 4, 6e 61 6d 65, value "name".

The value of the first key: 06 6e 61 6d 65 73 73, a string of length 6, 6e 61 6d 65 73 73, value "namess".

The second key of the map: 04 70 61 73 73, a string of length 4, 70 61 73 73, value "pass".

The value of the second key: 05 76 70 61 73 73, a string of length 5, 76 70 61 73 73, value "vpass".

The ninth parameter of funCall:
9: map<i32, string> paramMapI32Str,

Bytes 1b 02 58 14 05 76 61 6c 31 30 28 05 76 61 6c 32 30 are paramMapI32Str:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, b, indicate type 0x0b, map.

02 is the number of map entries, 2, as a varint.

58 gives the key and value types. The high 4 bits, 5, mean the key type is 0x05 (i32), and the low 4 bits, 8, mean the value type is 0x08 (binary string).

The first key of the map: 14, binary 1 0100; zigzag decoding (a right shift by one bit) gives 1010, decimal 10.

The value of the first key: 05 76 61 6c 31 30, a string of length 5, 76 61 6c 31 30, value "val10".

The second key of the map: 28, binary 10 1000; zigzag decoding gives 1 0100, decimal 20.

The value of the second key: 05 76 61 6c 32 30, a string of length 5, 76 61 6c 32 30, value "val20".

The tenth parameter of funCall:
10: set<string> paramSetStr,

Bytes 1a 38 04 65 6c 65 31 04 65 6c 65 32 04 65 6c 65 33 are paramSetStr:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, a, indicate type 0x0a, set.

38 gives the element count and type. The high 4 bits, 3, mean the set has 3 elements, and the low 4 bits, 8, mean the element type is 0x08 (binary string).

The first value of the set: 04 65 6c 65 31, a string of length 4, 65 6c 65 31, "ele1".

The second value of the set: 04 65 6c 65 32, a string of length 4, 65 6c 65 32, "ele2".

The third value of the set: 04 65 6c 65 33, a string of length 4, 65 6c 65 33, "ele3".

The 11th parameter of funCall:
11: set<i64> paramSetI64,

Bytes 1a 36 16 2c 42 are paramSetI64:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, a, indicate type 0x0a, set.

36 gives the element count and type. The high 4 bits, 3, mean the set has 3 elements, and the low 4 bits, 6, mean the element type is 0x06 (i64).

The first value of the set: 16, binary 1 0110; zigzag decoding (a right shift by one bit) gives 1011, decimal 11.

The second value of the set: 2c, binary 10 1100; zigzag decoding gives 1 0110, decimal 22.

The third value of the set: 42, binary 100 0010; zigzag decoding gives 10 0001, decimal 33.
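Decoding runs the same steps in reverse. The Go sketch below turns the paramSetI64 bytes that follow the field header (36 16 2c 42) back into [11 22 33]. decodeI64Set is a hypothetical helper. It handles only the short collection form (at most 14 elements) and the i64 element type.

```go
package main

import (
	"errors"
	"fmt"
)

// ctI64 is the compact type ID of i64.
const ctI64 byte = 0x06

// decodeI64Set decodes a short-form set<i64> body: the count/type byte,
// then one zigzag varint per element. The long form (count > 14) is not handled.
func decodeI64Set(b []byte) ([]int64, error) {
	if len(b) == 0 || b[0]&0x0f != ctI64 {
		return nil, errors.New("not a set of i64")
	}
	size := int(b[0] >> 4)
	if size == 0x0f {
		return nil, errors.New("long form not handled in this sketch")
	}
	out := make([]int64, 0, size)
	pos := 1
	for i := 0; i < size; i++ {
		var u uint64
		var shift uint
		for {
			if pos >= len(b) {
				return nil, errors.New("truncated varint")
			}
			c := b[pos]
			pos++
			u |= uint64(c&0x7f) << shift
			shift += 7
			if c&0x80 == 0 { // high bit clear: last byte of this varint
				break
			}
		}
		out = append(out, int64(u>>1)^-int64(u&1)) // undo zigzag
	}
	return out, nil
}

func main() {
	vals, err := decodeI64Set([]byte{0x36, 0x16, 0x2c, 0x42})
	fmt.Println(vals, err) // [11 22 33] <nil>
}
```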

The 12th parameter of funCall:
12: list<string> paramListStr,

Bytes 19 28 03 6c 31 2e 03 6c 32 2e are paramListStr:

the high 4 bits, 1, are a field-ID delta of 1, and the low 4 bits, 9, indicate type 0x09, list.

28 gives the element count and type. The high 4 bits, 2, mean the list has 2 elements, and the low 4 bits, 8, mean the element type is 0x08 (binary string).

The first value of the list: 03 6c 31 2e, a string of length 3, 6c 31 2e, "l1.".

The second value of the list: 03 6c 32 2e, a string of length 3, 6c 32 2e, "l2.".

The last byte, 00, is the end tag of the parameter struct and therefore ends the message.

------------------------------------------------------------------------------------------------------------

Analysis of the captured response data:

[Hex dump of the captured response packet]

The first byte, 82, is again the compact protocol ID (COMPACT_PROTOCOL_ID = 0x082).

The second byte, 41, marks a reply message. It is computed with the same formula as in the request:

(COMPACT_VERSION & COMPACT_VERSION_MASK) | ((byte(typeId) << COMPACT_TYPE_SHIFT_AMOUNT) & COMPACT_TYPE_MASK)

The typeId of a reply message is 2. Substituting it:

(0x01 & 0x1f) | ((0x02 << 5) & 0xe0) = 0x01 | (0x40 & 0xe0) = 0x01 | 0x40 = 0x41

The third byte, 01, is the sequence ID 1 as a varint, the same as in the request.

The fourth byte, 07, is the length of the message name, 7, as a varint.

Bytes 66 75 6e 43 61 6c 6c are the message name string "funCall".

Response payload:

list<string>

Bytes 09 00 28 14 72 65 74 75 72 6e 20 31 20 62 79 20 46 75 6e 43 61 6c 6c 2e 14 72 65 74 75 72 6e 20 32 20 62 79 20 46 75 6e 43 61 6c 6c 2e are the return value:

09 is type 0x09, list, in a byte of its own.

00 is the field ID 0. The return value has no field ID in the IDL, and Thrift sends it as field 0 of the result struct. Field 0 is not larger than the previous field ID (which starts at 0), so the delta form cannot be used. The type and the field ID are therefore written as separate bytes, with the ID zigzag-varint encoded.

28 gives the element count and type. The high 4 bits, 2, mean the list has 2 elements, and the low 4 bits, 8, mean the element type is 0x08 (binary string).

The first value of the list: 14 72 65 74 75 72 6e 20 31 20 62 79 20 46 75 6e 43 61 6c 6c 2e, a string of length 20 (0x14), "return 1 by FunCall.".

The second value of the list: 14 72 65 74 75 72 6e 20 32 20 62 79 20 46 75 6e 43 61 6c 6c 2e, a string of length 20, "return 2 by FunCall.".

The last byte, 00, is the end tag of the result struct and therefore ends the response message.

Done.
