This tip introduces some basics of protocol parsing. Go is often used as a server-side programming language, so parsing communication protocols is unavoidable; even if you do no network communication, you will still need to parse files, and the underlying knowledge is the same. In real-world scenarios, communication protocols can usually be divided into two categories: binary protocols and text protocols. The gob format built into Go is a binary protocol, while JSON, XML, and so on are text protocols.
Suppose we want to send the value 123. A binary protocol needs only one byte: a byte has 8 bits, and 2^8 is 256, so one byte can express any value from 0 to 255, 256 possibilities in total.
If we send the value 123 using a text protocol, we need at least three bytes, because the number must be converted into the three ASCII characters '1', '2', and '3', each stored in one byte.
So for the same data, the binary encoding is usually smaller than the text encoding. In network applications, this shows up as a difference in bandwidth requirements.
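The size difference can be checked with a few lines of Go. This is only a minimal sketch; the helper names `encodeBinary` and `encodeText` are made up for illustration and are not part of any library.

```go
package main

import (
	"fmt"
	"strconv"
)

// encodeBinary stores a value in the range 0-255 as a single raw byte.
func encodeBinary(v uint8) []byte {
	return []byte{v}
}

// encodeText stores the same value as its decimal ASCII digits.
func encodeText(v uint8) []byte {
	return []byte(strconv.Itoa(int(v)))
}

func main() {
	fmt.Println(len(encodeBinary(123)), encodeBinary(123)) // 1 [123]
	fmt.Println(len(encodeText(123)), encodeText(123))     // 3 [49 50 51]
}
```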
On the other hand, if we write the value 123 to a file using a binary protocol and open the file in a text editor, we see the character '{', because the ASCII code of that character is exactly 123. If we store the data using a text protocol, we can read the value 123 directly in the text editor.
Binary protocols are therefore usually harder for people to read, while text protocols are easy to read. During development, this shows up as a difference in how easy debugging is.
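To see what a text editor would show for each encoding, the raw bytes can simply be interpreted as characters. A small sketch, with `asText` being a hypothetical helper name:

```go
package main

import "fmt"

// asText interprets raw bytes as characters, the way a text editor would.
func asText(b []byte) string {
	return string(b)
}

func main() {
	fmt.Println(asText([]byte{123}))   // binary encoding of 123, shown as {
	fmt.Println(asText([]byte("123"))) // text encoding of 123, shown as 123
}
```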
Another difference between binary and text data is execution efficiency. Taking the value 123 again, binary serialization only needs to assign the value directly to a byte, whereas the text format has to compute the hundreds, tens, and ones digits, convert each to its ASCII code, and then store them in three bytes. The same is true when deserializing.
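The extra work done by a text encoding can be made visible by writing the digit conversion by hand. The sketch below is for illustration only (in real code `strconv` does this job); `appendDecimal` and `parseDecimal` are hypothetical names, and `parseDecimal` assumes its input contains only ASCII digits of a value that fits in a `uint8`.

```go
package main

import "fmt"

// appendDecimal appends the ASCII decimal digits of v to dst.
func appendDecimal(dst []byte, v uint8) []byte {
	var tmp [3]byte // a uint8 has at most three decimal digits
	i := len(tmp)
	for {
		i--
		tmp[i] = '0' + v%10 // lowest remaining digit, converted to ASCII
		v /= 10
		if v == 0 {
			break
		}
	}
	return append(dst, tmp[i:]...)
}

// parseDecimal reverses appendDecimal. It assumes b holds only ASCII
// digits and that the value fits in a uint8.
func parseDecimal(b []byte) uint8 {
	var v uint8
	for _, c := range b {
		v = v*10 + (c - '0')
	}
	return v
}

func main() {
	text := appendDecimal(nil, 123)
	fmt.Println(string(text), parseDecimal(text)) // 123 123
}
```

The binary case needs none of these loops: the value is copied into the byte as it is.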
The analysis above covers some characteristics of binary and text protocols; it does not say which is the better solution, because different scenarios call for different technical solutions. For example, the TCP/IP protocols are binary protocols, while HTTP/1.x, which is built on top of TCP/IP, is a text protocol. Each has its own application scenarios, hence the technical differences.
Back to protocol parsing. When we need to parse binary data, we often use Go's built-in encoding/binary package, which provides operations on big-endian and little-endian binary data.
What are big-endian and little-endian byte order? Take the value 256 as an example. As mentioned above, one byte can express any value from 0 to 255, so how do we express the value 256?
255 in binary is 1111 1111; adding 1 gives 1 0000 0000, which has an extra 1 bit. Obviously we need an additional byte to hold this 1, but should it be stored in the first byte or the second? Because people made different choices here, we have the distinction between big-endian and little-endian.
When we put this 1 in the first byte, we call it big-endian format. When we put the 1 in the second byte, we call it little-endian format.
Neither format is clearly better, so each has always had its own supporters. If you are implementing a standard communication protocol, you must strictly follow the byte order the standard specifies. If it is a custom binary protocol, choose whichever format you prefer.
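The two layouts of the value 256 can be printed directly with the encoding/binary package. A minimal sketch; `encode256` is a hypothetical helper written only for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encode256 returns the two-byte big-endian and little-endian
// encodings of the value 256.
func encode256() (big, little [2]byte) {
	binary.BigEndian.PutUint16(big[:], 256)
	binary.LittleEndian.PutUint16(little[:], 256)
	return big, little
}

func main() {
	big, little := encode256()
	fmt.Println(big)    // [1 0]: the 1 sits in the first byte
	fmt.Println(little) // [0 1]: the 1 sits in the second byte
}
```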
The encoding/binary package has two global variables: BigEndian, used to manipulate big-endian data, and LittleEndian, used to manipulate little-endian data. Both implement the ByteOrder interface:
    type ByteOrder interface {
        Uint16([]byte) uint16
        Uint32([]byte) uint32
        Uint64([]byte) uint64
        PutUint16([]byte, uint16)
        PutUint32([]byte, uint32)
        PutUint64([]byte, uint64)
        String() string
    }
The first three methods read data, and the three Put methods write data.
You may notice that these methods all operate on unsigned integers. What if we want to work with signed integers? It's simple: a type conversion is enough, like this:
    func PutInt32(b []byte, v int32) {
        binary.BigEndian.PutUint32(b, uint32(v))
    }
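Reading works the same way in reverse. The sketch below pairs PutInt32 with a matching reader, here called `Int32` (a name chosen for this example, not a function of encoding/binary), and shows that a negative value survives the round trip because the conversion keeps the two's-complement bit pattern.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// PutInt32 writes a signed 32-bit integer in big-endian order.
func PutInt32(b []byte, v int32) {
	binary.BigEndian.PutUint32(b, uint32(v))
}

// Int32 reads a signed 32-bit integer written by PutInt32.
func Int32(b []byte) int32 {
	return int32(binary.BigEndian.Uint32(b))
}

func main() {
	buf := make([]byte, 4)
	PutInt32(buf, -2)
	fmt.Println(buf, Int32(buf)) // [255 255 255 254] -2
}
```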
You may also notice that the methods above only operate on integer values. The reason is that floating-point implementations may differ between programming languages, so the runtime library cannot offer a universal standard. What if we want to write and read floating-point numbers?
In practice there are two approaches. One is to agree on a rounding precision in the protocol, such as the number of digits after the decimal point, and then convert the floating-point number to an integer at that precision. This approach has the best cross-language compatibility.
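As a sketch of this first approach, assume the protocol agrees on two digits after the decimal point and a signed 32-bit big-endian integer on the wire. Both choices, and the names `scale`, `PutFixed`, and `Fixed`, are assumptions made for this example only.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// scale is the agreed precision: two digits after the decimal point.
const scale = 100

// PutFixed rounds f to the agreed precision and writes it as a
// big-endian signed 32-bit integer. Values outside the int32 range
// after scaling are not handled in this sketch.
func PutFixed(b []byte, f float64) {
	binary.BigEndian.PutUint32(b, uint32(int32(math.Round(f*scale))))
}

// Fixed reads a value written by PutFixed.
func Fixed(b []byte) float64 {
	return float64(int32(binary.BigEndian.Uint32(b))) / scale
}

func main() {
	buf := make([]byte, 4)
	PutFixed(buf, 12.34)
	fmt.Println(buf, Fixed(buf)) // [0 0 4 210] 12.34
}
```

The receiver only needs to know the scale and the integer layout, which is why this works even with languages that have no IEEE 754 floating-point type.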
The other applies if the binary data is exchanged between applications written in Go, or in other programming languages that follow the IEEE 754 floating-point standard: you can use these functions from the math package:
    func Float32bits(f float32) uint32
    func Float32frombits(b uint32) float32
    func Float64bits(f float64) uint64
    func Float64frombits(b uint64) float64
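Combined with a ByteOrder, these functions are enough to write and read floats. A minimal sketch, in which the wrapper names `PutFloat64` and `Float64` are hypothetical:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// PutFloat64 writes the IEEE 754 bit pattern of f in big-endian order.
func PutFloat64(b []byte, f float64) {
	binary.BigEndian.PutUint64(b, math.Float64bits(f))
}

// Float64 reads a value written by PutFloat64.
func Float64(b []byte) float64 {
	return math.Float64frombits(binary.BigEndian.Uint64(b))
}

func main() {
	buf := make([]byte, 8)
	PutFloat64(buf, 1.0)
	fmt.Println(buf, Float64(buf)) // [63 240 0 0 0 0 0 0] 1
}
```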
Everything so far deals with numeric types. How are the more complex kinds of data found in real scenarios, such as text, lists, and dictionaries, represented in a binary protocol?
The most important problem in representing complex binary data is data segmentation. For example, suppose we want to serialize the following Go struct into binary:
    type MyStruct struct {
        Field1 int32
        Field2 string
        Field3 []int16
    }
The first problem we run into is how to tell the fields apart.
Let us analyze the struct first. The first field is of type int32, which is always expressed in 4 bytes, so we call it a fixed-length type. The second field is of type string, and the length of a string's contents is not fixed, so we call it a variable-length type. The third field is of type []int16: the number of elements in the list is not fixed, but the byte length of each element is.
For a string, we can use two bytes before the string data to hold its length, so that a string can hold up to 65,535 bytes (2^16 − 1). For a list, we can likewise use two bytes to hold the number of elements.
This gives us the following serialization and deserialization code:
    package main

    import (
        "encoding/binary"
        "fmt"
    )

    func main() {
        var s1 = MyStruct{123, "456", []int16{1, 2, 3}}
        var s2 MyStruct

        s2.Unmarshal(s1.Marshal())

        fmt.Println(s1, s2)
    }

    type MyStruct struct {
        Field1 int32
        Field2 string
        Field3 []int16
    }

    // BinarySize returns the number of bytes Marshal will produce.
    func (s *MyStruct) BinarySize() int {
        return 4 + // Field1
            2 + len(s.Field2) + // len + Field2
            2 + 2*len(s.Field3) // len + Field3
    }

    // Marshal serializes the struct; n tracks the current write offset.
    func (s *MyStruct) Marshal() []byte {
        b := make([]byte, s.BinarySize())
        n := 0

        binary.BigEndian.PutUint32(b[n:], uint32(s.Field1))
        n += 4

        binary.BigEndian.PutUint16(b[n:], uint16(len(s.Field2)))
        n += 2

        copy(b[n:], s.Field2)
        n += len(s.Field2)

        binary.BigEndian.PutUint16(b[n:], uint16(len(s.Field3)))
        n += 2

        for i := 0; i < len(s.Field3); i++ {
            binary.BigEndian.PutUint16(b[n:], uint16(s.Field3[i]))
            n += 2
        }

        return b
    }

    // Unmarshal reads the fields back in the same order they were written.
    func (s *MyStruct) Unmarshal(b []byte) {
        n := 0

        s.Field1 = int32(binary.BigEndian.Uint32(b[n:]))
        n += 4

        x := int(binary.BigEndian.Uint16(b[n:]))
        n += 2

        s.Field2 = string(b[n : n+x])
        n += x

        s.Field3 = make([]int16, binary.BigEndian.Uint16(b[n:]))
        n += 2

        for i := 0; i < len(s.Field3); i++ {
            s.Field3[i] = int16(binary.BigEndian.Uint16(b[n:]))
            n += 2
        }
    }
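The code above trusts its input: Unmarshal panics if the buffer is shorter than the length prefixes claim, and Marshal silently truncates the length of a string longer than 65,535 bytes. The sketch below isolates the length-prefixed string and adds those checks. It is only one possible way to do it; `appendString`, `readString`, and `errShort` are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShort reports that the input ended before the announced length.
var errShort = errors.New("buffer too short")

// appendString appends a 2-byte big-endian length followed by the
// bytes of s. Strings that do not fit the 2-byte prefix are rejected.
func appendString(dst []byte, s string) ([]byte, error) {
	if len(s) > 0xFFFF {
		return nil, errors.New("string longer than 65535 bytes")
	}
	var head [2]byte
	binary.BigEndian.PutUint16(head[:], uint16(len(s)))
	dst = append(dst, head[:]...)
	return append(dst, s...), nil
}

// readString decodes one length-prefixed string from b and returns it
// together with the bytes that follow it.
func readString(b []byte) (string, []byte, error) {
	if len(b) < 2 {
		return "", nil, errShort
	}
	n := int(binary.BigEndian.Uint16(b))
	b = b[2:]
	if len(b) < n {
		return "", nil, errShort
	}
	return string(b[:n]), b[n:], nil
}

func main() {
	buf, _ := appendString(nil, "456")
	fmt.Println(buf) // [0 3 52 53 54]

	s, rest, err := readString(buf)
	fmt.Println(s, len(rest), err) // 456 0 <nil>

	_, _, err = readString([]byte{0, 5, 'a'}) // claims 5 bytes, has 1
	fmt.Println(err)                          // buffer too short
}
```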
The protocol design technique used above also applies to sending variable-length message packets. In many scenarios the length of a message packet is not fixed, just like the string field above. We can use a fixed number of bytes at the beginning to hold the message length, so that when parsing the communication protocol we can cut individual message packets out of the byte stream. This operation is often called packet framing, or handling "sticky packets".
Here is a sketch of a function that reads one message packet from a socket (it assumes the encoding/binary, io, and net packages are imported):

    func ReadPacket(conn net.Conn) ([]byte, error) {
        // Read the 2-byte length header.
        var head [2]byte
        if _, err := io.ReadFull(conn, head[:]); err != nil {
            return nil, err
        }

        // Read exactly as many bytes as the header announces.
        size := binary.BigEndian.Uint16(head[:])
        packet := make([]byte, size)
        if _, err := io.ReadFull(conn, packet); err != nil {
            return nil, err
        }

        return packet, nil
    }
The code above uses io.ReadFull, covered in an earlier tip, to make sure the full data is read.
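To try the framing without a real socket, the same logic can be run against an in-memory stream. The sketch below is a variation rather than the code above: it accepts any io.Reader instead of a net.Conn, and adds a matching writer. The names `writePacket`, `readPacket`, and `roundTrip` are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writePacket writes a 2-byte big-endian length header followed by
// the payload.
func writePacket(w io.Writer, payload []byte) error {
	if len(payload) > 0xFFFF {
		return fmt.Errorf("payload too large: %d bytes", len(payload))
	}
	var head [2]byte
	binary.BigEndian.PutUint16(head[:], uint16(len(payload)))
	if _, err := w.Write(head[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readPacket reads one length-prefixed packet from r.
func readPacket(r io.Reader) ([]byte, error) {
	var head [2]byte
	if _, err := io.ReadFull(r, head[:]); err != nil {
		return nil, err
	}
	packet := make([]byte, binary.BigEndian.Uint16(head[:]))
	if _, err := io.ReadFull(r, packet); err != nil {
		return nil, err
	}
	return packet, nil
}

// roundTrip writes all payloads back to back into one in-memory
// stream, then splits the stream into separate packets again.
func roundTrip(payloads ...string) ([]string, error) {
	var stream bytes.Buffer // stands in for a network connection
	for _, p := range payloads {
		if err := writePacket(&stream, []byte(p)); err != nil {
			return nil, err
		}
	}
	var out []string
	for {
		p, err := readPacket(&stream)
		if err == io.EOF {
			return out, nil // stream fully consumed
		}
		if err != nil {
			return nil, err
		}
		out = append(out, string(p))
	}
}

func main() {
	out, err := roundTrip("hello", "go")
	fmt.Println(out, err) // [hello go] <nil>
}
```

Although the two packets sit directly next to each other in the stream, the length header lets the reader cut them apart at the right place.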
Note that this code is not thread-safe: if two goroutines call ReadPacket on the same net.Conn at the same time, a serious error is likely to occur. Working out exactly why is left to the reader.
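One way to look at it: reading a packet takes two reads, header then body, and a second goroutine can slip in between them and interpret body bytes as a header. A possible remedy, sketched below, is to hold a mutex for the whole header-plus-body read. The `PacketReader` type and the `readConcurrently` helper are hypothetical, and an in-memory buffer stands in for the connection.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"sort"
	"sync"
)

// PacketReader serializes access to the underlying stream so that the
// header and body of one packet are always read by the same goroutine.
type PacketReader struct {
	mu sync.Mutex
	r  io.Reader
}

// ReadPacket reads one length-prefixed packet while holding the lock.
func (p *PacketReader) ReadPacket() ([]byte, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	var head [2]byte
	if _, err := io.ReadFull(p.r, head[:]); err != nil {
		return nil, err
	}
	packet := make([]byte, binary.BigEndian.Uint16(head[:]))
	if _, err := io.ReadFull(p.r, packet); err != nil {
		return nil, err
	}
	return packet, nil
}

// readConcurrently frames the payloads into one in-memory stream,
// reads them back with one goroutine per packet, and returns the
// results sorted (the arrival order across goroutines is not fixed).
func readConcurrently(payloads []string) []string {
	var stream bytes.Buffer
	for _, s := range payloads {
		var head [2]byte
		binary.BigEndian.PutUint16(head[:], uint16(len(s)))
		stream.Write(head[:])
		stream.WriteString(s)
	}

	pr := &PacketReader{r: &stream}
	out := make([]string, len(payloads))
	var wg sync.WaitGroup
	for i := range payloads {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if p, err := pr.ReadPacket(); err == nil {
				out[i] = string(p) // each goroutine owns one slot
			}
		}(i)
	}
	wg.Wait()
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(readConcurrently([]string{"a", "bb", "ccc"})) // [a bb ccc]
}
```

In many programs a simpler design is to let a single goroutine own all reads from a connection and hand complete packets to others over a channel.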
From the struct serialization and deserialization code above, it is easy to see that implementing a binary protocol by hand is quite tedious and bug-prone; a small mistake in a single number is enough to cause a parsing error.
Therefore, in engineering practice we do not recommend writing binary protocol parsing code by hand; projects often use automated tools to help generate the code.
Because of space limitations, this article cannot go further into text protocols. Later tips on bufio and json will both involve text protocols, so that material is left for a later article.
Suggestions and feedback on this collection of Go language tips are welcome; please leave a message.