Discussion on the message packet format on the Yunfeng blog

Last Update:2018-12-03 Source: Internet

Reposted from: http://blog.codingnow.com/2006/01/aeeieaiaeioeacueoe.html#comments

Comment on the seemingly reasonable network packet protocol from the Internet

I have a small project at the moment, and at the start it looked as if it could be finished quickly: just a very small 2D game on top of a mature graphics engine. Over the past few days I have been enthusiastic about Lua and have used it to wrap our existing C++ graphics engine. It feels good to use, the UI is wrapped nicely, and driving the game logic from Lua seems convenient. For a moment I even fantasized about open-sourcing it in a few days; perhaps it could become a de facto standard for developing 2D games in Lua. On reflection that is unlikely; it was just a daydream.

Small projects are good for training a team, so I had the new colleagues write the basic wrappers as a way of teaching them. Watching other people write the program, I can't help getting impatient; I wish I could do everything myself.

One of my colleagues is working on wrapping the network protocol and is trying to wrap it in Lua so that it is easier to use.

To be honest, my experience with network programming is very limited: no more than 10,000 lines of code in total, and no more than five programs that actually ran, some of them based on UDP. This small project is going to use TCP. When I thought about the communication protocol, the first design that came to mind was one byte for the packet type followed by two bytes for the packet length.

At first I had plenty to say against this format. Because TCP is a stream protocol, if the length is used to split the stream into packets and the type is used to dispatch them to the logic, the most reasonable arrangement is to put the length first: the bottom-layer code splits the data stream into packets, and each complete packet is then handed to a layer that knows nothing about the network.
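A minimal C sketch of the length-first framing described above, assuming a hypothetical header made of a 2-byte big-endian length that counts the whole packet, followed by a 1-byte type. The name frame_next is made up for illustration. Note that it never reads the type byte, which is the point of this layout: the framing layer can split the stream without knowing anything about dispatch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 2-byte big-endian total length, then 1-byte type. */
#define PKT_HDR 3

/* Returns the size of the first complete packet in buf, 0 if more bytes are
 * needed, or -1 if the length field is invalid. The type byte is not read. */
long frame_next(const uint8_t *buf, size_t avail) {
    if (avail < 2) return 0;                       /* length not complete yet */
    size_t len = ((size_t)buf[0] << 8) | buf[1];
    if (len < PKT_HDR) return -1;                  /* shorter than its own header */
    return avail >= len ? (long)len : 0;
}
```

A receive loop would call frame_next repeatedly on its buffer, hand each complete packet upward, and keep any leftover bytes for the next recv.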

While designing the protocol, we wanted to describe each message format in a Lua script and have C++ code parse each network packet into a Lua table. That way, Lua code can easily process the data received over the network.

In non-blocking mode, when the socket API is used with TCP, only a single-byte read can be treated as atomic: a request for more than one byte may return only part of it. So a field longer than one byte cannot be assumed to arrive in a single read, and a state machine is generally used to process the network data stream. Such a state machine is easy to implement: the Length + Type + Content format maps onto its three states. The program's main loop can then keep feeding network data to the state machine without ever blocking.
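The three-state machine can be sketched in C as follows. The wire format (2-byte big-endian body length, 1-byte type, then the body), the 1024-byte body limit, and all names are assumptions for illustration. Each state consumes only the bytes it still needs, so the result is the same whether recv delivers the stream one byte at a time or all at once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: 2-byte big-endian body length, 1-byte type, body. */
enum rx_state { RX_LENGTH, RX_TYPE, RX_CONTENT };

#define RX_MAX_BODY 1024

struct rx {
    enum rx_state state;
    size_t need;                /* length bytes still missing (RX_LENGTH only) */
    uint16_t len;               /* body length of the current packet */
    uint8_t type;               /* type of the current packet */
    uint8_t body[RX_MAX_BODY];
    size_t got;                 /* body bytes received so far */
    unsigned packets;           /* complete packets seen; stands in for dispatch */
};

void rx_init(struct rx *r) {
    r->state = RX_LENGTH;
    r->need = 2;
    r->len = 0;
    r->type = 0;
    r->got = 0;
    r->packets = 0;
}

/* A whole packet is buffered: a real program would dispatch on r->type here.
 * The machine then starts over with the next length field. */
static void rx_done(struct rx *r) {
    r->packets++;
    r->state = RX_LENGTH;
    r->need = 2;
    r->len = 0;
}

/* Feeds however many bytes recv() happened to return.
 * Returns 0, or -1 if a length field exceeds RX_MAX_BODY. */
int rx_feed(struct rx *r, const uint8_t *p, size_t n) {
    while (n > 0) {
        switch (r->state) {
        case RX_LENGTH:
            r->len = (uint16_t)((r->len << 8) | *p);
            p++; n--;
            if (--r->need == 0) {
                if (r->len > RX_MAX_BODY) return -1;
                r->state = RX_TYPE;
            }
            break;
        case RX_TYPE:
            r->type = *p;
            p++; n--;
            r->got = 0;
            if (r->len == 0) rx_done(r);        /* empty body: already complete */
            else r->state = RX_CONTENT;
            break;
        case RX_CONTENT: {
            size_t take = (size_t)r->len - r->got;
            if (take > n) take = n;
            memcpy(r->body + r->got, p, take);
            r->got += take;
            p += take; n -= take;
            if (r->got == r->len) rx_done(r);
            break;
        }
        }
    }
    return 0;
}
```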

When I designed the program along these lines, I found that the length is purely redundant. If the format of each message is defined in a script, the length can be derived from the message type. Most message formats consist of fixed-length elements, so the messages themselves end up fixed-length. Some elements, such as strings and arrays, are variable-length, but the total length can still be derived from context. Handling them only makes the state machine somewhat harder to write.
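To illustrate deriving the length from the type, here is a small C sketch with a hypothetical format table standing in for the script-side descriptions. It assumes a 1-byte type on the wire, fixed-size integer fields, and strings encoded as a 2-byte big-endian length followed by the bytes; none of this comes from the original post. message_size returns the total message size once the message has fully arrived, with no length field on the wire.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field kinds; F_END = 0 so zero-filled table slots terminate. */
enum field { F_END = 0, F_U8, F_U16, F_U32, F_STRING };

/* Hypothetical format table indexed by message type. */
static const enum field formats[][4] = {
    { F_U32, F_U32 },     /* type 0, e.g. move { u32 x; u32 y; } */
    { F_U8, F_STRING },   /* type 1, e.g. chat { u8 channel; string text; } */
};

/* Returns the total size of the message at buf if it is complete,
 * 0 if more bytes are needed, or -1 for an unknown type. */
long message_size(const uint8_t *buf, size_t avail) {
    if (avail < 1) return 0;
    uint8_t type = buf[0];
    if (type >= sizeof formats / sizeof formats[0]) return -1;
    size_t off = 1;                               /* skip the type byte */
    for (const enum field *f = formats[type]; *f != F_END; f++) {
        switch (*f) {
        case F_U8:  off += 1; break;
        case F_U16: off += 2; break;
        case F_U32: off += 4; break;
        case F_STRING:
            if (avail < off + 2) return 0;        /* string length not here yet */
            off += 2 + (((size_t)buf[off] << 8) | buf[off + 1]);
            break;
        default: return -1;
        }
    }
    return avail >= off ? (long)off : 0;
}
```

The variable-length string is the only place where the code must look at the data before it knows the total size, which is the extra state-machine work the post refers to.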

So I no longer put the packet length in the packet header.

Posted by Yunfeng on January 16, 2006 | Permalink

Comments

The concepts are different; Yunfeng just wants to try something out. Besides, to the C++ code Lua is just a purely virtual object, and wrapping the internal conversion into an asynchronous stream is not very complex. However, it is not what people are used to and feels uncomfortable, and it may impose requirements on how the Lua code is written.

Posted by: tgame | March 26, 2007 pm

I agree with Yunfeng's point of view. I remember that when I read the client's Lua source code, I also found that most messages do not need the length value; the length was only used to tell old and new protocol versions apart. When the protocol is driven by scripts, the length can be ignored: as long as the script receives the type and the data, that is enough, and the rest is up to Lua. Handling variable-length strings may be more difficult, though. I feel the discussion has drifted; people are not talking about the same thing.

Posted by: wangdali | October 15, 2006 AM

I want to implement an SSL VPN on Linux using a multi-core (16-core) chip. What is your opinion? Thank you!

Posted by: Colin | June 8, 2006

I don't understand what Yunfeng means about putting the length after the type. In terms of implementation, is the drawback simply more work in the parsing code? Or is it that it is inconvenient for scripts? How big is the difference in execution efficiency? I am also not clear on how you obtain the data buffer. If you obtain it with raw sockets, then a length you define yourself is redundant.

Posted by: xue23 | February 24, 2006 AM

Making full use of a multi-core CPU is also quite troublesome. I have thought about concurrent server architectures, but the results were not satisfactory.

To improve server performance, I think there are two main approaches: distributed computing and parallel computing. Distributed computing has multiple CPUs work on multiple tasks; parallel computing has multiple CPUs work on the same task.

Following the distributed approach, the server's functions can be split into separate threads, such as communication, broadcast, monsters, chat, and dating. This is easy to implement, but it leaves little room for tuning. I once considered turning all the chat and dating features into a service running on another machine, and building the whole server out of such services.

As for the parallel approach, I have considered some methods, but none of them is satisfactory, so I won't go into them.

I hope there can be a dedicated discussion of this topic. (I have never worked on the design of large servers, and because my theoretical background is weak I am not familiar with much of concurrency. I have learned some OpenMP and MPI programming, but they do not seem to fit.)

Posted by: cj2528 | February 23, 2006 AM

Splitting the data stream into typed data can also be done at the bottom layer, independently of the logic. If network messages are parsed by scripts, this matters even more, because scripts are generally poor at processing raw byte streams.

Posted by: Yunfeng | February 21, 2006 pm

My understanding is that the network layer is responsible only for communication, and a separate command interpreter is responsible for dispatching commands. This command interpreter can support a filter pattern, which makes it easier to extend the game application.
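A rough C sketch of the layering this comment describes: the network layer hands complete messages to a command interpreter, which runs a chain of filters and then dispatches by type. The struct, the function names, and the example filter and handler are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*filter_fn)(uint8_t type, const uint8_t *body, size_t len); /* 0 = drop */
typedef void (*handler_fn)(const uint8_t *body, size_t len);

#define MAX_FILTERS 8

struct interpreter {
    filter_fn filters[MAX_FILTERS];
    size_t nfilters;
    handler_fn handlers[256];        /* indexed by the one-byte message type */
};

/* Returns 1 if the message was handled, 0 if a filter dropped it or no
 * handler is registered for its type. */
int interpret(const struct interpreter *in, uint8_t type,
              const uint8_t *body, size_t len) {
    for (size_t i = 0; i < in->nfilters; i++)
        if (!in->filters[i](type, body, len)) return 0;
    if (!in->handlers[type]) return 0;
    in->handlers[type](body, len);
    return 1;
}

/* Example filter: drop messages with an empty body. */
static int drop_empty(uint8_t type, const uint8_t *body, size_t len) {
    (void)type; (void)body;
    return len > 0;
}

/* Example handler: count chat messages. */
static unsigned chat_count;
static void on_chat(const uint8_t *body, size_t len) {
    (void)body; (void)len;
    chat_count++;
}
```

Extending the game then means registering another filter or handler; the network layer itself does not change.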

Posted by: cj2528 | February 21, 2006 AM

Version compatibility is a problem, but not every application must be compatible with earlier versions of the protocol. Moreover, message splitting does not require a leading length field. The length field itself also has a version issue, for example whether it should be one byte, two bytes, or more.

There are other ways to stay compatible with newer versions. For example:

Define one type as an extension, followed by an extension version number and then a length. An old client simply skips the packet that follows.
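One way to read the extension scheme, sketched in C from the old client's point of view. The field sizes (1-byte type with 0xFF reserved for extensions, 1-byte version, 2-byte big-endian length) and the single known message type of fixed size 5 are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TYPE_EXT  0xFF   /* hypothetical type reserved for extensions */
#define TYPE_PING 1      /* hypothetical message the old client knows */
#define PING_SIZE 5      /* type byte + 4-byte body, fixed size */

/* Returns the number of bytes the old client should consume for the first
 * message in buf, 0 if more data is needed, or -1 for an unknown type.
 * *skipped is set to 1 when the message was an extension it ignored. */
long old_client_next(const uint8_t *buf, size_t avail, int *skipped) {
    *skipped = 0;
    if (avail < 1) return 0;
    if (buf[0] == TYPE_EXT) {
        /* type + version + 2-byte big-endian length, then the payload */
        if (avail < 4) return 0;
        size_t total = 4 + (((size_t)buf[2] << 8) | buf[3]);
        if (avail < total) return 0;
        *skipped = 1;    /* buf[1] is the extension version; the old client ignores it */
        return (long)total;
    }
    if (buf[0] == TYPE_PING) return avail >= PING_SIZE ? PING_SIZE : 0;
    return -1;
}
```

Only the extension carries a length, so ordinary messages stay length-free while old clients can still step over data they do not understand.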

Different applications certainly call for different solutions. I do not mean that length information is meaningless in every case, only that it should not be a mandatory standard, as if it had value in every situation.

Posted by: Yunfeng | February 15, 2006 pm

Although TCP is stream-oriented, most real applications are message-oriented, so the ability to split messages must be provided. If you do not use a leading length field for message splitting, the message version compatibility problem cannot be solved.

Posted by: viewlg | February 14, 2006 pm

What counts as convenient? When a program works correctly and is encapsulated so that users need not care about its implementation, it is equally convenient for them either way. If it were necessary, there would be a unified protocol, or TCP itself would have been defined that way.

PS. The question discussed here has nothing to do with the return value of recv; that value is meaningless to the logic layer. I once saw a project in the telecommunications field that used the return value of recv as the packet length and exposed this as a service interface for third parties. If, for some reason, data that should have arrived in one recv was split across several receives, the server treated it as an erroneous message and discarded it. The most chilling part was that when we asked them to fix the bug, their answer was that there was a problem with the TCP protocol definition (._.!)
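The bug in this PS comes from treating the return value of recv as a message boundary. On a TCP socket, recv returns however many bytes are currently available, which may be fewer than requested, so the usual remedy is to loop until the expected number of bytes has arrived. A minimal sketch, assuming a blocking POSIX socket; recv_exact is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Reads exactly len bytes from a blocking stream socket.
 * Returns 0 on success, -1 on error or if the peer closed the connection first. */
int recv_exact(int fd, void *buf, size_t len) {
    unsigned char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0) { got += (size_t)n; continue; }   /* partial read: keep going */
        if (n < 0 && errno == EINTR) continue;       /* interrupted by a signal: retry */
        return -1;                                   /* n == 0 (EOF) or a real error */
    }
    return 0;
}
```

The message length itself still has to come from the protocol (a length field or a schema), never from how many bytes a single recv call returned.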

Posted by: Yunfeng | February 14, 2006 pm

The length is what tells recv how much to read for a physical packet. It is understood that the logical length and the physical packet length are different concepts, but the logical length not only lets you verify the actual length of your packet (a length sanity check); keeping these two bytes is also far more convenient than removing them. Saving these two bytes is unnecessary.

Posted by: nandou | February 14, 2006 AM

"Waiting for complete data, CRC check, decompression, decryption, and so on" is just a state machine that can be implemented in the bottom layer; it does not need to be in the logic.

PS. CRC verification should not be done at this layer.

Posted by: Yunfeng | February 13, 2006 pm

The length is absolutely required. Otherwise the network bottom layer is incomplete, and your game logic also has to deal with waiting for complete data, CRC checks, decompression, decryption, and so on. Isn't that a hassle?

Posted by: 2002 thinking | February 11, 2006

So busy. The type structure itself implies the length, so a length field is not required. The real subject of the discussion is abstraction and design philosophy. My thoughts: 1. TCP takes care of the network layer; what we actually build is the application layer. 2. As for design philosophy, organizing the data stream into data elements is enough; for our application, that is the bottom layer. I think that is appropriate.

Redundancy is often introduced for efficiency. For maximum efficiency, it is best to keep the redundant length from client to server and remove it from server to client. After all, the length has to be determined somewhere: with a length field, it is determined when the packet is built and then prepended; without one, it is worked out while the packet is being assembled on receipt. This design is a bit awkward, but purely in terms of efficiency I think it is reasonable.

Posted by: coder | January 18, 2006 pm

I think parsing objects in the application layer is not a good idea, because parsing has nothing to do with the logic. As long as the set of supported data types is fixed, protocol changes do not require changes to network-layer code, and the complexity of the application layer is reduced as well.

Posted by: Yunfeng | January 18, 2006 pm

To Yunfeng: Well, I was wrong to think of it as a C struct, but the network layer still violates the open-closed principle, and that smells bad.

Posted by: kxjiron | January 18, 2006 pm

To kxjiron: string is a type, just like array and union. We cannot look at this problem through the lens of static C structs. Variable length does not make these object types hard for the network layer to process. Adding a string creatorname to iteminfostruct is just a matter of adding one line to the definition of iteminfostruct.

To zlong: whether the type or the length comes first, encoding either one is not hard for a specific definition. However, putting the type first makes the design worse. If every packet carries a length and I put it at the very beginning, I do not need to know anything about how the packet will be dispatched when I receive it, and the dispatch process can be separated out.

Posted by: Yunfeng | January 18, 2006 pm

I don't know exactly how messages are handled in the case under discussion. On the claim that "in the Dahua Xiyou packet-processing code, putting the type before the length does not increase processing complexity": I don't see why it would, unless the processing method itself is unsuitable. You only need to define a message header structure: struct msgHead { unsigned short ustype; unsigned short ussize; }; If the data received so far is smaller than sizeof(msgHead), you can be sure the packet has not fully arrived; otherwise you can use ussize to decide whether the complete packet has arrived and then submit it for processing. It doesn't matter whether the type comes first.
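zlong's header check fits in a few lines of C. The comment does not say whether ussize includes the header or which byte order is used, so this sketch assumes ussize is the total message size including the 4-byte header and that both fields are little-endian; msg_ready is a hypothetical name. Decoding byte by byte avoids depending on struct padding or host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Wire image of struct msgHead { unsigned short ustype; unsigned short ussize; },
 * assumed here to be two little-endian 16-bit fields. */
#define MSG_HEAD_SIZE 4

/* Returns the full message size if it is complete, 0 if more data is needed,
 * or -1 if ussize is smaller than the header. Stores the type in *type_out. */
long msg_ready(const uint8_t *buf, size_t avail, unsigned *type_out) {
    if (avail < MSG_HEAD_SIZE) return 0;          /* header not complete yet */
    unsigned type = buf[0] | (buf[1] << 8);
    size_t size = buf[2] | ((size_t)buf[3] << 8);
    if (size < MSG_HEAD_SIZE) return -1;
    if (avail < size) return 0;                   /* body not complete yet */
    *type_out = type;
    return (long)size;
}
```

Because the header has a fixed size, swapping the order of ustype and ussize would not change this check.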

Posted by: zlong | January 18, 2006 AM

OK, I have read it carefully and thought it over. The IP header already carries length information, so the length field in the UDP header is indeed redundant. Yunfeng is right; I was not careful enough. As for why UDP carries a redundant length anyway, my current guess is that it simply makes UDP itself easier to handle.

Posted by: wonna | January 18, 2006 AM

I understand what you mean. The issue lies in the protocol layer: in my design, the protocol layer sits in the application layer, not in the network layer.

The network layer is not responsible for parsing the packet structures of specific game logic. With a distributed design, the network layer only receives message packets from the network and forwards them to the corresponding pipeline according to len/type. The game logic layer is driven by pipeline messages, and that is where packets are unpacked and processed. Likewise, the application layer assembles len/type and sends the result to the corresponding pipeline; if the destination is the network layer, the network layer takes care of forwarding. This is one way to describe the protocol: as long as the server's MSG structures match the client's, you do not need to change what you call the protocol layer. Protocol design is MSG structure design.

To atry: the application layer does not depend on the protocol layer; in other words, the game protocol layer is not coupled with the network layer.

Here is a concrete example, iteminfostruct. It is variable-length: at first only the item name is variable-length. Later, a programmer wants to add a variable-length forger name. Does the network layer have to change?

Posted by: kxjiron | January 18, 2006 AM

In fact, serialized objects do not need length information, because the object data itself implies its length, except for strings. The CSocket class that MFC provides for serialization carries no length information. In terms of design, the advantage of adding a length is that splitting packets and parsing the protocol become two separate steps. However, this is not necessary, because the protocol layer can read the object data directly instead of being handed an entire packet.

Whether or not packets are split first, the application layer depends on the protocol layer; that is unavoidable. Not splitting packets first does not increase coupling.

Posted by: atry | January 18, 2006 AM

Wouldn't ASN.1 encoding work well? In essence it is also a tag/length/value structure, and it should be more flexible than a plain C struct.

Posted by: hifee | January 18, 2006 AM

Passing by, I saw everyone having a heated discussion. A naive question: could you use ASN.1 encoding to implement what you intend? ASN.1 encoding is also a tag/length/value structure that lets application-layer developers focus on protocol interfaces without worrying about changes to the internal structure. That seems consistent with your intention.

Posted by: hifee | January 18, 2006 AM

This is not a question of coupling between the network layer and the application layer. My view is that the byte stream of a network packet can be parsed into directly usable data types at the network layer rather than at the application layer. Once that step is done there, the packet length naturally becomes redundant.

Posted by: Yunfeng | January 18, 2006 AM

I agree with analyst and wonna. This is a matter of design philosophy. Coupling the network layer with the application layer will give you headaches later. Especially when several people work together, ease of writing code matters more than conceptual abstraction. By difficulty I do not mean writing my own code, but the extra thinking others need in order to write theirs. Using a functional design as the abstraction model is what frustrates me most.

Posted by: kxjiron | January 18, 2006 AM

The length field in the UDP header is redundant; that is my view. If my word is not authoritative enough, you can check TCP/IP Illustrated, Vol. 1.

For UDP, the checksum is enough for verification.

Posted by: Yunfeng | January 17, 2006 pm

Let's continue.

To Yunfeng:

The UDP header does contain UDP length information. What I meant is that application-layer protocols designed on top of UDP do not need length information, because UDP is a "datagram" protocol (note the word "datagram"). Each UDP send either delivers nothing or delivers one complete packet. It is not like TCP. TCP is a stream protocol: one transfer may carry a single byte, a whole packet, or even several packets combined, but UDP never does that; it delivers exactly one complete packet. Haha. Since a complete packet is delivered, recvfrom itself tells me the length of the current UDP packet, so a UDP-based application-layer protocol does not need length information. I mean at the application layer, not in UDP itself. The length field in the UDP header is needed; it relates to transmission control of UDP packets, for example it may be used when discarding packets or checking the checksum.

Posted by: wonna | January 17, 2006 pm

To wonna: I have read the classic TCP/IP books. You have misunderstood me. I mean that when communicating over UDP, I do not need to add my own length information. I understand that UDP keeps redundant length information for some convenience, for example so that an IP header can be prepended to a packet being sent via page switching. In addition, for alignment reasons, data is easier to process when it is 4-byte aligned.

In Dahua Xiyou, the packet-processing code does not become more complex because the type comes before the length. The IP protocol itself does something similar: its first byte holds a 4-bit version and a 4-bit header length. From this byte you can determine the length of the IP header, and the length of the whole IP packet only comes after that.
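The IPv4 layout mentioned here can be decoded in a few lines of C: the first byte holds the version in its high 4 bits and the header length, counted in 32-bit words, in its low 4 bits; the 16-bit total length sits in bytes 2 and 3 in network byte order. The function names are made up for illustration.

```c
#include <stdint.h>

/* Version: high 4 bits of the first header byte. */
unsigned ip_version(const uint8_t *hdr) { return hdr[0] >> 4; }

/* Header length: low 4 bits (IHL) times 4 bytes. */
unsigned ip_header_len(const uint8_t *hdr) { return (hdr[0] & 0x0Fu) * 4u; }

/* Total packet length: bytes 2-3, big-endian. */
unsigned ip_total_len(const uint8_t *hdr) { return ((unsigned)hdr[2] << 8) | hdr[3]; }
```

So a type-like field (the version) comes first, and the lengths follow it.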

To analyst: I only used a format file to illustrate the problem; it is the same thing as metadata. I am already using scripts, and nobody would be silly enough to invent yet another format description language. With Lua, I simply use the key/value pairs of a Lua table for the simplest kind of parsing.

Similarly, I do not think that replacing the packet length with metadata increases design complexity. It is just a matter of mapping each type to a structure agreed on by client and server. The metadata tells you how to parse the data that follows, and the data is handed to the application layer once a structure has been parsed.

Posted by: Yunfeng | January 17, 2006 pm

My basic view is similar to analyst's: the network layer handles the network, and the actual parsing of packets is done by the application layer. Normally the network layer extracts complete packets based on the packet length field and passes each complete packet up to the packet-processing logic above it.

Of course, you can also extract complete packets without a length field, but personally I think it is far less convenient than having one.

Posted by: wonna | January 17, 2006 pm

In fact, we always put the length before the type: the length belongs to the network layer, the type belongs to the protocol-parsing layer, and the network layer does not need to know the type. As for whether the length affects implementation complexity: asynchronous handling that used to be done once in the network layer now has to be done n times in the parsing layer. Moreover, there is no good reason to remove the length field, and removing it only adds complexity for nothing. Parsing the protocol with a format description file is still fairly primitive. A better solution is to parse automatically using the program's own metadata; that way the type definitions in the code are the protocol, and no separate format files need to be written.

Posted by: analyst | January 17, 2006

Quoting Yunfeng: "If you simply split the data into packets, adding the length can simplify the design. But in that case there is no need to put the length after the type; the length should go at the front of every packet. This is also what I dislike most about the Dahua Xiyou packet design."

I agree with the view above. In application-layer protocol design, the length is often the first field. But what problem does it cause if it isn't? At most it feels uncomfortable, and I think that is only a matter of personal experience, not of whether it is reasonable. In the IP header, the first field is not the length but the IP version number, followed by the length. So I think the dispute here is mostly about personal experience. Either choice is acceptable and reasonable, so I generally won't keep arguing about this kind of thing.

Posted by: wonna | January 17, 2006

Let me continue the discussion with Yunfeng. Haha. On "the UDP protocol naturally does not need to carry its own length information, since the IP header has it": please check Volume 1 of TCP/IP Illustrated, which diagrams the IP, UDP, and TCP headers and clearly shows that bits 32 to 47, 16 bits in total, are the UDP length.
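The field referred to here occupies bytes 4 and 5 of the 8-byte UDP header (RFC 768) and counts the header plus the payload, in network byte order. A minimal C sketch of reading it and checking it against the size of the received datagram; udp_payload_len is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR 8   /* source port, destination port, length, checksum: 16 bits each */

/* Returns the payload length given by the header's length field, or -1 if the
 * header is truncated or the field is inconsistent with the datagram size. */
long udp_payload_len(const uint8_t *dgram, size_t size) {
    if (size < UDP_HDR) return -1;
    size_t len = ((size_t)dgram[4] << 8) | dgram[5];   /* bits 32-47 */
    if (len < UDP_HDR || len > size) return -1;
    return (long)(len - UDP_HDR);
}
```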

As for your point that skipping the length may be more convenient with scripts: I have not worked with scripts, so I may not be in a position to judge. But as far as protocol design itself is concerned, the application layer does need the packet length. I mean for TCP, because it is a "stream" protocol.

Posted by: wonna | January 17, 2006

The versioning of data structures has nothing to do with whether there is a uniform length field. For compatibility between versions, you can exchange structure descriptions during the handshake, or add version tolerance to structures that change frequently.

Posted by: Yunfeng | January 17, 2006 pm

What should I do if the data structure version changes?

Posted by: anonymous | January 17, 2006

If you simply split the data into packets, adding the length can simplify the design. But in that case there is no need to put the length after the type; the length should go at the front of every packet. This is also what I dislike most about the Dahua Xiyou packet design.

Now that we process network packets with Lua, the packet length follows naturally once the type is known. The so-called parsing is just splitting data into packets and further converting the data stream, element by element, into the corresponding data types. That step is also suitable for the bottom layer.

For example, suppose a format description is provided for the type login:

package login { string username; string password; };

When a network packet arrives, we only need to parse it according to the format description.

In the script-layer dispatch function do_login(pack), pack.username and pack.password give the data to process. This function is still called only after the complete data has been received.
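A C sketch of decoding the login description above. In the post the result becomes a Lua table with pack.username and pack.password; here a C struct stands in for that table. The string encoding (2-byte big-endian length followed by the bytes), the 64-byte limit, and all names are assumptions. The limit check also illustrates the kind of correctness check the comment goes on to mention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding of: package login { string username; string password; };
 * each string is a 2-byte big-endian length followed by that many bytes. */
#define MAX_STR 64

struct login { char username[MAX_STR]; char password[MAX_STR]; };

/* Reads one string at *off into dst. Returns 0 on success, 1 if more data is
 * needed, or -1 if the string does not fit in dst. */
static int read_string(const uint8_t *buf, size_t avail, size_t *off, char *dst) {
    if (avail < *off + 2) return 1;
    size_t len = ((size_t)buf[*off] << 8) | buf[*off + 1];
    if (len >= MAX_STR) return -1;                  /* reject oversized strings */
    if (avail < *off + 2 + len) return 1;
    memcpy(dst, buf + *off + 2, len);
    dst[len] = '\0';
    *off += 2 + len;
    return 0;
}

/* Decodes a login body. Returns the bytes consumed, 0 if incomplete,
 * or -1 if the data is invalid. */
long decode_login(const uint8_t *buf, size_t avail, struct login *out) {
    size_t off = 0;
    int rc = read_string(buf, avail, &off, out->username);
    if (rc == 0) rc = read_string(buf, avail, &off, out->password);
    if (rc != 0) return rc < 0 ? -1 : 0;
    return (long)off;
}
```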

Packet correctness can also be verified at this stage.

Posted by: Yunfeng | January 17, 2006

Doing without the length is possible, but there is one problem. Originally, receiving packets and parsing them were two separate levels: the underlying network module receives a complete packet and then hands it to the upper layer for parsing. Now you have reduced the network module to an asynchronous stream interface, so the parsing layer has to face a very unfriendly interface, asynchronous I/O, which increases its implementation burden. If users write the parsing layer by hand, this design is very bad. In terms of performance, two bytes are completely negligible. If you really care about those two bytes, protocol compression would gain far more than saving a length field.

Posted by: analyst | January 17, 2006

It is reasonable to add the length. That way each packet is self-describing, and a program running on the root node does not need to know the type in order to handle it. The extra bytes also help check the correctness of a received packet and speed up packet parsing.

Posted by: anonymous | January 17, 2006

I have wrapped sockets before. Since TCP is a stream protocol, logically it is better to treat a message as a serialized object rather than as a packet; the header information is then the object's information.

Posted by: atry | January 17, 2006 pm

The UDP protocol naturally does not need to carry its own length information, since the IP header already has it.

When C is used to process a TCP data stream, including the length is reasonable in most cases and keeps the program relatively simple. With scripts or template descriptions, however, it is not necessary. For example, when using Lua, the application layer certainly wants packets delivered to it as Lua tables, with each data element in the network packet converted to the corresponding type, rather than being given an API for reading bytes out of a buffer. In that case, the length information is not needed during parsing.

As for the slight increase in coding difficulty, that is a matter of how the person writing the code feels. For example, when searching an ordered array, some people think binary search is harder to write than sequential search; if the array is small they see no great advantage in binary search and replace it with a for loop. In fact, how complicated can a while loop with an if / else if really be?
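The binary search in this analogy is indeed just a while loop with an if / else if chain. A minimal C version over an ascending int array; bsearch_index is a hypothetical name chosen to avoid clashing with the standard library's bsearch.

```c
#include <stddef.h>

/* Returns the index of key in the ascending array a[0..n), or -1 if absent. */
long bsearch_index(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;                 /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (a[mid] < key) lo = mid + 1;
        else if (a[mid] > key) hi = mid;
        else return (long)mid;
    }
    return -1;
}
```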

Posted by: Yunfeng | January 17, 2006 pm

"I wish I could do everything myself."

As a project manager, this attitude is very undesirable. You should try to train your team members to complete the work independently.

"So I no longer put the packet length in the packet header."

The headers of IP, TCP, and UDP all contain length information, mainly to achieve message layering. Leaving out the length means the bottom layer of your network code is bound to your upper-layer logic (personally, I consider the "message type" layer to be part of the application layer rather than the network layer).

Posted by: wonna | January 16, 2006 pm

Is it worth making the code harder to write just to save 2 bytes?

Posted by: kxjiron | January 16, 2006 pm

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
