Environment:
The client and server exchange messages over the network as fixed-layout structs.
The structs are defined as follows:
Client (C#, UTF-16 encoding):
```csharp
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct MsgHead
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    public char[] cSenderName;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    public char[] cRecverName;
};
```
Server (Linux, UTF-8 encoding):
```cpp
typedef struct _MsgHead
{
    char cSenderName[16];
    char cRecverName[16];
} MsgHead, *pMsgHead;
```
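The two definitions do not have the same size. With `CharSet.Unicode`, each `ByValArray` of 16 `char` is marshalled as 16 two-byte UTF-16 code units, which is 32 bytes per field and 64 bytes in total. The server struct has only 16 bytes per field. The sketch below makes those offsets explicit. `WireMsgHead` is a hypothetical server-side mirror of the client layout and is not part of the original code. It assumes `char16_t` is 2 bytes, which holds on common Linux compilers.

```cpp
#include <cstddef>  // offsetof

// Hypothetical mirror of the C# MsgHead as it arrives on the wire:
// 16 UTF-16 code units per field, 2 bytes each.
struct WireMsgHead {
    char16_t cSenderName[16];
    char16_t cRecverName[16];
};

// The server's own struct: 16 bytes of UTF-8 per field.
typedef struct _MsgHead {
    char cSenderName[16];
    char cRecverName[16];
} MsgHead, *pMsgHead;

static_assert(sizeof(char16_t) == 2, "assumes 2-byte UTF-16 code units");
static_assert(sizeof(WireMsgHead) == 64, "the client sends 64 bytes");
static_assert(offsetof(WireMsgHead, cRecverName) == 32,
              "on the wire, cRecverName starts at byte 32");
static_assert(sizeof(MsgHead) == 32, "the server struct is 32 bytes");
static_assert(offsetof(MsgHead, cRecverName) == 16,
              "in the server struct, cRecverName starts at byte 16");
```

On the wire, the receiver name therefore always begins at byte 32, whatever the sender name contains. After conversion to UTF-8, its position depends on the content.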
The server uses the iconv() family of functions to convert from UTF-16 to UTF-8.
The code is as follows:
```cpp
#include <cstring>  // memset, memcpy

pMsgHead ConvertToMsgHead(char* recvbuf, int nrecv)
{
    CCodeConverter cv = CCodeConverter("UTF-16", "UTF-8");
    // Each 2-byte UTF-16 code unit needs at most 3 bytes of UTF-8
    // (a 4-byte surrogate pair needs 4), so twice the input size is enough
    int nTransBufSize = 2 * nrecv;
    // Intermediate buffer for the converted text
    char* transbuf = new char[nTransBufSize];
    memset(transbuf, 0, nTransBufSize);

    int nRet = cv.Convert(recvbuf, nrecv, transbuf, nTransBufSize);
    if (nRet < 0)
    {
        cv.GetErrInfo();
        delete[] transbuf;
        return NULL;
    }

    pMsgHead msgHead = new MsgHead;  // the caller must delete it
    memcpy(msgHead->cSenderName, transbuf, 16);
    memcpy(msgHead->cRecverName, transbuf + 16, 16);

    // Release the intermediate conversion buffer
    delete[] transbuf;
    return msgHead;
}
```
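The `CCodeConverter` class used above is not shown in the article. The sketch below assumes it is a thin wrapper over `iconv_open`/`iconv`/`iconv_close`. The function name `ConvertEncoding` and its return convention (bytes written, or -1 on error) are hypothetical. They were chosen to match how `Convert` is called above.

```cpp
#include <iconv.h>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CCodeConverter::Convert. Converts `inlen` bytes
// from encoding `from` to encoding `to`. Returns the number of bytes written
// to `out`, or -1 if the converter cannot be opened or conversion fails.
int ConvertEncoding(const char* from, const char* to,
                    const char* in, std::size_t inlen,
                    char* out, std::size_t outlen) {
    assert(in != nullptr && out != nullptr);
    iconv_t cd = iconv_open(to, from);  // note: target encoding comes first
    if (cd == (iconv_t)-1) return -1;

    char* inptr = const_cast<char*>(in);  // POSIX iconv() takes char**
    char* outptr = out;
    std::size_t inleft = inlen;
    std::size_t outleft = outlen;

    // iconv() advances both pointers and decrements both counters;
    // on failure it returns (size_t)-1 and sets errno (EILSEQ, EINVAL, E2BIG).
    std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == (std::size_t)-1) return -1;
    return static_cast<int>(outlen - outleft);
}
```

For a stateless pair such as UTF-16LE to UTF-8, no final flush call is needed. Stateful encodings would also need `iconv(cd, NULL, NULL, &outptr, &outleft)` before closing.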
Problem:
Converting the whole received structure in one call scrambles the field layout on the receiving side.
The debug output below shows the buffer contents by byte offset (Xu and Wei are the two Chinese characters of the sender name):
UTF-16 buffer (as received):

| Byte offset | 0-1 | 2-3 | 4-31 | 32-33 | 34-35 |
|---|---|---|---|---|---|
| Content | Xu | Wei | 0 (14 null characters) | S | E |

UTF-8 buffer (after converting the whole struct):

| Byte offset | 0-2 | 3-5 | 6-19 | 20 |
|---|---|---|---|---|
| Content | Xu | Wei | 0 (14 null bytes) | S |
Note:
A Chinese character occupies 2 bytes in UTF-16 but 3 bytes in UTF-8.
Problem Analysis:
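In cSenderName, the first 4 bytes (two Chinese characters) become 6 bytes of UTF-8, and the remaining 28 bytes (14 null characters) become 14 bytes. The 32-byte UTF-16 field therefore shrinks to 20 bytes, so cRecverName starts at byte 20 of the converted buffer instead of byte 16. Copying fixed 16-byte slices then puts the wrong bytes into each field.
Because the UTF-8 length depends on the content, the server cannot locate the fields after converting the whole buffer at once.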
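The byte counts in this analysis can be reproduced with a small helper that computes how many UTF-8 bytes a run of UTF-16 code units needs. The function name `Utf8Length` is hypothetical. The sample characters U+4E2D and U+6587 used in the check only stand in for Xu and Wei, since any CJK character in the Basic Multilingual Plane needs 3 UTF-8 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of UTF-8 bytes produced by `units` UTF-16 code
// units. Assumes well-formed UTF-16 (a high surrogate is followed by a low one).
std::size_t Utf8Length(const char16_t* s, std::size_t units) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < units; ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;  // ASCII, including the zero padding
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            assert(i + 1 < units && "high surrogate without its pair");
            bytes += 4;  // surrogate pair: 4 UTF-16 bytes -> 4 UTF-8 bytes
            ++i;
        } else {
            bytes += 3;  // rest of the BMP, including CJK characters
        }
    }
    return bytes;
}
```

For a sender field holding two CJK characters followed by 14 zero code units, `Utf8Length(field, 2)` is 6 and `Utf8Length(field + 2, 14)` is 14. The total is 20 bytes, which is why cRecverName appears at byte 20 rather than byte 16.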
Solution:
Convert the encoding field by field.
The code is as follows:
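```cpp
#include <cstring>  // memset, memcpy

pMsgHead ConvertToMsgHead(char* recvbuf, int nrecv)
{
    // The client's MsgHead is two 32-byte UTF-16 fields
    if (nrecv < 64)
        return NULL;

    CCodeConverter cv = CCodeConverter("UTF-16", "UTF-8");
    // Each 2-byte UTF-16 code unit needs at most 3 bytes of UTF-8,
    // so twice the input size is enough for the intermediate buffer
    int nTransBufSize = 2 * nrecv;
    char* transbuf = new char[nTransBufSize];
    pMsgHead msgHead = new MsgHead;  // the caller must delete it

    // Convert field cSenderName: UTF-16 bytes 0-31
    memset(transbuf, 0, nTransBufSize);
    int nRet = cv.Convert(recvbuf, 32, transbuf, nTransBufSize);
    memcpy(msgHead->cSenderName, transbuf, 16);

    // Convert field cRecverName: UTF-16 bytes 32-63;
    // the result again starts at transbuf[0]
    if (nRet >= 0)
    {
        memset(transbuf, 0, nTransBufSize);
        nRet = cv.Convert(recvbuf + 32, 32, transbuf, nTransBufSize);
        memcpy(msgHead->cRecverName, transbuf, 16);
    }

    // Release the intermediate conversion buffer
    delete[] transbuf;
    if (nRet < 0)
    {
        cv.GetErrInfo();
        delete msgHead;
        return NULL;
    }
    return msgHead;
}
```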
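The version above still copies a fixed 16 bytes per field. With six or more Chinese characters (18+ bytes of UTF-8), that copy cuts a character in half. The sketch below applies the same field-by-field idea with iconv directly and truncates at a UTF-8 character boundary. It rests on several assumptions:

- The client runs on little-endian x86/x64 and sends UTF-16LE without a byte-order mark.
- Each server field should be NUL-terminated, so at most 15 bytes of text are kept.
- `DecodeMsgHead` and `ConvertField` are hypothetical names.

```cpp
#include <iconv.h>
#include <cassert>
#include <cstddef>
#include <cstring>

typedef struct _MsgHead {
    char cSenderName[16];
    char cRecverName[16];
} MsgHead, *pMsgHead;

const std::size_t kWireFieldBytes = 32;  // 16 UTF-16 code units per C# field

// Converts one 32-byte UTF-16LE field into a NUL-terminated UTF-8 field.
static bool ConvertField(iconv_t cd, const char* src, char* dst, std::size_t dst_size) {
    assert(dst_size > 0);
    char buf[3 * 16 + 1] = {0};  // 16 code units -> at most 48 bytes of UTF-8
    char* in = const_cast<char*>(src);
    char* out = buf;
    std::size_t inleft = kWireFieldBytes, outleft = sizeof(buf) - 1;
    iconv(cd, nullptr, nullptr, nullptr, nullptr);  // reset conversion state
    if (iconv(cd, &in, &inleft, &out, &outleft) == (std::size_t)-1) return false;

    // The zero padding converts to NUL bytes, so strlen stops at the name's end.
    std::size_t len = std::strlen(buf);
    if (len > dst_size - 1) {
        len = dst_size - 1;
        // Step back over continuation bytes (10xxxxxx) so no character is split.
        while (len > 0 && (static_cast<unsigned char>(buf[len]) & 0xC0) == 0x80) --len;
    }
    std::memset(dst, 0, dst_size);
    std::memcpy(dst, buf, len);
    return true;
}

// Field-by-field conversion of the 64-byte wire header.
bool DecodeMsgHead(const char* recvbuf, std::size_t nrecv, MsgHead* head) {
    if (nrecv < 2 * kWireFieldBytes) return false;
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1) return false;
    bool ok = ConvertField(cd, recvbuf, head->cSenderName, sizeof(head->cSenderName)) &&
              ConvertField(cd, recvbuf + kWireFieldBytes, head->cRecverName,
                           sizeof(head->cRecverName));
    iconv_close(cd);
    return ok;
}
```

Returning the struct through a caller-supplied pointer also avoids the `new`/`delete` ownership question of the pointer-returning version.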
Summary:
This problem typically appears when applications that use different character encodings communicate through a fixed-layout message format.
The same problem occurs when sending.
Therefore, the server needs a separate sending struct whose field sizes match the UTF-16 client's layout, so that messages converted from UTF-8 to UTF-16 fit the client's struct.
Conversion during sending:
The matching struct to use:
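```cpp
// Each field holds 16 UTF-16 code units (32 bytes), matching SizeConst = 16 on the client
typedef struct _CommMsgHead
{
    char cSenderName[32];
    char cRecverName[32];
} CommMsgHead, *pCommMsgHead;
```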
The conversion code is similar to the receiving side and likewise converts each field separately.
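A minimal sketch of the sending direction follows. It assumes the client is little-endian x86/x64 and therefore requests "UTF-16LE" explicitly. With plain "UTF-16", an iconv implementation may prepend a byte-order mark, which would shift every character by two bytes in the client's fixed-size array. `BuildCommMsgHead` and `EncodeField` are hypothetical names.

```cpp
#include <iconv.h>
#include <cassert>
#include <cstddef>
#include <cstring>

typedef struct _CommMsgHead {
    char cSenderName[32];
    char cRecverName[32];
} CommMsgHead, *pCommMsgHead;

// Converts a NUL-terminated UTF-8 string into a zero-padded UTF-16LE field.
// Returns false if the text needs more than dst_size bytes or is invalid UTF-8.
static bool EncodeField(iconv_t cd, const char* utf8, char* dst, std::size_t dst_size) {
    assert(utf8 != nullptr && dst != nullptr);
    std::memset(dst, 0, dst_size);  // unused code units stay zero
    char* in = const_cast<char*>(utf8);
    char* out = dst;
    std::size_t inleft = std::strlen(utf8), outleft = dst_size;
    iconv(cd, nullptr, nullptr, nullptr, nullptr);  // reset conversion state
    return iconv(cd, &in, &inleft, &out, &outleft) != (std::size_t)-1;
}

// Field-by-field conversion of the outgoing header, UTF-8 -> UTF-16LE.
bool BuildCommMsgHead(const char* sender, const char* recver, CommMsgHead* head) {
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1) return false;
    bool ok = EncodeField(cd, sender, head->cSenderName, sizeof(head->cSenderName)) &&
              EncodeField(cd, recver, head->cRecverName, sizeof(head->cRecverName));
    iconv_close(cd);
    return ok;
}
```

A name longer than 16 UTF-16 code units makes iconv fail with E2BIG, and `BuildCommMsgHead` then returns false instead of silently truncating. Whether to reject or truncate is a design choice the original text leaves open.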