Send Chinese characters using Unicode encoding in SMS

Source: Internet
Author: Chen Xiaofei
Key words: SMS, PDU, Unicode, gb2312, Linux, encoding conversion

SMS is a specification developed by ETSI (GSM 03.40 and GSM 03.38). There are two ways to send and receive SMS messages: text mode and PDU (protocol data unit) mode. In text mode, only ordinary ASCII characters can be sent. To send pictures, ringtones, or characters in other encodings (such as Chinese), PDU mode must be used.
In PDU mode, the content can be encoded in one of three ways: 7-bit, 8-bit, or 16-bit. 7-bit encoding is used for ordinary ASCII characters, 8-bit encoding is usually used for data messages such as pictures and ringtones, and 16-bit encoding is used for Unicode characters. The maximum number of characters per message in these three encodings is 160, 140, and 70 respectively.
To send a message in Chinese (or Japanese), you must use Unicode encoding in PDU mode.
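The three limits follow from the 140 octets of user data that a single message can carry. A small sketch of that arithmetic (the name sms_max_chars is made up for illustration and is not part of any library):

```c
#include <stddef.h>

/* Size of the user-data field of a single SMS, in octets. */
#define SMS_UD_OCTETS 140

/* Returns how many characters fit in one message for a given character
 * width in bits (7, 8 or 16), or 0 for an unsupported width.
 * sms_max_chars is an illustrative name only. */
size_t sms_max_chars(unsigned bits_per_char)
{
    if (bits_per_char != 7 && bits_per_char != 8 && bits_per_char != 16)
        return 0;
    return (size_t)SMS_UD_OCTETS * 8 / bits_per_char;
}
```

With 16-bit encoding every character costs two octets, which is why a Chinese message is limited to 70 characters.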
I recently worked on a program that sends and receives text messages on Linux, and it had to handle Chinese characters. I had no experience with Chinese encodings or Unicode, so I read some material and asked questions on several forums. I am writing this up to help others who take on similar projects. The description here is fairly brief; for the PDU specification itself, see http://www.ascend-tech.com.cn/sustain/SMS_PDU-mode.pdf or look on the Wavecom site.

1. Conversion from gb2312 encoding to unicode encoding

On RedHat 7.3, Chinese text is stored in gb2312 encoding by default (this applies to mixed Chinese and English text as well). The gb2312-encoded string must therefore first be converted to a Unicode-encoded string. gb2312 is a multi-byte encoding: a Chinese character takes two bytes, and an English character takes one byte, which is its ASCII code. (Note: I have not read the gb2312 specification carefully. This understanding comes from actual development and is not guaranteed to be complete.) The Unicode encoding used here is a double-byte encoding: every character takes two bytes. On Linux there are at least three ways to convert gb2312 to Unicode:
1) Use the mbstowcs() function, which converts a multi-byte string to wide characters. In my tests it converted correctly, but this function may not be very reliable.
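
A minimal sketch of method 1, under the assumption that a locale matching the input encoding is installed. mbstowcs() interprets its input according to the current LC_CTYPE locale, which is one likely reason it can appear unreliable. The helper name mb_to_wide is hypothetical:

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts the multi-byte string src to wide characters using the named
 * locale. Returns the number of wide characters written (excluding the
 * terminator), or (size_t)-1 if the locale is unavailable or src is not
 * valid in that locale. mb_to_wide is an illustrative helper name. */
size_t mb_to_wide(const char *locale_name, const char *src,
                  wchar_t *dst, size_t dst_len)
{
    /* mbstowcs() decodes src according to LC_CTYPE, so the locale must
     * match the encoding of the input. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;
    size_t n = mbstowcs(dst, src, dst_len);
    setlocale(LC_CTYPE, "C");   /* simplification: go back to the "C" locale */
    return n;
}
```

For gb2312 input the locale name would be something like "zh_CN.GB2312"; the exact spelling depends on the system, so treat that name as an assumption. Note also that wchar_t is not necessarily two bytes wide, so the result may still need narrowing to 16 bits.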

2) Use a gb2312-to-Unicode conversion table and convert by hand. Such tables can be found on the Internet. Each gb2312 character has to be converted according to whether it is a Chinese character or an English character.
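
A sketch of method 2 with a deliberately tiny table. A real table has several thousand entries; the two shown here, and all function names, are only for illustration. Bytes below 0x80 are treated as ASCII and anything else as the first byte of a two-byte Chinese character:

```c
#include <stddef.h>

/* One entry of a GB2312 -> Unicode table (illustrative, two entries only). */
struct gb_entry { unsigned short gb; unsigned short ucs; };

static const struct gb_entry gb_table[] = {
    { 0xCEC4, 0x6587 },  /* 文 */
    { 0xD6D0, 0x4E2D },  /* 中 */
};

/* Looks up one two-byte GB2312 code; returns 0 if it is not in the table. */
static unsigned short gb_lookup(unsigned short gb)
{
    for (size_t i = 0; i < sizeof gb_table / sizeof gb_table[0]; i++)
        if (gb_table[i].gb == gb)
            return gb_table[i].ucs;
    return 0;
}

/* Converts len bytes of GB2312 text into 16-bit Unicode values.
 * Returns the number of values written, or (size_t)-1 on error. */
size_t gb2312_to_ucs2(const unsigned char *in, size_t len,
                      unsigned short *out, size_t out_len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; n++) {
        if (n >= out_len)
            return (size_t)-1;          /* output buffer too small */
        if (in[i] < 0x80) {             /* single byte: plain ASCII */
            out[n] = in[i++];
        } else {                        /* two bytes: a Chinese character */
            if (i + 1 >= len)
                return (size_t)-1;      /* truncated character */
            unsigned short u =
                gb_lookup((unsigned short)((in[i] << 8) | in[i + 1]));
            if (u == 0)
                return (size_t)-1;      /* not in this small table */
            out[n] = u;
            i += 2;
        }
    }
    return n;
}
```

A linear search is enough for a sketch; a full table would normally be sorted and searched with bsearch() or indexed directly.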

3) Use the iconv() function. This is probably the standard approach on Linux. It can convert not only gb2312 to Unicode but between any two encodings, provided the system supports them.
First, call iconv_open() to open a conversion handle, specifying the source and target encodings.
Then call iconv() to perform the conversion. Finally, call iconv_close() to close the handle and release its resources.

#include <iconv.h>

#define buflen 200
char inbuf[buflen];
char outbuf[buflen];
char *pin = inbuf;
char *pout = outbuf;

... Open the file and read the gb2312 data into inbuf. The data length is len.

size_t inleft = len;       /* bytes of input still to convert */
size_t outleft = buflen;   /* bytes of free space in outbuf */

iconv_t cd;
/* iconv_open() takes the target encoding first, then the source encoding */
if ((cd = iconv_open("UNICODE", "GB2312")) == (iconv_t)-1)
    return -1;
if (iconv(cd, &pin, &inleft, &pout, &outleft) == (size_t)-1) {
    iconv_close(cd);
    return -1;
}
iconv_close(cd);

When calling iconv(), pay attention to the parameters: inleft is the length of the data in the input buffer, and outleft is the size of the output buffer. (Make sure the output buffer is large enough.)
After the conversion, outleft is the amount of free space left in outbuf, so buflen-outleft is the actual length of the Unicode data.
Note: both gb2312 and Unicode data are just byte sequences in memory, so they can be stored in an array of char (or unsigned char). Consequently, buflen-outleft is a count of bytes (char), not a count of Unicode characters.
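
The three iconv calls and the length calculation described above can be wrapped in one function. This is a sketch only; convert_encoding is a hypothetical name, and which encoding names are accepted depends on the iconv implementation of the system:

```c
#include <iconv.h>
#include <stddef.h>

/* Converts in_len bytes from encoding `from` to encoding `to`.
 * Returns the number of bytes written to out, or (size_t)-1 on failure.
 * convert_encoding is an illustrative wrapper, not a library function. */
size_t convert_encoding(const char *to, const char *from,
                        const char *in, size_t in_len,
                        char *out, size_t out_size)
{
    iconv_t cd = iconv_open(to, from);   /* target encoding comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *pin = (char *)in;              /* iconv() takes a non-const pointer */
    char *pout = out;
    size_t inleft = in_len;
    size_t outleft = out_size;

    size_t rc = iconv(cd, &pin, &inleft, &pout, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    return out_size - outleft;           /* bytes actually produced */
}
```

For example, convert_encoding("UCS-2BE", "GB2312", inbuf, len, outbuf, sizeof outbuf) should return twice the number of characters in the input, assuming the local iconv knows both names.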

2. Unicode to 16-bit encoding conversion

Once the Unicode encoding has been obtained, it must be converted to the 16-bit encoding of the PDU before it can be sent correctly. Two points need attention in this conversion:
1) The 0xfeff mark at the beginning of the Unicode data should be removed. The content after 0xfeff is the actual Unicode text. (This mark is the Unicode byte order mark, which the converter writes so that a reader can tell the byte order of the data that follows.)

2) Unicode characters here are two bytes each, and our system is little-endian, so the low byte is stored first and the high byte second. For example, the Unicode code of "中" is 0x4e2d, and it is stored as 2d 4e. When converting to the 16-bit encoding, take this byte order into account. If your system is big-endian, no swapping is needed.
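
Both points can be handled in one pass over the buffer. The sketch below removes a leading 0xfeff mark if there is one and rewrites each 16-bit unit high byte first. When no mark is present it assumes the data is in host byte order; that fallback and the function name are assumptions of this example:

```c
#include <stddef.h>

/* Removes a leading byte order mark, if present, and rewrites the 16-bit
 * units in buf into big-endian order in place. Returns the new length in
 * bytes. Without a mark, the data is assumed to be in host byte order.
 * ucs2_to_big_endian is an illustrative name. */
size_t ucs2_to_big_endian(unsigned char *buf, size_t len)
{
    /* Detect host byte order: on a little-endian host the low byte of
     * the value 1 is stored first. */
    const unsigned short one = 1;
    int little = *(const unsigned char *)&one == 1;
    size_t start = 0;

    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        little = 1; start = 2;          /* 0xfeff stored low byte first */
    } else if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        little = 0; start = 2;          /* 0xfeff stored high byte first */
    }

    size_t n = 0;
    for (size_t i = start; i + 1 < len; i += 2, n += 2) {
        unsigned char hi = little ? buf[i + 1] : buf[i];
        unsigned char lo = little ? buf[i] : buf[i + 1];
        buf[n] = hi;
        buf[n + 1] = lo;
    }
    return n;
}
```

Given the four bytes ff fe 2d 4e, the function leaves 4e 2d at the start of the buffer and returns 2.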

OK. I will not go into more detail about how to convert the Unicode code 0x4e2d into the 16-bit encoded text "4e2d".
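
For completeness, one possible way to do that last step: once the bytes are in high-byte-first order, each byte is written as two hexadecimal digits. The function name bytes_to_hex is made up for this sketch, and upper-case digits are a choice of the example:

```c
#include <stddef.h>

/* Writes len bytes as upper-case hexadecimal text into out and
 * terminates it. out must hold at least 2 * len + 1 chars.
 * Returns the number of hex digits written, or 0 if out is too small. */
size_t bytes_to_hex(const unsigned char *in, size_t len,
                    char *out, size_t out_size)
{
    static const char digits[] = "0123456789ABCDEF";
    if (out_size < 2 * len + 1)
        return 0;
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0F];  /* low nibble */
    }
    out[2 * len] = '\0';
    return 2 * len;
}
```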

3. Correctly calculate the length of the 16-bit encoded message body
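
The original gives no details for this step. As a hedged sketch: in GSM 03.40 the user-data-length field counts octets when the 16-bit encoding is used, so it is twice the number of characters, or half the number of hex digits in the encoded body. The function name ucs2_udl is an assumption of this example:

```c
#include <stddef.h>

/* For 16-bit user data the length field counts octets, so it is half
 * the number of hex digits in the encoded body. Returns -1 if the body
 * cannot be valid. ucs2_udl is an illustrative name. */
int ucs2_udl(size_t hex_digits)
{
    if (hex_digits % 4 != 0)        /* each character is 4 hex digits */
        return -1;
    if (hex_digits / 2 > 140)       /* more than one message can carry */
        return -1;
    return (int)(hex_digits / 2);
}
```

For the single character "4E2D" the length is 2, which appears in the PDU as the two hex digits 02.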

4. Set first-octet, TP-MR, TP-PID, TP-DCS, and TP-VP correctly

In the PDU format, whether first-octet, TP-MR, TP-PID, TP-DCS, and TP-VP are set correctly is critical to whether Unicode text can be sent. According to the protocol specification and my debugging results, the correct settings for these fields are as follows:
First-octet: 11
TP-MR: 00
TP-PID: 00
TP-DCS: 08 (encoding method, 16-bit)
TP-VP: A7
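
To show where these values sit in a complete message, here is a sketch that assembles an SMS-SUBMIT PDU as hex text. Several things in it are assumptions that do not come from the settings above: the leading "00" (use the service centre number stored on the SIM), the address type 91 (international number), the function name build_ucs2_pdu, and the phone number in the usage note, which is made up:

```c
#include <stdio.h>
#include <string.h>

/* Builds an SMS-SUBMIT PDU as hex text from an international number
 * (digits only, no '+') and an already hex-encoded 16-bit body.
 * Returns the PDU length in hex digits, or -1 on error.
 * build_ucs2_pdu is an illustrative name; the leading "00" and the
 * address type 91 are assumptions of this sketch. */
int build_ucs2_pdu(const char *number, const char *body_hex,
                   char *out, size_t out_size)
{
    size_t digits = strlen(number);
    size_t body = strlen(body_hex);
    char addr[24];
    size_t i, n = 0;

    if (digits == 0 || digits > 20 || body % 4 != 0 || body / 2 > 140)
        return -1;
    for (i = 0; i < digits; i++)
        if (number[i] < '0' || number[i] > '9')
            return -1;

    /* Semi-octet encoding: swap each pair of digits, pad with F. */
    for (i = 0; i < digits; i += 2) {
        addr[n++] = (i + 1 < digits) ? number[i + 1] : 'F';
        addr[n++] = number[i];
    }
    addr[n] = '\0';

    /* SMSC, first-octet, TP-MR, address length, address type, address,
     * TP-PID, TP-DCS, TP-VP, user data length, user data. */
    int len = snprintf(out, out_size,
                       "00" "11" "00" "%02X" "91" "%s"
                       "00" "08" "A7" "%02X" "%s",
                       (unsigned)digits, addr,
                       (unsigned)(body / 2), body_hex);
    return (len < 0 || (size_t)len >= out_size) ? -1 : len;
}
```

Calling build_ucs2_pdu("46708251358", "4E2D", pdu, sizeof pdu) fills pdu with 0011000B916407281553F80008A7024E2D, that is, a message containing the single character "中".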

After the preceding steps, you can send Chinese characters.
I hope this document will help anyone preparing for text message development in Linux.

References:
★An Introduction to the SMS in PDU mode GSM recommendation Phase 2
