From: http://blog.csdn.net/yao_guet/article/details/7074871
There are two main kinds of operations on SIM card data: short message (SMS) operations and address book operations. The two use slightly different encodings:
1. Short message operations:
By default, the content of a single short message is limited to 140 bytes.
Pure ASCII text normally uses the 7-bit encoding, which keeps only the low 7 bits of each character, so 160 ASCII characters occupy exactly 140 bytes (160 * 7 = 1120 bits = 140 bytes). This is why a phone lets us put 160 ASCII characters into one text message.
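The packing arithmetic can be sketched in a few lines of Python. This is only an illustration: the name `pack_septets` is made up here, and the sketch assumes each character's ASCII code can be used directly as its 7-bit value, which holds for letters and digits but not for every symbol of the GSM default alphabet.

```python
def pack_septets(text):
    """Pack the low 7 bits of each character into bytes, low bits first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for ch in text:
        acc |= (ord(ch) & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)   # final partial byte, zero padded
    return bytes(out)


# 160 characters * 7 bits = 1120 bits, which fills 140 bytes exactly.
print(len(pack_septets("A" * 160)))
```

Every eighth character fits into the bits left over by the previous seven, which is where the extra 20 characters come from.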
Text that contains Chinese characters uses the ucs2 encoding, that is, the 2-byte Unicode format, in which every character occupies two bytes. As soon as a text message contains a single Chinese character, the whole message must be ucs2 encoded, so it can carry at most 70 characters (140 / 2).
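A minimal sketch of this rule, assuming that "pure ASCII" is a good enough test for the 7-bit encoding (the real GSM alphabet differs from ASCII in a few symbols). The helper names `sms_capacity` and `ucs2_payload` are hypothetical.

```python
def sms_capacity(text):
    """Return (encoding name, max characters in one 140-byte message)."""
    if all(ord(ch) < 0x80 for ch in text):
        return "7-bit", 140 * 8 // 7    # 160 characters
    return "UCS2", 140 // 2             # 70 characters


def ucs2_payload(text):
    """Encode text as big-endian 2-byte units (same as UCS2 for BMP characters)."""
    return text.encode("utf-16-be")


print(sms_capacity("hello"))
print(sms_capacity("hello 中国"))
print(ucs2_payload("中国").hex())
```

One Chinese character is enough to switch the whole message to 2 bytes per character, including its ASCII letters.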
2. Address book operations:
The name length limit in the address book varies from card to card. On my TD module it is 14 bytes.
Pure ASCII names normally use the 8-bit encoding, that is, each character occupies one 8-bit byte. This is the most ordinary storage format.
If the name contains Chinese characters, it is normally stored in the 80 format: the name starts with the byte 80, followed by ucs2 data. In some cases it starts with 81 or 82 instead.
A) Starting with 80:
A string starting with 80 is in the ucs2 format. (Note: this only holds when ucs2 data such as Chinese characters follows; otherwise it may simply be a pure ASCII string that begins with 80.)
Example 1: 中国 (China)
Unicode encoding: 4e2d56fd
In the 80 format: 804e2d56fd
Example 2: 杜10娘
Unicode encoding: 675c003100305a18
In the 80 format: 80675c003100305a18
Clearly, once the name contains a Chinese character, digits also take up two bytes each.
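The 80 format is simple enough to sketch directly. The function names `encode_80` and `decode_80` are made up for illustration, and the sketch works on hex strings and ignores any FF padding that a real SIM record may carry.

```python
def encode_80(name):
    """Prefix the big-endian 2-byte encoding of name with the 80 marker."""
    return "80" + name.encode("utf-16-be").hex()


def decode_80(hex_str):
    """Decode an 80-format hex string back to text."""
    raw = bytes.fromhex(hex_str)
    if raw[:1] != b"\x80":
        raise ValueError("not an 80-format string")
    return raw[1:].decode("utf-16-be")


print(encode_80("中国"))
print(decode_80("80675c003100305a18"))
```

The cost is easy to see: a name of n characters always takes 2n + 1 bytes, which is why the 81 and 82 formats exist.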
B) Starting with 81:
The 81 format contains a one-byte base address. With this base address, a ucs2 character can be represented by a single byte.
In this format, 81 is the identifier, the next byte gives the length of the string in characters, the one byte after that is the base address, and the data bytes follow. For example:
Example 3: 杜杜杜
Unicode encoding: 675c675c675c
In the 80 format: 80675c675c675c
In the 81 format: 8103cedcdcdc
Analysis of the 81 format string 8103cedcdcdc:
81: The format identifier.
03: The string is 3 characters long.
ce: The one-byte base address. To resolve it, shift the byte (CE) left by seven bits. In the resulting 16-bit value the highest bit is 0 and the low seven bits are 0, so the base address becomes 0x6700. Then examine the data bytes that follow.
dcdcdc: Three data bytes: DC, DC, and DC. If the highest bit of a data byte is 0, the byte is an ASCII character. If the highest bit is 1, the low 7 bits are an offset from the base address, and the actual ucs2 character is the base address plus this offset. The highest bit of all three data bytes is 1, so the offsets are 5C, 5C, and 5C, and the actual ucs2 codes are 0x675c 0x675c 0x675c, which matches the Unicode encoding above.
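The decoding steps just described can be sketched as follows. The function name `decode_81` is hypothetical, and the sketch treats data bytes with the highest bit clear as plain ASCII, as the text does.

```python
def decode_81(hex_str):
    """Decode an 81-format hex string: marker, length, 1-byte base, data."""
    raw = bytes.fromhex(hex_str)
    if raw[0] != 0x81:
        raise ValueError("not an 81-format string")
    length = raw[1]
    base = raw[2] << 7              # e.g. 0xCE << 7 == 0x6700
    chars = []
    for b in raw[3:3 + length]:
        if b & 0x80:                # high bit set: offset from the base address
            chars.append(chr(base + (b & 0x7F)))
        else:                       # high bit clear: ASCII character
            chars.append(chr(b))
    return "".join(chars)


print(decode_81("8103cedcdcdc"))    # three times 0x6700 + 0x5C = 0x675C
```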
Example 4: 一丁丂七丄丅 (Note: these characters all belong to the GBK character set)
Unicode encoding: 4e004e014e024e034e044e05
In the 80 format: 804e004e014e024e034e044e05
In the 81 format: 81069c808182838485
Analysis of the 81 format string 81069c808182838485:
81: The format identifier.
06: The string is 6 characters long.
9c: The one-byte base address. To resolve it, shift the byte (9C) left by seven bits. In the resulting 16-bit value the highest bit is 0 and the low seven bits are 0, so the base address becomes 0x4e00. Then examine the data bytes that follow.
808182838485: Six data bytes: 80, 81, 82, 83, 84, and 85. The highest bit of all six data bytes is 1, so the offsets of the six characters are 00, 01, 02, 03, 04, and 05. The actual ucs2 codes are 0x4e00, 0x4e01, 0x4e02, 0x4e03, 0x4e04, 0x4e05, which matches the Unicode encoding above.
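Going the other way, an encoder has to pick a base address and check that every non-ASCII character fits within 128 positions of it. The sketch below is one possible approach, not a reference implementation: `encode_81` is a made-up name, and deriving the base from the lowest non-ASCII code point is an assumption.

```python
def encode_81(name):
    """Encode name in the 81 format, or raise ValueError if it does not fit."""
    high = [ord(ch) for ch in name if ord(ch) >= 0x80]
    if not high:
        raise ValueError("pure ASCII name: no base address needed")
    base = min(high) & 0x7F80       # bits 14..7 only; bit 15 must stay 0
    for cp in high:
        if cp > 0x7FFF or cp > base + 0x7F:
            raise ValueError("character outside the range of one 81 base")
    data = bytearray()
    for ch in name:
        cp = ord(ch)
        data.append(cp if cp < 0x80 else 0x80 | (cp - base))
    return bytes([0x81, len(name), base >> 7]).hex() + data.hex()


print(encode_81("杜杜杜"))
```

Compared with the 80 format, the name shrinks from 2n + 1 bytes to n + 3 bytes, which matters when the record only has 14 bytes.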
C) Starting with 82:
The 82 format contains a two-byte base address. With this base address, a ucs2 character can be represented by a single byte.
In this format, 82 is the identifier, the next byte gives the length of the string in characters, the two bytes after that are the base address, and the data bytes follow. For example:
Example 5:8025ef Fang
Unicode encoding:00380030003200350045004682b3
In the 80 scheme of ucs2, it is:8000380030003200350045004682b3
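In the 81 format: not possible. (The 81 format is limited: its one-byte base address always leaves the highest of the 16 bits as 0, so besides the ASCII characters it can only reach one 128-character block below 0x8000. The character 0x82b3 is out of range, so the 81 format cannot be used here.)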
In the 82 format: 82078280383032354546b3
Analysis of the 82 format string 82078280383032354546b3:
82: The format identifier.
07: The string is 7 characters long.
8280: The two-byte base address.
383032354546b3: Seven data bytes: 38, 30, 32, 35, 45, 46, and B3. If the highest bit of a data byte is 0, the byte is an ASCII character. If the highest bit is 1, the low 7 bits are an offset from the base address, and the actual ucs2 character is the base address plus this offset. The highest bit of the first six data bytes is 0, so they are the six ASCII characters 0x38, 0x30, 0x32, 0x35, 0x45, 0x46, that is 8, 0, 2, 5, E, F. The highest bit of the seventh byte is 1, so its offset is 0x33, which is added to the base address 0x8280 to give the ucs2 code 0x82b3 (芳).
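The same walk-through can be written as a short decoder. As before, `decode_82` is a hypothetical name, and bytes with the highest bit clear are treated as plain ASCII.

```python
def decode_82(hex_str):
    """Decode an 82-format hex string: marker, length, 2-byte base, data."""
    raw = bytes.fromhex(hex_str)
    if raw[0] != 0x82:
        raise ValueError("not an 82-format string")
    length = raw[1]
    base = (raw[2] << 8) | raw[3]   # full 16-bit base address, used as is
    chars = []
    for b in raw[4:4 + length]:
        if b & 0x80:                # high bit set: offset from the base address
            chars.append(chr(base + (b & 0x7F)))
        else:                       # high bit clear: ASCII character
            chars.append(chr(b))
    return "".join(chars)


print(decode_82("82078280383032354546b3"))
```

The only difference from the 81 case is how the base address is obtained: two bytes taken directly instead of one byte shifted left by seven bits.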
Example 6: 杜杜1
Unicode encoding: 675c675c0031
In the 80 format: 80675c675c0031
In the 81 format: 8103cedcdc31
In the 82 format: 82036700dcdc31
Analysis of the 82 format string 82036700dcdc31:
82: The format identifier.
03: The string is 3 characters long.
6700: The two-byte base address.
dcdc31: Three data bytes: DC, DC, and 31. The highest bit of the first two bytes is 1, so their offset is 0x5c, which is added to the base address 0x6700 to give the ucs2 code 0x675c (杜). The highest bit of the third byte is 0, so it is an ASCII character: 0x31, that is 1.
Here we have only briefly analyzed how the three ucs2 formats (80, 81, 82) are decoded. Once the meaning of each field is understood, encoding becomes much easier.