In earlier work on UCS2 encoding for SIM cards, I needed to encode characters in the 81 format. I collected what I could find online and am sharing it here together with the results of my own investigation.
This document describes how to produce the 80, 81, and 82 encoding formats. The code was tested on the Android platform: contact names were stored and decoded successfully. Parts of it are my own work, so mistakes are likely; corrections are welcome.
On the decoding side, Android uses the adnStringFieldToString() method in frameworks/base/telephony/java/com/android/internal/telephony/uicc/IccUtils.java.
What is UCS2 encoding?
UCS2 (2-byte Universal Character Set) is a character encoding in which every character is encoded with 2 bytes.
We use UCS2 encoding when storing contacts on SIM cards.
There are three main formats of UCS2 encoding on a SIM card: 80, 81, and 82.
The following describes the encoding methods of each format:
'80' format:
1. The first byte is 0x80
2. The subsequent bytes are the UCS2 characters, 2 bytes per character. (On the Android platform, encode the string as UTF-16BE.)
Example:
Encoded Characters: 4e1c 65b9 4e0d 8d25
In the 80 format: 80 4e1c 65b9 4e0d 8d25
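As a rough illustration of the 80 format, the sketch below prepends 0x80 to the UTF-16BE bytes of a string. The class and method names (Ucs2Format80Sketch, encode80, toHex) are made up for this example, and it assumes every character fits in a single 16-bit unit.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of the 80 format: a 0x80 tag followed by the UTF-16BE bytes of the text. */
final class Ucs2Format80Sketch {

    // Hypothetical helper; assumes all characters are in the 0000-FFFF range.
    static byte[] encode80(String text) {
        byte[] ucs2 = text.getBytes(StandardCharsets.UTF_16BE); // 2 bytes per char, no BOM
        byte[] out = new byte[ucs2.length + 1];
        out[0] = (byte) 0x80;                                   // format tag
        System.arraycopy(ucs2, 0, out, 1, ucs2.length);
        return out;
    }

    // Formats bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The four characters 4e1c 65b9 4e0d 8d25 from the example above.
        System.out.println(toHex(encode80("\u4e1c\u65b9\u4e0d\u8d25")));
    }
}
```

Running main prints 804e1c65b94e0d8d25, which matches the example above.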
'81' format:
1. The first byte is 0x81
2. The second byte is the string length, counted in characters.
3. The third byte is the base address. It holds bits 14 to 7 of the UCS2 code, counting the lowest bit as bit 0 (the x bits in 0xxx xxxx x000 0000).
4. The fourth and following bytes are the characters in 81 encoding, one byte per character.
Example:
Encoded Characters: 0035 0416 0438 043a
In 81 format: 81 04 08 35 96 b8 ba
The following example shows how to obtain the base address and character encoding:
Take the characters whose first byte is not 00:
0416 = 0000 0100 0001 0110
0438 = 0000 0100 0011 1000
043a = 0000 0100 0011 1010
Bits 14 to 7 are 0000 1000 in all three, so the base address is 0000 1000, that is, 08.
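The bit extraction above can be sketched in a few lines. The names BaseAddress81Sketch, baseAddress, and baseValue are invented for this illustration, and the sketch assumes the code is at most 7FFF, since bit 15 cannot be represented.

```java
/** Sketch: derive the one-byte 81-format base address from a UCS2 code. */
final class BaseAddress81Sketch {

    // Bits 14..7 of the code, packed into one byte (hypothetical helper).
    static int baseAddress(int ucs2Code) {
        if (ucs2Code < 0 || ucs2Code > 0x7FFF) {
            throw new IllegalArgumentException("bit 15 must be 0 for the 81 format");
        }
        return (ucs2Code >> 7) & 0xFF;
    }

    // The 16-bit value the base address stands for: 0xxx xxxx x000 0000.
    static int baseValue(int baseAddress) {
        return (baseAddress & 0xFF) << 7;
    }

    public static void main(String[] args) {
        int[] codes = {0x0416, 0x0438, 0x043a};
        for (int code : codes) {
            int address = baseAddress(code);
            System.out.printf("%04x -> base address %02x (%04x)%n",
                    code, address, baseValue(address));
        }
    }
}
```

All three codes give base address 08, which stands for the 16-bit value 0400.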
Then the base address is used to calculate the 81 encoding of the four characters:
0035 = 0000 0000 0011 0101. The first byte is 0, so its 81 encoding is the low byte 0011 0101, that is, 35.
0416 = 0000 0100 0001 0110. The base address 08 shifted left by 7 bits is 0000 0100 0000 0000 = 0400. Subtracting it leaves 16, that is, 001 0110 in 7 bits; setting the highest bit to 1 gives 1001 0110, that is, 96.
In the same way, 0438 and 043a become b8 and ba.
This shows that the 81 format can only be used when bits 14 to 7 are the same for all the characters whose first byte is not 0. In other words, only the lowest seven bits vary, so these characters must fall in one contiguous block of 2^7 = 128 codes. Chinese characters are therefore rarely suited to the 81 format, because the UCS2 codes of two Chinese characters are likely to differ by more than 128.
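The 81 rules can be put together in a minimal sketch like the following. It is not the author's routine: the names Ucs2Format81Sketch, encode81, and toHex are hypothetical, codes below 80 are stored directly as if they were plain ASCII (as in the worked example), and the text is assumed short enough for its length to fit in one byte.

```java
/** Sketch of an 81-format encoder; returns null when the text does not fit. */
final class Ucs2Format81Sketch {

    static byte[] encode81(String text) {
        byte[] out = new byte[text.length() + 3];
        out[0] = (byte) 0x81;                 // format tag
        out[1] = (byte) text.length();        // number of characters
        int baseAddress = -1;                 // unknown until a code >= 0x80 is seen
        for (int i = 0; i < text.length(); i++) {
            int code = text.charAt(i);
            if (code < 0x80) {
                out[i + 3] = (byte) code;     // highest bit 0: stored directly
                continue;
            }
            if (code > 0x7FFF) {
                return null;                  // bit 15 cannot be expressed
            }
            int candidate = code >> 7;        // bits 14..7
            if (baseAddress == -1) {
                baseAddress = candidate;
            } else if (baseAddress != candidate) {
                return null;                  // not all in the same 128-code block
            }
            out[i + 3] = (byte) (0x80 | (code & 0x7F));
        }
        out[2] = (byte) (baseAddress == -1 ? 0 : baseAddress);
        return out;
    }

    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0035 0416 0438 043a from the example above.
        System.out.println(toHex(encode81("5\u0416\u0438\u043a")));
        // 4e0d and 4e86 sit in different 128-code blocks, so this yields null.
        System.out.println(encode81("\u4e0d\u4e86") == null);
    }
}
```

For the example string this prints 8104083596b8ba, and for the two Chinese characters it prints true because no single base address covers both.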
'82' format:
1. The first byte is 0x82
2. The second byte is the string length, counted in characters.
3. The third and fourth bytes are the base address, a full 16-bit UCS2 code. A suitable choice is the smallest UCS2 code among the characters whose first byte is not 0.
4. The fifth and following bytes are the characters in 82 encoding, one byte per character.
Example:
Encoded Characters: 0061 4e0d 4e4e 4e86
In 82 format: 82 04 4e0d 61 80 c1 f9
The following example shows how to obtain the base address and character encoding:
Select the minimum among the UCS2 codes whose first byte is not 0. Here the first byte of 0061 is 0, so it is left out; the minimum of the remaining three characters, 4e0d, is the base address.
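The selection rule can be sketched as a small helper. BaseAddress82Sketch and selectBase are names invented for this illustration, and returning -1 when no character qualifies is a convention chosen here, not something the format defines.

```java
/** Sketch: pick the 82-format base address from a list of UCS2 codes. */
final class BaseAddress82Sketch {

    // Smallest code whose first (high) byte is not 0, or -1 if there is none.
    static int selectBase(int[] codes) {
        int base = -1;
        for (int code : codes) {
            boolean highByteSet = (code & 0xFF00) != 0;
            if (highByteSet && (base == -1 || code < base)) {
                base = code;
            }
        }
        return base;
    }

    public static void main(String[] args) {
        int[] codes = {0x0061, 0x4e0d, 0x4e4e, 0x4e86};
        System.out.printf("%04x%n", selectBase(codes));
    }
}
```

For the four example characters this prints 4e0d.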
Then the base address is used to calculate the 82 encoding of the four characters:
0061 = 0000 0000 0110 0001. The first byte is 0, so its 82 encoding is the low byte, 61.
4e0d = 0100 1110 0000 1101. Subtracting the base address 4e0d gives 0; setting the highest bit to 1 gives 1000 0000, that is, 80.
4e4e = 0100 1110 0100 1110. Subtracting the base address gives 41; setting the highest bit to 1 gives 1100 0001, that is, c1.
4e86 works the same way: 4e86 minus 4e0d is 79, which becomes f9.
The 82 format has the same limitation as the 81 format: the characters must lie within a contiguous range of 128 codes. The difference is that the range may start at any code instead of at a multiple of 128.
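To make the 82 rules concrete, here is a hedged round-trip sketch. The names Ucs2Format82Sketch, encode82, decode82, and toHex are hypothetical; codes below 80 are stored directly as if they were plain ASCII, and the decoder assumes well-formed input produced by this encoder rather than arbitrary SIM data.

```java
/** Sketch of an 82-format encoder and a matching decoder. */
final class Ucs2Format82Sketch {

    // Returns null when the codes >= 0x80 span more than 128 consecutive values.
    static byte[] encode82(String text) {
        int min = 0xFFFF;
        int max = 0;
        boolean needsBase = false;
        for (char c : text.toCharArray()) {
            if (c >= 0x80) {
                needsBase = true;
                min = Math.min(min, c);
                max = Math.max(max, c);
            }
        }
        if (needsBase && max - min > 0x7F) {
            return null;                       // offsets are only 7 bits wide
        }
        int base = needsBase ? min : 0;
        byte[] out = new byte[text.length() + 4];
        out[0] = (byte) 0x82;                  // format tag
        out[1] = (byte) text.length();         // number of characters
        out[2] = (byte) (base >> 8);           // base address, high byte
        out[3] = (byte) base;                  // base address, low byte
        for (int i = 0; i < text.length(); i++) {
            int code = text.charAt(i);
            out[i + 4] = (byte) (code < 0x80 ? code : 0x80 | (code - base));
        }
        return out;
    }

    static String decode82(byte[] data) {
        int count = data[1] & 0xFF;
        int base = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            int b = data[4 + i] & 0xFF;
            // Highest bit set: 7-bit offset from the base; otherwise a direct code.
            sb.append((char) (b < 0x80 ? b : base + (b & 0x7F)));
        }
        return sb.toString();
    }

    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a\u4e0d\u4e4e\u4e86";   // 0061 4e0d 4e4e 4e86
        byte[] encoded = encode82(text);
        System.out.println(toHex(encoded));
        System.out.println(decode82(encoded).equals(text));
    }
}
```

For the example string the encoder produces 82044e0d6180c1f9, and decoding those bytes gives the original text back.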
Under what circumstances should we select the 80, 81, or 82 format?
1. Prefer the 81 format when it applies, because it fits the most characters into a given number of bytes: a string of N characters takes 3 + N bytes, so a 14-byte field holds 11 characters. However, the characters must all fall within one aligned block of 128 codes.
2. Otherwise, try the 82 format. A string takes 4 + N bytes, so 14 bytes hold 10 characters, and the characters must fall within a range of 128 consecutive codes.
3. Otherwise, use the 80 format. A string takes 1 + 2N bytes, so 14 bytes hold only 6 characters, but the 80 format covers the whole range from 0000 to FFFF. In most cases, therefore, Chinese text can only be stored in the 80 format.
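The capacity figures follow directly from the header sizes, as this small sketch shows. The class and method names are made up for the illustration; fieldBytes stands for the size of the name field on the SIM.

```java
/** Sketch: how many characters fit in a field of a given size, per format. */
final class AlphaFieldCapacitySketch {

    // 80 format: 1 tag byte, then 2 bytes per character.
    static int maxChars80(int fieldBytes) {
        return Math.max(0, (fieldBytes - 1) / 2);
    }

    // 81 format: tag, length, 1-byte base address, then 1 byte per character.
    static int maxChars81(int fieldBytes) {
        return Math.max(0, fieldBytes - 3);
    }

    // 82 format: tag, length, 2-byte base address, then 1 byte per character.
    static int maxChars82(int fieldBytes) {
        return Math.max(0, fieldBytes - 4);
    }

    public static void main(String[] args) {
        int fieldBytes = 14;
        System.out.println("81: " + maxChars81(fieldBytes));
        System.out.println("82: " + maxChars82(fieldBytes));
        System.out.println("80: " + maxChars80(fieldBytes));
    }
}
```

For a 14-byte field this gives 11, 10, and 6 characters respectively, the same numbers as in the list above.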
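The following is an example implementation and test program written in Java:

import java.nio.charset.StandardCharsets;

public class Ucs2Test {

    /**
     * Encodes UTF-16BE bytes as a SIM alpha field, trying the 81 format first,
     * then 82, and falling back to 80.
     */
    static byte[] ucs2ToAlphaField(byte[] src, int srcOff, int srcLen, int destOff) {
        int min = 0x7FFF;
        int max = 0;
        int temp;
        int outOff;
        byte[] dest;

        if (srcLen > 2) {
            // Find the smallest and largest code whose first byte is not 0.
            for (int i = 0; i < srcLen; i += 2) {
                if (src[srcOff + i] != 0) {
                    temp = ((src[srcOff + i] << 8) & 0xFF00) | (src[srcOff + i + 1] & 0xFF);
                    // Bit 15 set: force the 80 format. (temp is never negative
                    // as an int, so compare against 0x7FFF instead of 0.)
                    if (temp > 0x7FFF) {
                        max = min + 130;
                        break;
                    }
                    if (min > temp) min = temp;
                    if (max < temp) max = temp;
                }
            }

            // Offsets are 7 bits wide, so the spread must stay below 128.
            if ((max - min) < 128) {
                if ((min & 0x80) == (max & 0x80)) {
                    // 81 format: same 128-code block, 1-byte base address.
                    dest = new byte[destOff + srcLen / 2 + 3];
                    dest[destOff + 1] = (byte) (srcLen / 2);
                    dest[destOff] = (byte) 0x81;
                    min = min & 0x7F80;
                    dest[destOff + 2] = (byte) ((min >> 7) & 0xFF);
                    outOff = destOff + 3;
                } else {
                    // 82 format: 2-byte base address.
                    dest = new byte[destOff + srcLen / 2 + 4];
                    dest[destOff + 1] = (byte) (srcLen / 2);
                    dest[destOff] = (byte) 0x82;
                    dest[destOff + 2] = (byte) ((min >> 8) & 0xFF);
                    dest[destOff + 3] = (byte) (min & 0xFF);
                    outOff = destOff + 4;
                }

                for (int i = 0; i < srcLen; i += 2) {
                    if (src[srcOff + i] == 0) {
                        // First byte 0: keep the low 7 bits (codes 0080-00FF lose bit 7).
                        dest[outOff] = (byte) (src[srcOff + i + 1] & 0x7F);
                    } else {
                        // Otherwise store the offset from the base with the highest bit set.
                        temp = (((src[srcOff + i] << 8) & 0xFF00) | (src[srcOff + i + 1] & 0xFF)) - min;
                        dest[outOff] = (byte) (temp | 0x80);
                    }
                    outOff++;
                }
                return dest;
            }
        }

        // 80 format: tag byte followed by the unchanged UCS2 bytes.
        dest = new byte[destOff + srcLen + 1];
        dest[destOff] = (byte) 0x80;
        System.arraycopy(src, srcOff, dest, destOff + 1, srcLen);
        return dest;
    }

    public static void main(String[] args) {
        // The characters 0035 0416 0438 043a from the 81 example.
        String src = "5\u0416\u0438\u043a";
        byte[] srcByte = src.getBytes(StandardCharsets.UTF_16BE);
        byte[] dest = ucs2ToAlphaField(srcByte, 0, srcByte.length, 0);

        for (int i = 0; i < srcByte.length; i++) {
            System.out.print(Integer.toHexString(srcByte[i] & 0xFF) + " ");
        }
        System.out.println();
        for (int i = 0; i < dest.length; i++) {
            System.out.print(Integer.toHexString(dest[i] & 0xFF) + " ");
        }
        System.out.println();
    }
}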