Analysis of the Three ucs2 Encoding Formats (80, 81, 82) on SIM Cards

Source: Internet
Author: User

I came across an article on ucs2 encoding on the Internet and saved it here. Original address:

Http://hi.baidu.com/youren4548/blog/item/fa08bd1bf61005058618bf1d.html

There are two main kinds of operations on SIM card data: text message operations and address book operations. Their encodings differ slightly:

1. Short Message operation:

By default, the maximum length of a text message is 140 bytes.

Pure ASCII text is normally sent in 7-bit format (strictly, the GSM 7-bit default alphabet): only the low 7 bits of each character are used, so 160 characters occupy 160 × 7 = 1120 bits, or exactly 140 bytes. That is why a mobile phone can send 160 ASCII characters in a single text message.
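The 7-bit packing can be sketched in Python as follows. This is a minimal illustration rather than code from the original article: the function names are hypothetical, and it assumes the text only uses characters whose GSM 7-bit code equals their ASCII code (digits, Latin letters, space), so ord() can stand in for a real GSM alphabet table.

```python
def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values into bytes, least significant bit first."""
    out = bytearray()
    acc = 0     # bit accumulator
    nbits = 0   # number of pending bits in acc
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a 7-bit value: {s:#x}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:          # emit every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the final partial byte
        out.append(acc & 0xFF)
    return bytes(out)


def ascii_to_septets(text: str) -> list[int]:
    # Assumption: every character's GSM 7-bit code equals its ASCII code.
    return [ord(c) for c in text]


print(pack_septets(ascii_to_septets("hello")).hex())  # e8329bfd06
print(len(pack_septets([0x41] * 160)))                # 140
```

Because 160 × 7 bits is a whole number of bytes, the final flush is not needed for a full 160-character message.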

Text that contains Chinese characters uses ucs2, the 2-byte Unicode encoding, in which every character occupies two bytes. As soon as a message contains a single Chinese character, the entire message must be ucs2-encoded, so it can hold at most 140 / 2 = 70 characters.
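A rough sketch of the choice between the two message encodings follows. The names are hypothetical and the check is simplified: it treats any pure-ASCII text as 7-bit, although a few ASCII symbols are not in the GSM default alphabet. Python's utf-16-be codec produces ucs2 bytes as long as every character is in the Basic Multilingual Plane, which to_ucs2 enforces.

```python
USER_DATA_BYTES = 140  # maximum text message payload


def sms_plan(text: str) -> tuple[str, int]:
    """Return (encoding, maximum characters) for a single text message."""
    if all(ord(c) < 0x80 for c in text):
        # Simplification: treat pure ASCII as fitting the 7-bit alphabet.
        return "7-bit", USER_DATA_BYTES * 8 // 7   # 160 characters
    # Any other character forces ucs2 for the whole message.
    return "ucs2", USER_DATA_BYTES // 2            # 70 characters


def to_ucs2(text: str) -> bytes:
    """Big-endian ucs2 bytes; utf-16-be is identical for BMP characters."""
    if any(ord(c) > 0xFFFF for c in text):
        raise ValueError("ucs2 cannot represent characters above U+FFFF")
    return text.encode("utf-16-be")


print(sms_plan("hello"))        # ('7-bit', 160)
print(sms_plan("hello 中国"))    # ('ucs2', 70)
print(to_ucs2("中国").hex())     # 4e2d56fd
```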

2. Address Book operations:

The maximum name length in the address book varies between devices; on my TD module it is 14 bytes.

ASCII characters are normally stored in 8-bit format, one character per byte, which is the most common storage format.

If the name contains Chinese characters, it is normally stored in the 80 format, that is, the name starts with the byte 80 followed by ucs2 data; in some cases, however, it starts with 81 or 82.

A) Start with 80:

A string starting with 80 is in ucs2 format. (Note: this only applies when ucs2 data such as Chinese characters follows; otherwise the field may be a plain ASCII string that merely happens to start with "80".)

Example 1: 中国 (China)

Unicode encoding: 4e2d56fd

In the 80 scheme of ucs2, it is: 804e2d56fd


Example 2: 杜10娘

Unicode encoding: 675c003100305a18

In the 80 scheme of ucs2, it is: 80675c003100305a18

Clearly, once a string contains Chinese characters, even digits take up two bytes each in the 80 format.
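The 80 format is simple enough to sketch directly. The helper names below are hypothetical; decode_80 additionally assumes that unused bytes of the name field are padded with 0xFF, and strips that padding.

```python
def encode_80(name: str) -> bytes:
    """80 format: the byte 0x80 followed by big-endian ucs2 code units."""
    if any(ord(c) > 0xFFFF for c in name):
        raise ValueError("ucs2 covers only U+0000..U+FFFF")
    return b"\x80" + name.encode("utf-16-be")


def decode_80(field: bytes) -> str:
    if not field or field[0] != 0x80:
        raise ValueError("not an 80-format field")
    body = field[1:]
    body = body[: len(body) // 2 * 2]           # ignore a dangling odd byte
    units = [int.from_bytes(body[i:i + 2], "big") for i in range(0, len(body), 2)]
    while units and units[-1] == 0xFFFF:        # assumed 0xFF padding
        units.pop()
    return "".join(chr(u) for u in units)


def max_chars_80(field_len: int) -> int:
    """One byte for the 0x80 tag, then two bytes per character."""
    return (field_len - 1) // 2


print(encode_80("中国").hex())    # 804e2d56fd
print(encode_80("杜10娘").hex())  # 80675c003100305a18
print(max_chars_80(14))           # 6
```

With the 14-byte limit of the TD module above, an 80-format name can therefore hold at most 6 characters.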


B) Start with 81:


A string starting with 81 contains a one-byte base address, which lets a single byte represent a ucs2 character.

In this format, the first byte (81) is the identifier, the second byte gives the number of characters in the string, the third byte is the base address, and the remaining bytes are data. For example:

Example 3: 杜杜杜

Unicode encoding: 675c675c675c

In the 80 scheme of ucs2, it is: 80675c675c675c

In the 81 scheme of ucs2, it is: 8103cedcdcdc

Analysis of the 81 encoding 8103cedcdcdc:

81: the format identifier.

03: the string contains 3 characters.

CE: the one-byte base address. To decode it, shift CE left by 7 bits so that it occupies bits 14 to 7 of a 16-bit value; the highest bit and the low 7 bits are 0. The base address is therefore 0xCE << 7 = 0x6700, and the data bytes that follow are interpreted relative to it.

DCDCDC: three data bytes, DC, DC and DC. If the highest bit of a data byte is 0, the byte is an ASCII character (strictly, a GSM 7-bit default alphabet character, which coincides with ASCII for digits and Latin letters). If the highest bit is 1, the low 7 bits are an offset from the base address, and the actual ucs2 character is the base address plus this offset. All three data bytes have the highest bit set, so each offset is 0xDC & 0x7F = 0x5C, and each character is 0x6700 + 0x5C = 0x675C (杜).
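The decoding steps for the 81 format can be sketched like this. decode_81 is a hypothetical name; data bytes with the highest bit cleared are mapped as ASCII, a simplification of the GSM default alphabet that holds for digits and letters.

```python
def decode_81(field: bytes) -> str:
    """Decode an 81-format SIM name: 81, count, base byte, data bytes."""
    if len(field) < 3 or field[0] != 0x81:
        raise ValueError("not an 81-format field")
    count = field[1]
    # The base byte becomes bits 14..7 of a 16-bit address: shift left by 7.
    base = field[2] << 7
    data = field[3:3 + count]     # anything after `count` bytes is padding
    if len(data) < count:
        raise ValueError("field is shorter than its character count")
    chars = []
    for b in data:
        if b & 0x80:
            chars.append(chr(base + (b & 0x7F)))  # offset from base address
        else:
            chars.append(chr(b))                  # simplification: ASCII
    return "".join(chars)


print(hex(0xCE << 7))                            # 0x6700
print(decode_81(bytes.fromhex("8103cedcdcdc")))  # 杜杜杜
```

Using the character count rather than the field length means trailing padding bytes after the data are ignored automatically.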

Example 4: 一丁丂七丄丅 (Note: these characters belong to the GBK character set)

Unicode encoding: 4e004e014e024e034e044e05

In the 80 scheme of ucs2, it is: 804e004e014e024e034e044e05

In the 81 scheme of ucs2, it is: 81069c808182838485

Analysis of the 81 encoding 81069c808182838485:

81: the format identifier.

06: the string contains 6 characters.

9C: the one-byte base address. As before, shift 9C left by 7 bits so that it occupies bits 14 to 7 of a 16-bit value, with the highest bit and the low 7 bits set to 0. The base address is therefore 0x9C << 7 = 0x4E00, and the data bytes that follow are interpreted relative to it.

808182838485: six data bytes, 80, 81, 82, 83, 84 and 85. All six have the highest bit set, so their offsets are 00, 01, 02, 03, 04 and 05, and the actual ucs2 codes are 0x4E00, 0x4E01, 0x4E02, 0x4E03, 0x4E04 and 0x4E05.
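Going the other way, an 81 encoder only has to find a base address that is a multiple of 0x80 (its low 7 bits are always 0), has the highest bit cleared, and covers every non-ASCII character. The hypothetical encode_81 below simply takes the half-page containing the smallest non-ASCII character; it is a sketch, not the article's method, and it reuses the ASCII simplification for bytes below 0x80.

```python
def encode_81(name: str) -> bytes:
    """Sketch of an 81-format encoder: 81, count, base byte, data bytes."""
    if len(name) > 0xFF:
        raise ValueError("character count must fit in one byte")
    wide = [ord(c) for c in name if ord(c) >= 0x80]
    if wide and max(wide) > 0x7FFF:
        # Bit 15 of the 81 base address is always 0.
        raise ValueError("81 format cannot reach code points above U+7FFF")
    # Half-page (multiple of 0x80) containing the smallest wide character.
    base = (min(wide) & 0x7F80) if wide else 0
    out = bytearray([0x81, len(name), base >> 7])
    for c in name:
        cp = ord(c)
        if cp < 0x80:
            out.append(cp)                  # assumption: ASCII == GSM code
        elif cp - base <= 0x7F:
            out.append(0x80 | (cp - base))  # offset from the base address
        else:
            raise ValueError(f"U+{cp:04X} does not fit in the half-page at {base:#06x}")
    return bytes(out)


print(encode_81("杜杜杜").hex())  # 8103cedcdcdc
print(encode_81("杜杜1").hex())   # 8103cedcdc31
```

For the six characters of Example 4 the same function yields 81069c808182838485.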

 

C) Start with 82:


A string starting with 82 contains a two-byte base address, which again lets a single byte represent a ucs2 character.

In this format, the first byte (82) is the identifier, the second byte gives the number of characters in the string, the next two bytes are the base address, and the remaining bytes are data. For example:

Example 5:8025ef Fang

Unicode encoding:00380030003200350045004682b3

In the 80 scheme of ucs2, it is:8000380030003200350045004682b3

In the 81 scheme of ucs2, it is:(Because the format is limited, it can contain up to 128 Chinese characters and 127 English letters, so 81 format cannot be used here)

In the 82 scheme of ucs2, it is:82078280383032354546b3

Analyze the ucs2 82 solution:82078280383032354546b3

82: the format identifier.

07: the string is 7 characters long.

8280: the two-byte base address, 0x8280.

383032354546B3: seven data bytes, 38, 30, 32, 35, 45, 46 and B3. As in the 81 format, a data byte whose highest bit is 0 is an ASCII character, and a data byte whose highest bit is 1 carries in its low 7 bits an offset that is added to the base address to give the ucs2 character. The first six bytes have the highest bit cleared, so they are the ASCII characters 0x38, 0x30, 0x32, 0x35, 0x45 and 0x46, that is, 8, 0, 2, 5, E and F. The seventh byte has the highest bit set, so its offset is 0xB3 & 0x7F = 0x33, and adding the base address 0x8280 gives the ucs2 code 0x82B3 (芳).

Example 6: 杜杜1

Unicode encoding: 675c675c0031

In the 80 scheme of ucs2, it is: 80675c675c0031

In the 81 scheme of ucs2, it is: 8103cedcdc31

In the 82 scheme of ucs2, it is: 82036700dcdc31

Analysis of the 82 encoding 82036700dcdc31:

82: the format identifier.

03: the string contains 3 characters.

6700: the two-byte base address, 0x6700.

DCDC31: three data bytes, DC, DC and 31. The first two have the highest bit set, so each carries the offset 0x5C; adding the base address 0x6700 gives the ucs2 code 0x675C (杜). The third byte has the highest bit cleared, so it is the ASCII character 0x31, that is, 1.


Here we have only briefly analyzed how to decode the three ucs2 formats (80, 81 and 82). Once the meaning of each field is understood, encoding becomes much easier.
