Understanding Chinese Encoding Problems on Mobile Phones [Repost]

Last Update: 2018-12-03 Source: Internet

Author: User


Author: Wind corridor Source: http://www.sf.org.cn

Many posts on the forum discuss Chinese encoding on mobile phones. I have been troubled by these problems myself and have received help from many enthusiastic friends. After a period of searching and testing, I have gained some understanding of the issue and would like to share my thoughts with you, though with some hesitation given my limited level: I am only a spare-time developer without professional theoretical training, so please forgive and point out any mistakes in this article. It is intended for reference only, and you are welcome to follow up so that we can work on this problem together. :)

Basically all strings on the phone are stored using the UTF-8 encoding.
On PCs, we use the ASCII and Unicode encodings.
ASCII is a single-byte encoding. A single byte can distinguish at most 256 characters (standard ASCII itself defines only 128), which is enough for English letters but cannot represent Chinese characters.
Unicode is a double-byte encoding that can represent Chinese characters, but it wastes too much space on ordinary English letters (at least given a phone's limited storage).
UTF-8 is an encoding widely used on embedded devices such as mobile phones. Its key property is that traditional ASCII characters are still represented by one byte, while characters outside the ASCII set are represented by two or three bytes.
Characters between 0x0001 and 0x007f (traditional ASCII characters) are represented by one byte:
0 | bits 0-6
The null character 0x0000 and characters between 0x0080 and 0x07ff are represented by two bytes:
1 | 1 | 0 | bits 6-10 | 1 | 0 | bits 0-5
When the VM displays such a character, it removes the leading 110 from the first byte and the leading 10 from the second byte, and recombines the remaining bits into a two-byte number that represents the character:
00000 | bits 6-10 | bits 0-5
Similarly, characters between 0x0800 and 0xffff are represented by three bytes:
1 | 1 | 1 | 0 | bits 12-15 | 1 | 0 | bits 6-11 | 1 | 0 | bits 0-5
The same method recombines these three bytes into a two-byte character.
Note that in kjava the null character also takes two bytes instead of one :)
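The bit layouts above can be checked in a few lines of Java. The following is a minimal sketch (the class and method names are made up for illustration) that encodes a single char with exactly those rules, recombines the bits again, and compares the result with the bytes that `DataOutputStream.writeUTF` produces after its two-byte length prefix. It works on single chars only, so a surrogate pair would come out as two separate three-byte sequences, which is also how Java's "modified UTF-8" treats them.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ModifiedUtf8Sketch {

    // Encode one char using the layouts above (Java's "modified UTF-8").
    static byte[] encodeChar(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            // 0 | bits 0-6
            return new byte[] { (byte) c };
        } else if (c <= 0x07FF) {
            // 110 | bits 6-10, then 10 | bits 0-5 (also used for 0x0000)
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F)) };
        } else {
            // 1110 | bits 12-15, then 10 | bits 6-11, then 10 | bits 0-5
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F)) };
        }
    }

    // Strip the prefix bits (0, 110/10, or 1110/10/10) and recombine the rest.
    static char decodeChar(byte[] b) {
        if (b.length == 1) {
            return (char) (b[0] & 0x7F);
        } else if (b.length == 2) {
            return (char) (((b[0] & 0x1F) << 6) | (b[1] & 0x3F));
        }
        return (char) (((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F));
    }

    // The bytes writeUTF emits for one char, without the 2-byte length prefix.
    static byte[] writeUtfBytes(char c) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(String.valueOf(c));
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        char zhong = 0x4E2D; // the Chinese character U+4E2D
        System.out.println(Arrays.equals(encodeChar(zhong), writeUtfBytes(zhong))); // true
        System.out.println(decodeChar(encodeChar(zhong)) == zhong);                 // true
        System.out.println(encodeChar((char) 0).length);                            // 2
    }
}
```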

Of course, English strings cause no trouble under UTF-8 (they are encoded exactly as in standard ASCII); the main problem is Chinese. The problems I personally ran into with Chinese strings in kjava mobile development fall into the following categories:
1. Reading and writing the RMS database;
2. Writing the game's Chinese name in the JAD file;
3. Chinese characters in network transmission (decoding in kXML transmission);
4. Some simulators do not support Chinese at all.
These are the danger zones where Chinese text most often goes wrong in mobile development; the usual symptom is garbled characters :)

1. Understanding the basic principles of UTF-8 helps a great deal in solving code conversion problems.
This is how I handle UTF-8 conversion:

import java.io.*;
import javax.microedition.rms.RecordStore;

// Write Chinese characters to the database
// (rs is an open RecordStore; dbid is the ID of the record to read back)
String appt3 = "Chinese characters";
ByteArrayOutputStream bos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(bos);
dos.writeUTF(appt3);
byte[] bytes3 = bos.toByteArray();
rs.addRecord(bytes3, 0, bytes3.length);

// Read Chinese from the database
byte[] b3 = rs.getRecord(dbid);
DataInputStream dis = new DataInputStream(new ByteArrayInputStream(b3));
String chinastring = dis.readUTF();
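Because `RecordStore` only exists on a MIDP device or simulator, the sketch below swaps the record store for a plain byte array so that the same `writeUTF`/`readUTF` round trip can run on desktop Java. The class and method names are illustrative. It also makes the storage cost visible: a two-byte length prefix, then three bytes per common Chinese character and one byte per ASCII character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UtfRecordSketch {

    // What would be passed to rs.addRecord(...): a writeUTF-encoded string.
    static byte[] toRecord(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(s);
        dos.flush();
        return bos.toByteArray();
    }

    // What would be done with the bytes from rs.getRecord(...): read the string back.
    static String fromRecord(byte[] record) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(record));
        return dis.readUTF();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4E2D\u6587abc"; // two Chinese characters plus "abc"
        byte[] record = toRecord(text);
        System.out.println(record.length);                   // 11 = 2 + 2*3 + 3*1
        System.out.println(fromRecord(record).equals(text)); // true
    }
}
```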

writeUTF() and readUTF() are methods of DataOutputStream and DataInputStream respectively; together they convert between Unicode strings and UTF-8 bytes.
A closer look at the MIDP documentation shows the following:
writeUTF():
First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the UTF-8 encoding for the character. If no exception is thrown, the counter written is incremented by the total number of bytes written to the output stream. This will be at least two plus the length of str, and at most two plus thrice the length of str.
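The length rule in this quotation can be checked directly. The sketch below (illustrative names, desktop Java) counts the bytes each char needs under the layouts described earlier and compares 2 plus that count with `DataOutputStream.size()`, which reports the `written` counter the documentation mentions. One practical consequence on Java SE: since the length prefix is an unsigned 16-bit value, `writeUTF` throws `UTFDataFormatException` when the encoded form exceeds 65535 bytes, which Chinese text reaches at more than 21,845 three-byte characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UtfLengthSketch {

    // Bytes writeUTF needs after its 2-byte prefix: 1, 2 or 3 per char.
    static int utfLength(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                n += 1;
            } else if (c <= 0x07FF) {
                n += 2; // includes the null character 0x0000
            } else {
                n += 3;
            }
        }
        return n;
    }

    // Total bytes actually written, read from DataOutputStream's counter.
    static int writtenBytes(String s) throws IOException {
        DataOutputStream dos = new DataOutputStream(new ByteArrayOutputStream());
        dos.writeUTF(s);
        return dos.size();
    }

    public static void main(String[] args) throws IOException {
        String s = "abc\u4E2D\u6587";
        int written = writtenBytes(s);
        System.out.println(written == 2 + utfLength(s));                                 // true
        System.out.println(written >= 2 + s.length() && written <= 2 + 3 * s.length()); // true
    }
}
```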

Of course, you can also write the conversion code yourself: convert the Chinese string to a byte[] before putting it into RMS, and convert it back to a String when you retrieve it.
Here I use a class written by bingo_guan (I hope bingo_guan does not mind). The code is quite well put together, heh, and the class can also be used for reading and writing text files.

/**
 * Title:
 * Description: Unicode String Conversion Tool
 * Company: CC Studio
 * @author Bingo
 * @version 1.0
 */

public class UnicodeString
{

    public UnicodeString()
    {
    }

    // Rebuilds chars from byte pairs stored low byte first.
    public static String byteArrayToString(byte abyte0[], int i)
    {
        StringBuffer stringbuffer = new StringBuffer("");
        for (int j = 0; j + 1 < i; ) // stop before an incomplete trailing pair
        {
            int k = abyte0[j++]; // bytes are signed, so convert to 0-255 here
            if (k < 0)
                k += 256;
            int l = abyte0[j++];
            if (l < 0)
                l += 256;
            char c = (char) (k + (l << 8)); // assemble the low and high bytes
            stringbuffer.append(c);
        }

        return stringbuffer.toString();
    }

    public static String byteArrayToString(byte abyte0[])
    {
        return byteArrayToString(abyte0, abyte0.length);
    }

    // Splits each char into two bytes, low byte first.
    public static byte[] stringToByteArray(String s)
    {
        int i = s.length();
        byte abyte0[] = new byte[i << 1];
        int j = 0;
        for (int k = 0; k < i; k++)
        {
            char c = s.charAt(k);
            abyte0[j++] = (byte) (c & 0xff); // low byte
            abyte0[j++] = (byte) (c >> 8);   // high byte
        }

        return abyte0;
    }
}
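Note that, unlike `writeUTF`, this class does not produce UTF-8: it stores every char as two bytes, low byte first, which is the UTF-16LE layout. Records written with `stringToByteArray` therefore have to be read back with `byteArrayToString`, not `readUTF`. On desktop Java the equivalence can be checked against the standard `UTF-16LE` charset (`StandardCharsets` is Java SE, not MIDP); the class name is illustrative and the method body copies `stringToByteArray` from the class above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16LeCheck {

    // Same layout as UnicodeString.stringToByteArray: low byte, then high byte.
    static byte[] stringToByteArray(String s) {
        byte[] out = new byte[s.length() << 1];
        int j = 0;
        for (int k = 0; k < s.length(); k++) {
            char c = s.charAt(k);
            out[j++] = (byte) (c & 0xFF);
            out[j++] = (byte) (c >> 8);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\u4E2D\u6587abc";
        byte[] manual = stringToByteArray(s);
        byte[] jdk = s.getBytes(StandardCharsets.UTF_16LE); // no byte-order mark
        System.out.println(Arrays.equals(manual, jdk)); // true
        System.out.println(manual.length);              // 10: always 2 bytes per char
    }
}
```

The trade-off matches the remarks at the start of the article: Chinese characters take 2 bytes here instead of 3 in UTF-8, but each English letter takes 2 bytes instead of 1.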

2. Text in the JAD file and the manifest (such as the game's name) is also UTF-8 encoded, and this is another frequent trouble spot. I suggest converting such text to UTF-8 by hand; if you write the Chinese text in a Unicode format instead, the simulator or the actual device may fail to recognize it and the program may not run at all. So be especially careful when editing the JAD file: the WTK's automatic JAD generator does not support entering UTF-8 text directly into the JAD and manifest, so this step inevitably has to be done by hand :(
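Since the WTK will not do this for you, one option is a small desktop helper that writes the JAD attributes as UTF-8 bytes. This is only a sketch: the `jadBytes` helper is hypothetical, `MIDlet-Name` and `MIDlet-Version` are standard JAD attributes, and the Chinese name is an example value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class JadUtf8Writer {

    // Hypothetical helper: encode "Name: value" lines as UTF-8 for a JAD file.
    static byte[] jadBytes(String[][] attributes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(bos, StandardCharsets.UTF_8);
        for (String[] attr : attributes) {
            w.write(attr[0] + ": " + attr[1] + "\n");
        }
        w.flush(); // push buffered chars through the encoder
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] jad = jadBytes(new String[][] {
            { "MIDlet-Name", "\u4E2D\u6587" }, // example Chinese game name
            { "MIDlet-Version", "1.0.0" } });
        System.out.println(jad.length); // 42
    }
}
```

The resulting bytes can then be written to the .jad file with a `FileOutputStream`, so no editor setting can silently change the encoding.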

3. Different phones support different default encodings, which is the key reason these problems occur so often. The CLDC system property "microedition.encoding" defines the device's default character encoding, and its value can be read with the System.getProperty method. We can then convert text into an encoding the device supports so that our program runs correctly.
This approach is typically needed when transmitting Chinese text to and from the phone, because the encoding the phone will use over a network connection is not known in advance. Below is some example code for discussing this issue.
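A minimal sketch of reading the property, assuming the program should fall back to an encoding of its own choosing when the property is missing (as it normally is on desktop Java, which does not define `microedition.encoding`). The `"UTF-8"` fallback and the method names are assumptions for illustration; `System.getProperty` and `new String(byte[], String)` exist in CLDC as well as Java SE.

```java
import java.io.UnsupportedEncodingException;

class DeviceEncoding {

    // CLDC devices report their default encoding in this property; desktop Java
    // normally leaves it unset, so the caller supplies a fallback.
    static String deviceEncoding(String fallback) {
        String enc = System.getProperty("microedition.encoding");
        return (enc != null) ? enc : fallback;
    }

    // Decode with the requested encoding, or the platform default if unsupported.
    static String decode(byte[] data, String encoding) {
        try {
            return new String(data, encoding);
        } catch (UnsupportedEncodingException e) {
            return new String(data);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String enc = deviceEncoding("UTF-8");
        byte[] utf8 = "\u4E2D\u6587".getBytes("UTF-8");
        System.out.println("encoding in use: " + enc);
        System.out.println(decode(utf8, "UTF-8").length()); // 2
    }
}
```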

Server to client:
------------------------------------------------------------------
The following code uses the gbEncoding() method to encode every character as \uXXXX.
-----------------------------------------------------------------

Code :-------------------------------------------------------------
/**
 * Write the string data
 *
 * @param out
 * @param value
 */
public static void writeUnicode(final DataOutputStream out, final String value) throws ActionException {
    try {
        final String unicode = StringFormatter.gbEncoding(value);
        final byte[] data = unicode.getBytes();
        final int dataLength = data.length;

        System.out.println("Data Length is: " + dataLength);
        System.out.println("data is: " + value);
        out.writeInt(dataLength); // write the length of the string first
        out.write(data, 0, dataLength); // then write the converted string
    } catch (IOException e) {
        throw new ActionException(IMDefaultAction.class.getName(), e.getMessage());
    }
}
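The receiving side has to undo this framing: read the four-byte length written by `writeInt`, then exactly that many bytes. A sketch with hypothetical method names, assuming the reader sees the same stream layout that `writeUnicode` produces. Because the output of `gbEncoding` contains only ASCII characters (a backslash, `u` and hex digits), calling `getBytes()` without a charset is safe here as long as the default encoding is ASCII-compatible, which is the point of the whole scheme.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class FramedStringSketch {

    // Same framing as writeUnicode: a 4-byte length, then the payload bytes.
    static void writeFramed(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload, 0, payload.length);
    }

    // Hypothetical counterpart: readFully waits until the whole payload has arrived.
    static byte[] readFramed(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "\\u4e2d\\u6587".getBytes("US-ASCII"); // escaped text, ASCII only
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeFramed(new DataOutputStream(bos), payload);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        String received = new String(readFramed(in), "US-ASCII");
        System.out.println(bos.size());        // 16 = 4 length bytes + 12 payload bytes
        System.out.println(received.length()); // 12
    }
}
```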

----------------------------------------------------------------------
The following code is the gbEncoding() method, which converts each character into \uXXXX; ASCII characters are padded with leading zeros (00).
----------------------------------------------------------------------
/**
 * This method will encode the string to Unicode.
 *
 * @param gbString
 * @return
 */

Code :--------------------------------------------------------------------------------
public static String gbEncoding(final String gbString) {
    char[] utfBytes = gbString.toCharArray();
    String unicodeBytes = "";
    for (int byteIndex = 0; byteIndex < utfBytes.length; byteIndex++) {
        String hexB = Integer.toHexString(utfBytes[byteIndex]);
        // pad to four hex digits so ASCII characters get leading zeros
        while (hexB.length() < 4) {
            hexB = "0" + hexB;
        }
        unicodeBytes = unicodeBytes + "\\u" + hexB;
    }
    System.out.println("unicodeBytes is: " + unicodeBytes);
    return unicodeBytes;
}
--------------------------------------------------------------------------------

----------------------------------------------------------------------
When the client receives data from the server, it first decodes the \uXXXX sequences one by one, after which double-byte characters display normally.
----------------------------------------------------------------------

Code :--------------------------------------------------------------------------------
/**
 * This method will decode the string to a recognized string
 * in UI.
 * @param dataStr
 * @return
 */
private StringBuffer decodeUnicode(final String dataStr) {
    int start = 0;
    int end = 0;
    final StringBuffer buffer = new StringBuffer();
    while (start > -1) {
        end = dataStr.indexOf("\\u", start + 2);
        String charStr = "";
        if (end == -1) {
            charStr = dataStr.substring(start + 2, dataStr.length());
        } else {
            charStr = dataStr.substring(start + 2, end);
        }
        char letter = (char) Integer.parseInt(charStr, 16); // parse the hexadecimal string
        buffer.append(letter);
        start = end;
    }
    return buffer;
}
--------------------------------------------------------------------------------
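Put together, gbEncoding() and decodeUnicode() form a simple escape-and-unescape round trip. The sketch below is a self-contained version for desktop testing; the class and method names are illustrative. It always pads to four hex digits, and it starts from the first escape marker it finds, so an empty string decodes to an empty string instead of throwing.

```java
class UnicodeEscapeRoundTrip {

    // Encode every char as a backslash, the letter u, and four hex digits (ASCII-safe).
    static String encode(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            String hex = Integer.toHexString(s.charAt(i));
            while (hex.length() < 4) {
                hex = "0" + hex;
            }
            sb.append("\\u").append(hex);
        }
        return sb.toString();
    }

    // Split on the escape marker and parse each hex group back into a char.
    static String decode(String data) {
        StringBuffer sb = new StringBuffer();
        int start = data.indexOf("\\u");
        while (start > -1) {
            int end = data.indexOf("\\u", start + 2);
            String hex = (end == -1) ? data.substring(start + 2) : data.substring(start + 2, end);
            sb.append((char) Integer.parseInt(hex, 16));
            start = end;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587 ok";
        String wire = encode(original);
        System.out.println(wire);
        System.out.println(decode(wire).equals(original)); // true
    }
}
```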

----------------------------------------------------------------------
Client to server:
----------------------------------------------------------------------
The client uses the following method to encode the characters on the phone as ISO-8859-1 and send them to the server.
----------------------------------------------------------------------

Code :--------------------------------------------------------------------------------
/**
 * Write the string data
 * @param value
 * @param outData
 */
private void writeSJIS(DataOutputStream outData, String value) {
    try {
        byte[] data = null;
        // data = (value).getBytes("UTF-8");
        data = (value).getBytes("ISO8859_1");
        outData.writeInt(data.length);
        outData.write(data, 0, data.length);

        System.out.println("data.length: " + data.length);
        System.out.println("data.value: " + value);
    } catch (Exception ex) {
        System.out.println("write error");
        ex.printStackTrace();
    }
}
--------------------------------------------------------------------------------
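One caution about `getBytes("ISO8859_1")`: ISO-8859-1 contains no Chinese characters, so if `value` holds real Chinese text it cannot be encoded losslessly. On Java SE, `getBytes(Charset)` replaces each unmappable character with `?`, while the UTF-8 line that is commented out above keeps every character. A small check (class and method names illustrative; behavior on a given device may differ):

```java
import java.nio.charset.StandardCharsets;

class Latin1LossSketch {

    static byte[] asLatin1(String value) {
        // Unmappable chars become the charset's replacement byte '?' (0x3F).
        return value.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] asUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "\u4E2D\u6587";
        System.out.println(new String(asLatin1(value), StandardCharsets.ISO_8859_1));       // ??
        System.out.println(new String(asUtf8(value), StandardCharsets.UTF_8).equals(value)); // true
    }
}
```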

----------------------------------------------------------------------
The server receives the stream sent by the client and converts it to UTF-8 with the following method; all subsequent operations are based on UTF-8. Because different database servers may use different internal encodings, the text must also be converted to match the specific database's internal encoding when the database is accessed.
----------------------------------------------------------------------

Code :--------------------------------------------------------------------------------
/**
 *
 * @param iso
 * @return
 */
public static String isoToUtf(final String iso) {
    String utfString = iso;
    if (iso != null) {
        try {
            utfString = new String(iso.getBytes("ISO-8859-1"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            utfString = iso;
        }
    } else {
        utfString = "";
    }
    return utfString;
}
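Why isoToUtf() works, and when: ISO-8859-1 maps every byte value 0-255 to exactly one char, so a string produced by decoding UTF-8 bytes as ISO-8859-1 (often the default in server containers) still carries the original bytes, and re-encoding it as ISO-8859-1 recovers them without loss. The repair is only correct if the bytes originally sent were UTF-8. A sketch with illustrative names:

```java
import java.nio.charset.StandardCharsets;

class IsoToUtfSketch {

    // Simulates a server decoding UTF-8 request bytes as ISO-8859-1.
    static String misread(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Same idea as isoToUtf: recover the raw bytes, then decode them as UTF-8.
    static String repair(String iso) {
        return new String(iso.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587";
        String garbled = misread(original);
        System.out.println(garbled.length());                 // 6: one char per UTF-8 byte
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```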

As long as the phone supports Unicode or GB2312 encoding, the text will be displayed correctly.

4. Some phone simulators (such as the Nokia Series 60 simulator) do not support Chinese characters at all. There is really nothing you can do except wait for a Chinese version to come out. Haha :)

My email address is Zhaofei8009@wellhope.sh. You are very welcome to discuss this issue with me, and I hope we can become technical friends.

Work together and make progress together!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
