Analysis and resolution of Chinese garbled characters in the communication between Android and PC

Last Update: 2014-07-24 Source: Internet

Author: User


My first attempt at communication between an Android client and a PC server ran straight into that notorious headache: garbled Chinese characters. Garbled text means the two sides disagree about the encoding.
In this setup, Eclipse and the PC-side program use GBK by default, while Android uses UTF-8 by default (layout files included). When the two communicate, the text is therefore bound to come out garbled, so the fix has to start with converting between the two encodings.
Android's default charset is UTF-8, so the text taken from the phone's interface goes out over the wire as UTF-8 bytes. Here the client reads the text from an EditText and writes it to the socket's output stream, and the server reads one line of characters from a buffered input stream into a String.
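To see the mismatch concretely, the sketch below encodes the two-character string "中文" as UTF-8, as the Android side would, and then decodes those bytes with GBK, as a GBK-configured server would. The class and method names are illustrative, not from the original, and it assumes the JDK provides the "GBK" charset (standard desktop JDKs do). The Chinese text is written as Unicode escapes so the example compiles regardless of the source-file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Assumes the JDK ships the "GBK" charset.
    static final Charset GBK = Charset.forName("GBK");

    // Bytes the Android client sends: the text encoded as UTF-8.
    static byte[] clientBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // What a server that assumes GBK makes of those bytes.
    static String decodeOnGbkServer(byte[] received) {
        return new String(received, GBK);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two Chinese characters (U+4E2D U+6587)
        byte[] sent = clientBytes(text);
        System.out.println("UTF-8 bytes: " + sent.length);               // 3 bytes per character here
        System.out.println("GBK bytes:   " + text.getBytes(GBK).length); // 2 bytes per character here
        System.out.println("misread as GBK:   " + decodeOnGbkServer(sent));
        System.out.println("decoded as UTF-8: " + new String(sent, StandardCharsets.UTF_8));
    }
}
```

The garbled line is whatever GBK happens to make of the UTF-8 bytes; only decoding with the charset the bytes were written in gives the original text back.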
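On the sending side it helps to name the charset explicitly instead of relying on the platform default. The sketch below is plain Java rather than Android code: `sendLine` is a hypothetical helper, the string literal stands in for the EditText contents, and a `ByteArrayOutputStream` stands in for the socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8Client {
    // Hypothetical helper: sends one line of text, terminated by CR LF,
    // encoded explicitly as UTF-8 rather than with the platform default.
    static void sendLine(OutputStream out, String text) throws IOException {
        out.write((text + "\r\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getOutputStream().
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        sendLine(out, "\u4E2D\u6587\u5B57"); // three Chinese characters
        // 3 characters x 3 UTF-8 bytes + 2 bytes for CR LF = 11 bytes
        System.out.println("bytes sent: " + out.size());
    }
}
```

Because the helper takes any `OutputStream`, the same call works unchanged with a real socket stream.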

Java code

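// Read client information
// (brd is assumed to be a BufferedReader over the socket's InputStream "is",
//  decoding with the platform default charset, GBK here; os is the socket's OutputStream)
while (true) {
    while (is.available() > 0) {
        // Before transcoding
        String msg_client1 = brd.readLine();
        System.out.println("before transcoding: " + msg_client1);
        // Transcoding: back to bytes via GBK, then decode those bytes as UTF-8
        String msg_client = new String(msg_client1.getBytes("GBK"), "UTF-8");
        String enter = new String("\r\n".getBytes("UTF-8"));
        // Print the result after transcoding
        System.out.println("client: " + msg_client);
        // Echo the received line back to the client
        String msg_server = msg_client1 + enter;
        os.write(msg_server.getBytes());
        os.flush();
        Thread.sleep(1000);
    }
}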

Unsurprisingly, because the server-side encoding is GBK, the string read at this point is garbled. So the code then turns it back into a byte array with GBK and decodes those bytes as UTF-8. With that, the long-awaited Chinese characters finally appear.

But notice: when an even number of Chinese characters is entered, the result is correct, whereas with an odd number the last character becomes "??".
What is going on now? I was stuck on this for a long time, so I looked up how the standard character sets are encoded and finally found the cause. In GBK a Chinese character takes 2 bytes, while in UTF-8 it takes 3. So when an odd number of characters is entered, the byte count is odd and the last byte cannot be decoded as GBK. For example, three characters encoded in UTF-8 give nine bytes, 123|456|789. When they reach the server, the system treats these nine bytes as GBK and groups them as 12|34|56|78|9. The leftover byte cannot form a Chinese character, so it is replaced by a substitute character, which becomes "?" (ASCII code 63) when the string is converted back to bytes. Converting to UTF-8 then gives 123|456|78?, so the third character can no longer be read correctly, and the result (two question marks) appears.
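The byte arithmetic can be checked directly. The sketch below (class and method names are illustrative) reproduces the server's two steps for the three-character input "中文字": reading the UTF-8 bytes as GBK, then applying `new String(s.getBytes("GBK"), "UTF-8")` as in the first listing. Nine bytes cannot all be consumed as 2-byte GBK units, so at least one byte is replaced during the GBK decode and the original text cannot be recovered. An even byte count is necessary but, strictly speaking, not sufficient: the round trip also relies on every 2-byte group happening to be a GBK code that converts back to the same bytes. It assumes the JDK provides the "GBK" charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OddCountDemo {
    // Assumes the JDK ships the "GBK" charset.
    static final Charset GBK = Charset.forName("GBK");

    // Step 1: the server decodes the client's UTF-8 bytes as GBK.
    static String readAsGbk(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), GBK);
    }

    // Step 2: the "repair" from the first listing: back to bytes via GBK, then decode as UTF-8.
    static String transcode(String misread) {
        return new String(misread.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String three = "\u4E2D\u6587\u5B57"; // three Chinese characters, 3 x 3 = 9 UTF-8 bytes
        int n = three.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(n + " bytes = " + (n / 2) + " two-byte groups + " + (n % 2) + " leftover byte");
        String restored = transcode(readAsGbk(three));
        System.out.println("restored intact? " + restored.equals(three));
    }
}
```

The same three characters need only 6 bytes in GBK, which is why the problem lies in the byte counts of the two encodings rather than in the characters themselves.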
So how to solve it? I tried all kinds of encoding conversions and kept wandering from one kind of garbled text to another without success. Then it struck me: the earlier code read the input line by line through a BufferedReader, which decodes the stream into characters in its buffer, and that decoding had already destroyed the last byte. So what if I took the byte array directly, before anything could damage it? I changed the code to build the string straight from the received byte array:

Java code

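// Read client information
while (true) {
    while (is.available() > 0) {
        // Read the raw bytes currently available, before any character decoding
        byte[] bb = new byte[is.available()];
        int n = is.read(bb); // read() may return fewer bytes than requested
        // Before transcoding: decoded with the platform default charset (GBK here)
        String msg_client1 = new String(bb, 0, n);
        System.out.println("before transcoding: " + msg_client1);
        // Transcoding: decode the untouched bytes directly as UTF-8
        String enter = new String("\r\n".getBytes("UTF-8"));
        String msg_client = new String(bb, 0, n, "UTF-8");
        System.out.println("client: " + msg_client);
        // Send: echo the line back to the client
        String msg_server = msg_client1 + enter;
        os.write(msg_server.getBytes());
        os.flush();
        Thread.sleep(1000);
    }
}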

As you can see, an odd number of Chinese characters is now read correctly too. This small problem kept me stuck for a long time, but it taught me a lot about how character encodings work, which was well worth it.
Finally, a word on the idiom new String(str.getBytes("DD"), "CC"). In short, use it when the string's bytes were really encoded in "CC" but the system decoded them as "DD": getBytes("DD") recovers the original bytes, and new String(..., "CC") decodes them correctly. This only works if decoding as "DD" lost nothing, which is exactly what failed in the odd-count GBK case.
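The fix works because the bytes are decoded as UTF-8 before anything else touches them. A related approach, swapped in here rather than taken from the original, is to give the reader the charset up front so that `readLine()` decodes UTF-8 directly. A reader also keeps an incomplete multi-byte sequence buffered until the rest arrives, whereas decoding whatever `available()` reports could cut a character in half if a read ends mid-character. In the sketch, `utf8Reader` is a hypothetical helper and a `ByteArrayInputStream` stands in for the socket's input stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8LineServer {
    // Hypothetical helper: decodes the stream explicitly as UTF-8 instead of
    // with the platform default charset (GBK on the original PC).
    // Create it once per connection; a BufferedReader may read ahead past the current line.
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getInputStream(): three Chinese characters plus CR LF, sent as UTF-8.
        byte[] wire = "\u4E2D\u6587\u5B57\r\n".getBytes(StandardCharsets.UTF_8);
        BufferedReader reader = utf8Reader(new ByteArrayInputStream(wire));
        String line = reader.readLine(); // the line terminator is stripped
        System.out.println("client: " + line);
        System.out.println("characters: " + line.length());
    }
}
```

With the charset fixed at the reader, no transcoding step is needed afterwards, and odd or even character counts make no difference.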
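As an illustration of when the idiom is safe (our own example, not from the original), take "DD" = ISO-8859-1 and "CC" = UTF-8. ISO-8859-1 maps each of the 256 byte values to exactly one character and back, so UTF-8 text misread as ISO-8859-1 can always be repaired, even with an odd number of Chinese characters. GBK gives no such guarantee. `repairLatin1AsUtf8` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

public class RepairMisdecoded {
    // Repairs a string whose bytes were really UTF-8 ("CC") but were decoded as ISO-8859-1 ("DD").
    // This is lossless because ISO-8859-1 maps every byte to one char and back.
    static String repairLatin1AsUtf8(String misread) {
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587\u5B57"; // three Chinese characters: an odd count
        String misread = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println("misread length: " + misread.length()); // one char per UTF-8 byte
        System.out.println("repaired intact? " + repairLatin1AsUtf8(misread).equals(original));
    }
}
```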

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
