Analysis and resolution of garbled Chinese characters in communication between Android and a PC

Source: Internet
Author: User

After getting basic communication working between an Android client and a PC server, I ran into the notoriously troublesome problem of garbled Chinese characters. Where there is garbled text, the cause is naturally that the two sides do not agree on an encoding.
The default encoding in Eclipse on the PC is GBK, while the default encoding of layout files in Android development is UTF-8, so text exchanged between the two inevitably comes out garbled. To fix it, we have to convert between the two encodings.
First, the text obtained from the Android phone's interface is encoded in UTF-8, so that is the form in which our Java code handles it. After the client gets the text from the EditText, it writes it to the output stream, and the server uses a String to receive one line of characters from the buffered input stream.

Java code
    // Read client information.
    // is and os are the socket's input and output streams, and brd is a
    // BufferedReader wrapping is; all three are declared elsewhere.
    while (true) {
        while (is.available() > 0) {
            // Before transcoding: the line was decoded with the default charset (GBK)
            String msg_client1 = brd.readLine();
            System.out.println("before transcoding:" + msg_client1);
            // Transcoding: re-encode as GBK bytes, then decode those bytes as UTF-8
            String msg_client = new String(msg_client1.getBytes("GBK"), "UTF-8");
            String enter = new String("\r\n".getBytes("UTF-8"));
            // Print the result after transcoding
            System.out.println("client:" + msg_client);
            // Echo the received text back to the client
            String msg_server = msg_client1 + enter;
            os.write(msg_server.getBytes());
            os.flush();
            Thread.sleep(1000);
        }
    }

Needless to say, because the server-side default encoding is GBK, the string read at this point is garbled. So the code turns that string back into bytes using GBK and decodes the resulting byte array as UTF-8. With that, the long-awaited Chinese characters finally appear.
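The effect can be reproduced without a socket. The following is only a minimal sketch: the class name, method names, and the two-character sample text are made up for illustration and are not part of the original program. It decodes UTF-8 bytes as GBK, the way the server does, and then applies the same transcoding step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are assumptions for illustration.
class GbkRoundTripDemo {
    static final Charset GBK = Charset.forName("GBK");

    // What a GBK server sees when it decodes UTF-8 bytes sent by the phone.
    static String decodeAsGbk(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, GBK);
    }

    // The transcoding step: back to bytes via GBK, then decode as UTF-8.
    static String transcode(String garbled) {
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d"; // two Chinese characters, 6 bytes in UTF-8
        String garbled = decodeAsGbk(original);
        System.out.println("garbled equals original:   " + garbled.equals(original));
        System.out.println("recovered equals original: " + transcode(garbled).equals(original));
    }
}
```

With this two-character sample, the six UTF-8 bytes happen to form three valid GBK byte pairs, so no byte is lost and the transcoding step can undo the wrong decoding.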

But did you notice? When an even number of Chinese characters is entered, the result is correct, but when an odd number is entered, the last character becomes "??".
What was going on? I was stuck here for a long time, so I looked up the standard character encodings and finally found the reason. In GBK a Chinese character takes 2 bytes, while in UTF-8 it takes 3 bytes. So when an odd number of characters is entered, the total byte count is odd and the last byte cannot be decoded as GBK. For example, three characters encoded in UTF-8 occupy the bytes 123|456|789. When they reach the server, the system assumes these 9 bytes are GBK and groups them as 12|34|56|78|9. The leftover byte at the end cannot be decoded into a Chinese character, so it is replaced, and when the string is converted back to GBK bytes that last byte comes out as "?" (ASCII code 63). Decoding as UTF-8 then sees 123|456|78?, so the third character cannot be read correctly and the result shows two question marks.
So how do we solve it? After trying all kinds of encoding conversions and wandering among every variety of garbled text without success, it suddenly occurred to me that the earlier code read the input line by line through a BufferedReader, so the last byte had already been destroyed when the input stream was decoded into the character buffer. If we take the byte array directly, without destroying anything, would that solve it? So I changed the code to build the string directly from the received byte array, and then ...
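The byte counts and the odd-length failure described above can be checked with a short sketch. The class name, method names, and the three-character sample string are assumptions chosen for illustration; only the decode and re-encode sequence mirrors the server code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are assumptions for illustration.
class OddLengthDemo {
    static final Charset GBK = Charset.forName("GBK");

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int gbkLength(String s) {
        return s.getBytes(GBK).length;
    }

    // Same sequence as the server: decode UTF-8 bytes as GBK,
    // re-encode as GBK, then decode the result as UTF-8.
    static String viaGbk(String original) {
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), GBK);
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String three = "\u4f60\u597d\u5417"; // three Chinese characters
        System.out.println("UTF-8 bytes: " + utf8Length(three));
        System.out.println("GBK bytes:   " + gbkLength(three));
        String result = viaGbk(three);
        // The ninth byte has no partner, so it cannot survive the GBK round trip.
        System.out.println("recovered intact: " + result.equals(three));
        System.out.println("ends with '?':    " + result.endsWith("?"));
    }
}
```

The first two characters survive because their six bytes pair up cleanly, while the third character's final byte is lost and comes back as a question mark.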

Java code
    // Read client information.
    // is and os are the socket's input and output streams, declared elsewhere.
    while (true) {
        while (is.available() > 0) {
            // Take the raw bytes directly instead of reading a decoded line
            byte[] bb = new byte[is.available()];
            is.read(bb);
            // Before transcoding: decoded with the default charset (GBK)
            String msg_client1 = new String(bb);
            System.out.println("before transcoding:" + msg_client1);
            // Transcoding: decode the untouched bytes directly as UTF-8
            String enter = new String("\r\n".getBytes("UTF-8"));
            String msg_client = new String(bb, "UTF-8");
            System.out.println("client:" + msg_client);
            // Send
            String msg_server = msg_client1 + enter;
            os.write(msg_server.getBytes());
            os.flush();
            Thread.sleep(1000);
        }
    }
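The same idea can be tried in isolation. In the sketch below a ByteArrayInputStream stands in for the socket stream; that substitution, the class name, and the method names are assumptions made so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a ByteArrayInputStream replaces the real socket stream.
class DirectDecodeDemo {
    // Reads whatever is currently available and decodes the raw bytes as UTF-8.
    static String readUtf8(InputStream is) {
        try {
            byte[] bb = new byte[is.available()];
            int n = is.read(bb);
            return n <= 0 ? "" : new String(bb, 0, n, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates the phone sending UTF-8 text and the server reading it.
    static String simulate(String sent) {
        InputStream is = new ByteArrayInputStream(sent.getBytes(StandardCharsets.UTF_8));
        return readUtf8(is);
    }

    public static void main(String[] args) {
        String sent = "\u4f60\u597d\u5417"; // odd number of Chinese characters
        System.out.println("received intact: " + simulate(sent).equals(sent));
    }
}
```

Because no intermediate GBK decoding touches the bytes, the character count no longer matters. On a real socket a multi-byte character could still be split across two reads, so a complete program would also need some way to delimit messages.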



As you can see, an odd number of Chinese characters can now be read as well! Although this small problem kept me stuck for a long time, it taught me how encodings really work, so the gain was considerable.
Finally, a word about the expression new String(str.getBytes("DD"), "CC"). In short, it is for the case where the actual encoding of your string's bytes is "CC" but the system decoded them as "DD"; this line of code then recovers the correctly decoded string.
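As a rough illustration of that expression, here is a sketch with a hypothetical helper named reinterpret. ISO-8859-1 is used as the wrongly assumed encoding "DD" only because it maps every byte value to a character, so nothing is lost; that choice and all names are assumptions, not part of the original program.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and the choice of ISO-8859-1 are assumptions.
class ReinterpretDemo {
    // 'assumed' plays the role of "DD" and 'actual' the role of "CC".
    static String reinterpret(String misread, Charset assumed, Charset actual) {
        return new String(misread.getBytes(assumed), actual);
    }

    // Simulates the system decoding UTF-8 bytes with the wrong charset.
    static String misread(String original, Charset assumed) {
        return new String(original.getBytes(StandardCharsets.UTF_8), assumed);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d\u5417";
        String wrong = misread(original, StandardCharsets.ISO_8859_1);
        String fixed = reinterpret(wrong, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("fixed equals original: " + fixed.equals(original));
    }
}
```

The trick only works when the wrong decoding preserved every byte. As the odd-length GBK case shows, once a byte has been replaced during decoding, no later conversion can bring it back.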
