Analysis and Solution of Garbled Chinese Characters in Android-PC Communication
After getting basic communication working between an Android client and a PC server, I ran into the legendary garbled Chinese text problem. When characters come out garbled, the cause is that the two sides do not agree on a character encoding.
In my setup, the default encoding in Eclipse on the PC is GBK, while the default encoding for layout files in Android development is UTF-8. Unless one side converts, garbled characters are inevitable when the two communicate. To solve the problem, we have to start with the conversion between the two encodings.
First, the text obtained from the Android phone's UI is encoded as UTF-8, so when the Java code turns that text into bytes, the bytes are UTF-8. After getting the text from the EditText, the client writes it to the output stream, and the server reads one line of characters from a buffered input reader into a String.
Java code
// Read client information
while (true) {
    while (is.available() > 0) {
        // Before transcoding: brd has already decoded the bytes with the default charset (GBK here)
        String msg_client1 = brd.readLine();
        System.out.println("before transcoding:" + msg_client1);
        // Transcoding: turn the string back into GBK bytes, then decode them as UTF-8
        String msg_client = new String(msg_client1.getBytes("gbk"), "UTF-8");
        String enter = new String("\r\n".getBytes("UTF-8"));
        // Print out after transcoding
        System.out.println("client:" + msg_client);
        String msg_server = msg_client1 + enter;
        os.write(msg_server.getBytes());
        os.flush();
        Thread.sleep(1000);
    }
}
No need to guess: because the default encoding on the server side is GBK, the string it reads is bound to be garbled. So we turn the string back into bytes using GBK and decode that byte array as UTF-8. With that, the long-awaited Chinese characters finally came out.
But have you noticed that the result is correct when you enter an even number of Chinese characters, while with an odd number the last character turns into question marks?
Why? After a long search through the character encoding standards, I finally found the cause. In GBK, a Chinese character takes 2 bytes; in UTF-8, a common Chinese character takes 3 bytes. So when an odd number of Chinese characters is sent, the total number of bytes is odd and the last byte cannot be decoded as GBK. For example, suppose the UTF-8 bytes are 123 | 456 | 789. When they reach the server, the system treats the nine bytes as GBK, so they are grouped as 12 | 34 | 56 | 78 | 9. The one leftover byte cannot be decoded into a Chinese character, so it becomes "?" (ASCII code 63). When the bytes are then decoded as UTF-8, they are grouped as 123 | 456 | 78?. The third character can no longer be read correctly, which is why the result shows two question marks.
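The grouping argument above can be reproduced without a socket. The following is a minimal sketch, not the article's server code: the class name `MojibakeDemo` and the method `roundTrip` are hypothetical, and it assumes a JDK that ships the GBK charset. It encodes a string as UTF-8, decodes those bytes as GBK the way a GBK-default reader would, and then applies the same re-decoding step as the listing above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Assumes the running JDK provides the GBK charset.
    static final Charset GBK = Charset.forName("GBK");

    /** UTF-8 bytes are misread as GBK, then "transcoded" back as in the first listing. */
    static String roundTrip(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // bytes the phone sends
        String misread = new String(wire, GBK);                  // what a GBK reader produces
        // Re-encode as GBK and decode as UTF-8; any byte that GBK decoding
        // replaced along the way has already been lost at this point.
        return new String(misread.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the demo independent of the source file encoding.
        String two = "\u4f60\u597d";         // two Chinese characters
        String three = "\u4f60\u597d\u5417"; // three Chinese characters
        System.out.println(two.getBytes(StandardCharsets.UTF_8).length); // 6 bytes in UTF-8
        System.out.println(two.getBytes(GBK).length);                    // 4 bytes in GBK
        System.out.println(roundTrip(two).equals(two));     // even count: survives
        System.out.println(roundTrip(three).equals(three)); // odd count: last character is damaged
    }
}
```

Even with an even number of characters, the round trip only works when every byte pair happens to be a valid GBK code, so this trick is fragile in general.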
How can we solve this? I tried converting between all sorts of encodings, and the garbled characters still would not go away. Then it occurred to me that the previous approach read one line at a time through a BufferedReader, so the last byte was already destroyed when the bytes of the input stream were decoded into the character buffer. So if we take the byte array directly, before anything can damage it, would that solve the problem? I changed the code to build the String straight from the byte array, and then ......
Java code
// Read client information
while (true) {
    while (is.available() > 0) {
        byte[] bb = new byte[is.available()];
        is.read(bb);
        // Before transcoding: decode with the platform default charset (GBK here)
        String msg_client1 = new String(bb);
        System.out.println("before transcoding:" + msg_client1);
        // Transcoding: decode the untouched raw bytes directly as UTF-8
        String enter = "\r\n";
        String msg_client = new String(bb, "UTF-8");
        System.out.println("client:" + msg_client);
        // Send: echo the correctly decoded text as UTF-8, so the reply
        // does not pass through the lossy GBK string again
        String msg_server = msg_client + enter;
        os.write(msg_server.getBytes("UTF-8"));
        os.flush();
        Thread.sleep(1000);
    }
}
As you can see, an odd number of Chinese characters can now be read too! (Yeah!) Although this small problem kept me stuck for a long time, I learned a lot about encodings along the way.
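A line-based reader can also be kept, as long as the charset is named explicitly instead of left to the platform default. The sketch below is an alternative to the byte-array approach, not the method used in this article; the class `Utf8LineReader` and its methods are hypothetical names, and a `ByteArrayInputStream` stands in for the socket's input stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8LineReader {
    /** Wraps a stream (for example socket.getInputStream()) in a reader that always decodes UTF-8. */
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Demo helper: decodes the first line of an in-memory byte array standing in for the socket. */
    static String firstLine(byte[] wire) {
        try (BufferedReader reader = utf8Reader(new ByteArrayInputStream(wire))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three Chinese characters plus a line terminator, encoded as the phone would send them.
        byte[] wire = "\u4f60\u597d\u5417\r\n".getBytes(StandardCharsets.UTF_8);
        String line = firstLine(wire);
        System.out.println(line.length());                     // 3 characters
        System.out.println(line.equals("\u4f60\u597d\u5417")); // odd count is intact
    }
}
```

On a real connection the reader would be created once per socket and reused, because a BufferedReader may buffer bytes beyond the line it returns.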
Finally, a word on the new String(str.getBytes("dd"), "cc") idiom. Simply put, it is useful when the bytes behind your string were actually encoded as "cc" but the system decoded them as "dd"; this line of code then gives you the correctly decoded string. As the odd-length case above shows, it only works if decoding as "dd" did not lose any bytes.
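To show the idiom in a case where it is reliable, here is a small sketch with "dd" = ISO-8859-1 and "cc" = UTF-8. ISO-8859-1 is chosen for illustration only (it does not appear in the setup above) because it maps every byte to exactly one character, so no byte is lost by the wrong decode. The class `RedecodeDemo` and the method `recover` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class RedecodeDemo {
    /** Recovers text whose UTF-8 bytes were mistakenly decoded as ISO-8859-1. */
    static String recover(String misread) {
        // Same shape as new String(str.getBytes("dd"), "cc").
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d\u5417"; // three Chinese characters, 9 bytes in UTF-8
        // Simulate the wrong decode: each of the 9 bytes becomes one character.
        String misread = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                  // 9
        System.out.println(recover(misread).equals(original)); // true, even for an odd count
    }
}
```

The difference from the GBK case is that the wrong decode here is reversible byte for byte, which is the condition the idiom depends on.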
Help: Chinese characters are garbled when a PC communicates with the Android emulator over a socket.
Read:
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream(), "GBK"));
Send:
OutputStream out = socket.getOutputStream();
out.write(head.getBytes("GBK"));
How can I solve the garbled text problem in socket communication between Android and a PC?
Make sure both ends of the socket use the same encoding; using UTF-8 throughout is recommended. My Java servlet on the PC works fine, but the C++ one does not. I don't know if this is the reason.