Java: Converting Unicode to GBK

Source: Internet
Author: User

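We often run into encoding problems. Java is sometimes called an international language because string constants in its class files are stored in (modified) UTF-8, and at runtime the JVM represents strings in UTF-16. I have not looked up why the JVM chose UTF-16, but my guess is that a Java char is 16 bits wide, which matches the 16-bit code unit of UTF-16, one of the encoding forms of Unicode. (Strictly speaking, UTF-16 is not a fixed double-byte encoding: characters outside the Basic Multilingual Plane take two 16-bit units, called a surrogate pair.)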


The goal of Unicode is to support every character set in the world: almost every character in every existing character set has a corresponding encoding in Unicode. The mapping of characters to numbers is the Unicode character set, called the UCS (Universal Coded Character Set), and the number assigned to each character is called a code point. UTF-8 and UTF-16 are different ways of encoding UCS code points as bytes; UTF stands for UCS Transformation Format.
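To make the difference between a code point and its encodings concrete, here is a small sketch. The class name CodePointDemo and the two sample characters are illustrative choices, not from the original. For one character inside the Basic Multilingual Plane and one outside it, it reports how many Java chars the character occupies and how many bytes it takes in UTF-8 and in UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CodePointDemo {
    // Describe a string holding a single character: its code point, how many
    // Java chars (UTF-16 code units) it occupies, and its size in UTF-8 and
    // in UTF-16BE (big-endian, so no byte-order mark is added).
    static String describe(String s) {
        return String.format(Locale.ROOT, "U+%04X: chars=%d, UTF-8 bytes=%d, UTF-16 bytes=%d",
                s.codePointAt(0),
                s.length(),
                s.getBytes(StandardCharsets.UTF_8).length,
                s.getBytes(StandardCharsets.UTF_16BE).length);
    }

    public static void main(String[] args) {
        String qu = "\u53D6";                                   // U+53D6, inside the BMP
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP
        System.out.println(describe(qu));    // U+53D6: chars=1, UTF-8 bytes=3, UTF-16 bytes=2
        System.out.println(describe(emoji)); // U+1F600: chars=2, UTF-8 bytes=4, UTF-16 bytes=4
    }
}
```

The second line shows why "UTF-16 is double-byte" is only true inside the BMP: the emoji needs a surrogate pair, so it counts as two Java chars.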


In Java, String.getBytes() encodes a string (Unicode) into bytes using a given character set, and the new String(...) constructor decodes a byte stream in a given character set back into Unicode. Called with no argument, getBytes() uses the platform's default charset. Every String in Java is Unicode.
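A minimal sketch of the two directions, assuming a standard JDK that ships the GBK charset. The helper names encode and decode and the sample text are hypothetical, chosen only for illustration:

```java
import java.nio.charset.Charset;

public class EncodeDecodeDemo {
    // encode: String (Unicode) -> bytes in the named charset
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // decode: bytes in the named charset -> String (Unicode)
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // the two characters 中文
        byte[] gbk = encode(text, "GBK");
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println(gbk.length);  // 4: GBK uses two bytes per Chinese character
        System.out.println(utf8.length); // 6: UTF-8 uses three bytes for these characters
        // Decoding with the same charset that was used to encode restores the text.
        System.out.println(decode(gbk, "GBK").equals(text));    // true
        System.out.println(decode(utf8, "UTF-8").equals(text)); // true
    }
}
```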


Now consider web pages. Unless you do something special, when a form is submitted the browser encodes the data using the character set declared in the page's Content-Type and sends it to the server. The server must then call request.setCharacterEncoding to specify the encoding of the parameters before reading them, so that they are decoded correctly (different application servers may need this configured in different ways).
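The mismatch can be simulated without a servlet container. This sketch is only an approximation: it assumes the browser percent-encodes the form value with the page's charset (modeled here with URLEncoder), and the hypothetical helper receive stands in for the server-side decoding that request.setCharacterEncoding controls in a real servlet.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormEncodingDemo {
    // Approximates the browser: percent-encode the value using the page charset.
    static String submit(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
    }

    // Approximates the server: decode the percent-encoded bytes with serverCharset.
    static String receive(String formData, String serverCharset) {
        try {
            return URLDecoder.decode(formData, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + serverCharset, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4E2D\u6587"; // 中文
        String wire = submit(value, "GBK"); // what a GBK page would send
        System.out.println(receive(wire, "GBK").equals(value));        // true: charsets match
        System.out.println(receive(wire, "ISO-8859-1").equals(value)); // false: garbled text
    }
}
```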


Encoding and decoding in Java are both defined relative to Unicode: encoding means char[] → byte[] in encoding XXX, and decoding means byte[] in encoding XXX → char[]. When we say "convert GBK to UTF-8", what we actually mean is GBK-encoded byte[] → UTF-8-encoded byte[]. This conversion is meaningful only when data has to be transferred as byte[]; otherwise it is pointless.
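The GBK-to-UTF-8 "conversion" is therefore always a decode followed by an encode, with a Unicode String in the middle. A sketch, assuming the GBK charset is available; the helper name gbkToUtf8 is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // GBK bytes -> String (Unicode) -> UTF-8 bytes
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, Charset.forName("GBK")); // decode
        return text.getBytes(StandardCharsets.UTF_8);               // encode
    }

    public static void main(String[] args) {
        byte[] gbk = "\u4E2D\u6587".getBytes(Charset.forName("GBK"));
        byte[] utf8 = gbkToUtf8(gbk);
        System.out.println(gbk.length + " -> " + utf8.length); // 4 -> 6
        // Decoding the new bytes as UTF-8 gives back the original text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u4E2D\u6587")); // true
    }
}
```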


The first point to note is that the string object in Java is a Unicode-encoded string.


Yet we often hear people say "we need to convert this String from ISO-8859-1 to GBK". What does that mean? We are not really converting "a string encoded in ISO-8859-1" into "a string encoded in GBK": as stressed repeatedly, a Java String is always Unicode, so there is no such thing as an "ISO-8859-1-encoded String" or a "GBK-encoded String". The only reason such a conversion is needed is that the String was built by decoding bytes with the wrong charset. We often need to "convert" from ISO-8859-1 to GBK, UTF-8 and so on, and the so-called conversion is really String → byte[] → String.
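One way to see that a String carries no charset of its own: the same String yields different byte arrays depending on the charset passed to getBytes. In this sketch (class and helper names are hypothetical), ISO-8859-1 cannot represent the character at all, so getBytes substitutes the replacement byte '?' (0x3F).

```java
import java.nio.charset.Charset;

public class NoEncodedStringDemo {
    // The charset only comes into play when the String is turned into bytes.
    static int byteLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String qu = "\u53D6"; // 取
        System.out.println(byteLength(qu, "GBK"));   // 2
        System.out.println(byteLength(qu, "UTF-8")); // 3
        // Unmappable character: replaced by a single '?' byte.
        byte[] latin1 = qu.getBytes(Charset.forName("ISO-8859-1"));
        System.out.println(latin1.length + " " + latin1[0]); // 1 63
    }
}
```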


You may know the code for this by heart: new String(text.getBytes("ISO-8859-1"), "GBK"). Really understanding it is less simple than it looks. On the surface it just encodes the String text as ISO-8859-1 bytes and then decodes those bytes as GBK. But it is easily misread as "converting text from ISO-8859-1 to GBK", which is wrong. Have you ever seen new String(text.getBytes("GBK"), "UTF-8") used to "convert" a string to UTF-8? Such code would make no sense: it simply decodes GBK bytes as if they were UTF-8.


The reason you so often see new String(text.getBytes("ISO-8859-1"), "GBK") is that a GBK byte stream was wrongly decoded into a String (Unicode) as ISO-8859-1. This typically happens when a GBK-encoded web page submits data to the server: the GBK bytes are treated as an ISO-8859-1 stream, so the server gets the wrong String. Because ISO-8859-1 is a single-byte encoding that maps every byte value to exactly one character, each byte becomes one char unchanged. The decoding is wrong, but no information is lost, so we still have a chance to recover the original: encode the String back to ISO-8859-1 to get the original bytes, then decode them as GBK. Hence the classic new String(text.getBytes("ISO-8859-1"), "GBK").
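The scenario can be reproduced in a few lines. This sketch assumes the GBK and ISO-8859-1 charsets are available (both ship with standard JDKs); the class name and the repair helper are hypothetical:

```java
import java.nio.charset.Charset;

public class Latin1RepairDemo {
    static final Charset GBK = Charset.forName("GBK");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Undo a wrong ISO-8859-1 decode: each char of the garbled String maps back
    // to exactly one byte, recovering the original GBK bytes, which are then
    // decoded with the correct charset.
    static String repair(String garbled) {
        return new String(garbled.getBytes(LATIN1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587";              // 中文
        byte[] submitted = original.getBytes(GBK);      // what a GBK page sends
        String garbled = new String(submitted, LATIN1); // server decodes with the wrong charset
        System.out.println(garbled.equals(original));         // false
        System.out.println(garbled.length());                 // 4: one char per byte
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```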


If the bytes were mistakenly decoded with some other encoding, however, it may not be possible to convert them back, because encoding mistakes do not simply cancel out the way two negatives make a positive.
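For contrast, a sketch where the GBK bytes are wrongly decoded as UTF-8 instead of ISO-8859-1. GBK byte sequences for Chinese characters are generally not valid UTF-8, so the decoder replaces them with U+FFFD and the original bytes are gone. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyDecodeDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Decode GBK bytes with the wrong charset (UTF-8), then try to undo the
    // mistake the same way the ISO-8859-1 trick does.
    static String wrongDecodeThenRepair(String original) {
        byte[] gbkBytes = original.getBytes(GBK);
        // Malformed UTF-8 input is replaced with U+FFFD, discarding the original bytes.
        String garbled = new String(gbkBytes, StandardCharsets.UTF_8);
        return new String(garbled.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // 中文
        System.out.println(wrongDecodeThenRepair(original).equals(original)); // false
        System.out.println(wrongDecodeThenRepair("abc").equals("abc"));       // true
    }
}
```

Pure ASCII text survives the round trip because it is encoded identically in GBK and UTF-8; it is the Chinese characters that are lost.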



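    public class Unicode2GB {
        public static void main(String[] args) {
            // The escape in this literal is U+53D6 (the character 取);
            // the compiler turns it into a single char.
            String str = "\u53d6";
            System.out.println(str);
        }
    }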


When printed, the string is automatically encoded using the platform's default charset (GB2312/GBK on a Simplified Chinese system). The conversion to GB2312 bytes can also be made explicit:

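    import java.io.UnsupportedEncodingException;

    public class Unicode2GB {
        public static void main(String[] args) {
            String str = "\u53d6"; // U+53D6, the character 取
            try {
                // Name the charset explicitly: str.getBytes() with no argument
                // uses the platform default charset, so its result depends on the machine.
                byte[] gbBytes = str.getBytes("GB2312");
                System.out.println(gbBytes.length); // 2 bytes in GB2312
                // Decoding with the same charset restores the original String.
                System.out.println(new String(gbBytes, "GB2312"));
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace(); // report instead of silently swallowing
            }
        }
    }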

