Java character format

Source: Internet
Author: User

http://blog.chinaunix.net/uid-12348673-id-3335300.html

http://blog.csdn.net/zhouyong80/article/details/1900100

Whether a program is being localized or internationalized, it has to deal with character encoding conversion. This is especially true of Web applications that handle Chinese text, where strings often need to be converted to GBK or GB2312.

First, key technical points:
1. Commonly used character encodings include US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16, GBK, and GB2312. GBK and GB2312 are designed specifically for Chinese text.
2. The getBytes method of String encodes the string into a byte array using the specified encoding. The parameter names the encoding; if it is omitted, the platform default encoding is used.
3. The String constructor "String(byte[] bs, String charset)" decodes a byte array into a String object using the specified encoding.
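
As a rough illustration of points 2 and 3, the sketch below encodes a two-character Chinese string with several charsets and decodes it again. The class and method names (EncodingBasics, encodedLength, roundTrip) are made up for this example, and the string is written with Unicode escapes so the source file stays pure ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBasics {
    /** Number of bytes the text occupies when encoded with the given charset. */
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Encodes the text with a charset, then decodes the bytes with the same charset. */
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters (U+4E2D, U+6587)

        // The same characters take a different number of bytes in each encoding.
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));    // 6
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE)); // 4

        // UTF-8 can represent the characters, so the round trip is lossless.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true

        // ISO-8859-1 cannot represent them: getBytes substitutes '?' for each one.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

The round trip only preserves the text when the charset can represent every character, which is why the choice of encoding matters for Chinese strings.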

Second, an example:

package book.string;

import java.io.UnsupportedEncodingException;

/**
 * Converts the encoding of strings.
 * @author Joe
 */
public class ChangeCharset {
    /** 7-bit ASCII, also known as ISO646-US, the Basic Latin block of the Unicode character set */
    public static final String US_ASCII = "US-ASCII";
    /** ISO Latin Alphabet No. 1, also known as ISO-LATIN-1 */
    public static final String ISO_8859_1 = "ISO-8859-1";
    /** 8-bit UCS Transformation Format */
    public static final String UTF_8 = "UTF-8";
    /** 16-bit UCS Transformation Format, big-endian byte order (lowest address holds the high-order byte) */
    public static final String UTF_16BE = "UTF-16BE";
    /** 16-bit UCS Transformation Format, little-endian byte order (highest address holds the high-order byte) */
    public static final String UTF_16LE = "UTF-16LE";
    /** 16-bit UCS Transformation Format, byte order identified by an optional byte order mark */
    public static final String UTF_16 = "UTF-16";
    /** GBK, the extended Chinese character set */
    public static final String GBK = "GBK";
    /** GB2312, the Simplified Chinese character set */
    public static final String GB2312 = "GB2312";

    /** Converts the string to US-ASCII. */
    public String toASCII(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, US_ASCII);
    }

    /** Converts the string to ISO-8859-1. */
    public String toISO_8859_1(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, ISO_8859_1);
    }

    /** Converts the string to UTF-8. */
    public String toUTF_8(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_8);
    }

    /** Converts the string to UTF-16BE. */
    public String toUTF_16BE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16BE);
    }

    /** Converts the string to UTF-16LE. */
    public String toUTF_16LE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16LE);
    }

    /** Converts the string to UTF-16. */
    public String toUTF_16(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16);
    }

    /** Converts the string to GBK. */
    public String toGBK(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GBK);
    }

    /** Converts the string to GB2312. */
    public String toGB2312(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GB2312);
    }

    /**
     * String encoding conversion.
     * @param str string to be converted
     * @param newCharset target encoding
     */
    public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes with the platform default encoding.
            // This is system-dependent; on Chinese Windows it is typically GBK (a superset of GB2312).
            byte[] bs = str.getBytes();
            // Decode the bytes into a new string using the target encoding.
            return new String(bs, newCharset);
        }
        return null;
    }

    /**
     * String encoding conversion.
     * @param str string to be converted
     * @param oldCharset source character set
     * @param newCharset target character set
     */
    public String changeCharset(String str, String oldCharset, String newCharset) throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes with the source encoding.
            byte[] bs = str.getBytes(oldCharset);
            // Decode the bytes into a new string using the target encoding.
            return new String(bs, newCharset);
        }
        return null;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        ChangeCharset test = new ChangeCharset();
        // The literal contains three Chinese characters so that encoding differences show up.
        String str = "This is a 中文的 String!";
        System.out.println("str: " + str);

        String gbk = test.toGBK(str);
        System.out.println("Converted to GBK: " + gbk);
        System.out.println();

        String ascii = test.toASCII(str);
        System.out.println("Converted to US-ASCII: " + ascii);
        System.out.println();

        String iso88591 = test.toISO_8859_1(str);
        System.out.println("Converted to ISO-8859-1: " + iso88591);
        System.out.println();

        gbk = test.changeCharset(iso88591, ISO_8859_1, GBK);
        System.out.println("ISO-8859-1 string converted back to GBK: " + gbk);
        System.out.println();

        String utf8 = test.toUTF_8(str);
        System.out.println();
        System.out.println("Converted to UTF-8: " + utf8);
        String utf16be = test.toUTF_16BE(str);
        System.out.println("Converted to UTF-16BE: " + utf16be);
        gbk = test.changeCharset(utf16be, UTF_16BE, GBK);
        System.out.println("UTF-16BE string converted back to GBK: " + gbk);
        System.out.println();

        String utf16le = test.toUTF_16LE(str);
        System.out.println("Converted to UTF-16LE: " + utf16le);
        gbk = test.changeCharset(utf16le, UTF_16LE, GBK);
        System.out.println("UTF-16LE string converted back to GBK: " + gbk);
        System.out.println();

        String utf16 = test.toUTF_16(str);
        System.out.println("Converted to UTF-16: " + utf16);
        String gb2312 = test.changeCharset(utf16, UTF_16, GB2312);
        System.out.println("UTF-16 string converted to GB2312: " + gb2312);
    }

}
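
The class above passes charset names as strings, which is why every method declares UnsupportedEncodingException. A possible alternative, sketched below, is to work with java.nio.charset.Charset objects instead. CharsetLookup and its lookup helper are hypothetical names introduced for this example; the changeCharset method here mirrors the two-charset conversion above but takes Charset arguments, so it needs no checked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLookup {
    /** Returns the named charset, or the fallback if this runtime does not support it. */
    public static Charset lookup(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    /** Encodes str with oldCharset and decodes the resulting bytes with newCharset. */
    public static String changeCharset(String str, Charset oldCharset, Charset newCharset) {
        if (str == null) {
            return null;
        }
        return new String(str.getBytes(oldCharset), newCharset);
    }

    public static void main(String[] args) {
        // GBK is not one of the charsets every Java runtime must provide,
        // so this sketch falls back to UTF-8 when it is missing.
        Charset gbk = lookup("GBK", StandardCharsets.UTF_8);
        System.out.println("Using charset: " + gbk.name());

        // Plain ASCII text survives because both charsets encode it identically.
        System.out.println(changeCharset("abc", StandardCharsets.ISO_8859_1, gbk));
    }
}
```

The constants in StandardCharsets cover US-ASCII, ISO-8859-1, UTF-8, and the UTF-16 variants; GBK and GB2312 still have to be looked up by name.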

Output (on a system whose default encoding is GBK):

str: This is a 中文的 String!
Converted to GBK: This is a 中文的 String!

Converted to US-ASCII: This is a ?????? String!

Converted to ISO-8859-1: This is a ?????? String!

ISO-8859-1 string converted back to GBK: This is a 中文的 String!


Converted to UTF-8: This is a ????? String!
Converted to UTF-16BE: 周楳?猠愠????瑲楮朡
UTF-16BE string converted back to GBK: This is a 中文的 String!

Converted to UTF-16LE: 桔獩椠?????匠牴湩Ⅷ
UTF-16LE string converted back to GBK: This is a 中文的 String!

Converted to UTF-16: 周楳?猠愠????瑲楮朡
UTF-16 string converted to GB2312: ?This is a 中文的 String!
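
The garbled UTF-16 lines and the successful round trips in the output can be explained at the byte level. The sketch below is only an illustration with made-up names (MojibakeDemo, asciiBytesAsUtf16be, latin1RoundTrips): it shows that decoding single-byte text as UTF-16BE fuses each pair of bytes into one unrelated character, and that ISO-8859-1 maps every byte value to exactly one character, so bytes pushed through it can be recovered unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {
    /** Decodes the ASCII bytes of the text as UTF-16BE, pairing every two bytes into one char. */
    public static String asciiBytesAsUtf16be(String asciiText) {
        byte[] bytes = asciiText.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    /** True if decoding the bytes as ISO-8859-1 and encoding them again yields the same bytes. */
    public static boolean latin1RoundTrips(byte[] original) {
        String decoded = new String(original, StandardCharsets.ISO_8859_1);
        return Arrays.equals(original, decoded.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // 'T' is 0x54 and 'h' is 0x68; read as one big-endian 16-bit unit they form U+5468.
        String garbled = asciiBytesAsUtf16be("Th");
        System.out.printf("U+%04X%n", (int) garbled.charAt(0)); // U+5468

        // The GBK bytes of the characters U+4E2D and U+6587 survive an ISO-8859-1 round trip.
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        System.out.println(latin1RoundTrips(gbkBytes)); // true
    }
}
```

US-ASCII, by contrast, replaces every byte above 0x7F with a replacement character during decoding, so the original bytes are lost and no later conversion can bring the text back.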

Third, source code analysis:
To change the encoding of a string, follow these steps:
1. Call the getBytes method of String to encode the string into a byte array. The byte array itself carries no information about which encoding produced it; the encoding only matters when converting between characters and bytes.
2. Construct a new String object from the byte array and the new character encoding. The bytes are decoded according to that encoding, so the text comes out intact only when the encoding matches the one that produced the bytes; otherwise it is garbled, as the output shows.
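
When the goal is to re-encode data, for example to turn bytes stored in one encoding into bytes in another, the two steps run in the opposite order: decode first, then encode. A minimal sketch, assuming the source encoding of the bytes is known; Transcoder, transcode, and toHex are names invented for this example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcoder {
    /** Re-encodes bytes from the source charset into the target charset. */
    public static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source); // decode: bytes -> characters
        return text.getBytes(target);            // encode: characters -> bytes
    }

    /** Formats bytes as space-separated hex pairs, for printing. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D, U+6587), first as UTF-16BE bytes, then as UTF-8 bytes.
        byte[] utf16be = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = transcode(utf16be, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        System.out.println(toHex(utf16be)); // 4E 2D 65 87
        System.out.println(toHex(utf8));    // E4 B8 AD E6 96 87
    }
}
```

For GBK or GB2312, the same method could be called with Charset.forName("GBK") or Charset.forName("GB2312"), assuming the runtime ships those charsets.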
