Java String Encoding and Conversion Example Code

Source: Internet
Author: User
Tags character set

1. Commonly used character encodings include US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16, GBK, and GB2312. GBK and GB2312 are used specifically for Chinese text.
2. The getBytes method of String returns the string's bytes in a given encoding. The parameter names the encoding to use; if it is omitted, the platform default encoding is used.
3. The String(byte[] bytes, String charsetName) constructor decodes a byte array in the specified encoding into a String object.
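
Points 2 and 3 can be seen in a minimal sketch. The class name EncodingRoundTrip and its two helper methods are made up for illustration and are not part of the listings in this article. The sketch passes StandardCharsets constants instead of charset-name strings, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original listings.
final class EncodingRoundTrip {

    /** Encodes text to bytes in the given charset (what String.getBytes does). */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    /** Decodes bytes into a String using the given charset (what the constructor does). */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16be = encode(text, StandardCharsets.UTF_16BE);
        System.out.println("UTF-8 length: " + utf8.length);       // 6: three bytes per character here
        System.out.println("UTF-16BE length: " + utf16be.length); // 4: two bytes per character here
        // Decoding with the charset that produced the bytes restores the text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));      // true
        // Decoding with a different charset yields different characters.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

The same byte array only turns back into the original text when the decoding charset matches the one used for encoding, which is the property the longer program below relies on.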

The code is as follows:

package com.using.test;

import java.io.UnsupportedEncodingException;

/**
 * Converts strings between character encodings.
 */
public class ConverStr {

    public static void main(String[] args) throws UnsupportedEncodingException {
        ConverStr test = new ConverStr();
        String str = "\u9519\u8bef\u7684\u6765\u6e90";

        System.out.println("str: " + str);
        String gbk = test.toGBK(str);
        System.out.println("converted to GBK: " + gbk);
        System.out.println();
        String ascii = test.toASCII(str);
        System.out.println("converted to US-ASCII: " + ascii);
        gbk = test.changeCharset(ascii, ConverStr.US_ASCII, ConverStr.GBK);
        System.out.println("US-ASCII string converted back to GBK: " + gbk);
        System.out.println();
        String iso88591 = test.toISO_8859_1(str);
        System.out.println("converted to ISO-8859-1: " + iso88591);
        gbk = test.changeCharset(iso88591, ConverStr.ISO_8859_1, ConverStr.GBK);
        System.out.println("ISO-8859-1 string converted back to GBK: " + gbk);
        System.out.println();
        String utf8 = test.toUTF_8(str);
        System.out.println("converted to UTF-8: " + utf8);
        gbk = test.changeCharset(utf8, ConverStr.UTF_8, ConverStr.GBK);
        System.out.println("UTF-8 string converted back to GBK: " + gbk);
        System.out.println();
        String utf16be = test.toUTF_16BE(str);
        System.out.println("converted to UTF-16BE: " + utf16be);
        gbk = test.changeCharset(utf16be, ConverStr.UTF_16BE, ConverStr.GBK);
        System.out.println("UTF-16BE string converted back to GBK: " + gbk);
        System.out.println();
        String utf16le = test.toUTF_16LE(str);
        System.out.println("converted to UTF-16LE: " + utf16le);
        gbk = test.changeCharset(utf16le, ConverStr.UTF_16LE, ConverStr.GBK);
        System.out.println("UTF-16LE string converted back to GBK: " + gbk);
        System.out.println();
        String utf16 = test.toUTF_16(str);
        System.out.println("converted to UTF-16: " + utf16);
        // Java decodes UTF-16 without a byte-order mark as big-endian, so
        // re-encode as UTF-16BE to get the same bytes back.
        gbk = test.changeCharset(utf16, ConverStr.UTF_16BE, ConverStr.GBK);
        System.out.println("UTF-16 string converted back to GBK: " + gbk);
        String s = new String("Chinese".getBytes("UTF-8"), "UTF-8");
        System.out.println(s);
    }
 
    /** 7-bit ASCII, also known as ISO646-US, the Basic Latin block of the Unicode character set */
    public static final String US_ASCII = "US-ASCII";

    /** ISO Latin Alphabet No. 1, also known as ISO-LATIN-1 */
    public static final String ISO_8859_1 = "ISO-8859-1";

    /** 8-bit UCS Transformation Format */
    public static final String UTF_8 = "UTF-8";

    /** 16-bit UCS Transformation Format, big-endian byte order (the high byte is stored at the lowest address) */
    public static final String UTF_16BE = "UTF-16BE";

    /** 16-bit UCS Transformation Format, little-endian byte order (the low byte is stored at the lowest address) */
    public static final String UTF_16LE = "UTF-16LE";

    /** 16-bit UCS Transformation Format, byte order identified by an optional byte-order mark */
    public static final String UTF_16 = "UTF-16";

    /** Chinese character set */
    public static final String GBK = "GBK";

    /**
     * Converts the string to US-ASCII.
     */
    public String toASCII(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, US_ASCII);
    }

    /**
     * Converts the string to ISO-8859-1.
     */
    public String toISO_8859_1(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, ISO_8859_1);
    }

    /**
     * Converts the string to UTF-8.
     */
    public String toUTF_8(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_8);
    }

    /**
     * Converts the string to UTF-16BE.
     */
    public String toUTF_16BE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16BE);
    }

    /**
     * Converts the string to UTF-16LE.
     */
    public String toUTF_16LE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16LE);
    }

    /**
     * Converts the string to UTF-16.
     */
    public String toUTF_16(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16);
    }

    /**
     * Converts the string to GBK.
     */
    public String toGBK(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GBK);
    }

    /**
     * Converts a string from the platform default encoding to another encoding.
     *
     * @param str
     *            the string to convert
     * @param newCharset
     *            the target encoding
     * @return the converted string, or null if str is null
     * @throws UnsupportedEncodingException
     *             if the charset name is not supported
     */
    public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes with the platform default charset.
            byte[] bs = str.getBytes();
            // Decode those bytes as the target charset.
            return new String(bs, newCharset);
        }
        return null;
    }

    /**
     * Converts a string from one encoding to another.
     *
     * @param str
     *            the string to convert
     * @param oldCharset
     *            the original encoding
     * @param newCharset
     *            the target encoding
     * @return the converted string, or null if str is null
     * @throws UnsupportedEncodingException
     *             if either charset name is not supported
     */
    public String changeCharset(String str, String oldCharset, String newCharset)
            throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes with the old charset.
            // Characters the old charset cannot represent are replaced, so data may be lost.
            byte[] bs = str.getBytes(oldCharset);
            // Decode those bytes as the target charset.
            return new String(bs, newCharset);
        }
        return null;
    }
}
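
Whether the "convert back" calls in main recover the original text depends on the intermediate charset. The following is a small sketch of that effect, not part of the program above: the class LossyRoundTrip and its survives method are hypothetical names, and the input bytes are UTF-8 rather than GBK so that only charsets guaranteed by every Java platform are needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, written only to illustrate the round trip in the listing above.
final class LossyRoundTrip {

    /** Decodes bytes with `via`, re-encodes with `via`, and reports whether the bytes survived. */
    static boolean survives(byte[] original, Charset via) {
        String decoded = new String(original, via); // like the two-argument changeCharset
        byte[] restored = decoded.getBytes(via);    // like the first step of the three-argument one
        return Arrays.equals(original, restored);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of a two-character Chinese string; every byte is 0x80 or above.
        byte[] bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        System.out.println("ISO-8859-1: " + survives(bytes, StandardCharsets.ISO_8859_1)); // true
        // US-ASCII has no characters for bytes above 0x7F; they are replaced on decoding.
        System.out.println("US-ASCII:   " + survives(bytes, StandardCharsets.US_ASCII));   // false
    }
}
```

This suggests why, in the program above, the ISO-8859-1 round trip is the one most likely to print the original string again, while the US-ASCII round trip prints replacement characters.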

Example 2

 

The code is as follows:

import java.io.UnsupportedEncodingException;

/**
 * Converts strings between character encodings.
 */
public class ChangeCharset {
    /** 7-bit ASCII, also known as ISO646-US, the Basic Latin block of the Unicode character set */
    public static final String US_ASCII = "US-ASCII";
    /** ISO Latin Alphabet No. 1, also known as ISO-LATIN-1 */
    public static final String ISO_8859_1 = "ISO-8859-1";
    /** 8-bit UCS Transformation Format */
    public static final String UTF_8 = "UTF-8";
    /** 16-bit UCS Transformation Format, big-endian byte order (the high byte is stored at the lowest address) */
    public static final String UTF_16BE = "UTF-16BE";
    /** 16-bit UCS Transformation Format, little-endian byte order (the low byte is stored at the lowest address) */
    public static final String UTF_16LE = "UTF-16LE";
    /** 16-bit UCS Transformation Format, byte order identified by an optional byte-order mark */
    public static final String UTF_16 = "UTF-16";
    /** Chinese character set */
    public static final String GBK = "GBK";
    /**
     * Converts the string to US-ASCII.
     */
    public String toASCII(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, US_ASCII);
    }

    /**
     * Converts the string to ISO-8859-1.
     */
    public String toISO_8859_1(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, ISO_8859_1);
    }

    /**
     * Converts the string to UTF-8.
     */
    public String toUTF_8(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_8);
    }

    /**
     * Converts the string to UTF-16BE.
     */
    public String toUTF_16BE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16BE);
    }

    /**
     * Converts the string to UTF-16LE.
     */
    public String toUTF_16LE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16LE);
    }

    /**
     * Converts the string to UTF-16.
     */
    public String toUTF_16(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16);
    }

    /**
     * Converts the string to GBK.
     */
    public String toGBK(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GBK);
    }
    /**
     * Converts a string from the platform default encoding to another encoding.
     * @param str the string to convert
     * @param newCharset the target encoding
     * @return the converted string, or null if str is null
     * @throws UnsupportedEncodingException if the charset name is not supported
     */
    public String changeCharset(String str, String newCharset)
            throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes with the platform default charset.
            byte[] bs = str.getBytes();
            // Decode those bytes as the target charset.
            return new String(bs, newCharset);
        }
        return null;
    }

    /**
     * Converts a string from one encoding to another.
     * @param str the string to convert
     * @param oldCharset the original encoding
     * @param newCharset the target encoding
     * @return the converted string, or null if str is null
     * @throws UnsupportedEncodingException if either charset name is not supported
     */
    public String changeCharset(String str, String oldCharset, String newCharset)
            throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes with the old charset.
            // Characters the old charset cannot represent are replaced, so data may be lost.
            byte[] bs = str.getBytes(oldCharset);
            // Decode those bytes as the target charset.
            return new String(bs, newCharset);
        }
        return null;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        ChangeCharset test = new ChangeCharset();
        String str = "This is a Chinese String!";
        System.out.println("str: " + str);
        String gbk = test.toGBK(str);
        System.out.println("converted to GBK: " + gbk);
        System.out.println();
        String ascii = test.toASCII(str);
        System.out.println("converted to US-ASCII: " + ascii);
        gbk = test.changeCharset(ascii, ChangeCharset.US_ASCII, ChangeCharset.GBK);
        System.out.println("US-ASCII string converted back to GBK: " + gbk);
        System.out.println();
        String iso88591 = test.toISO_8859_1(str);
        System.out.println("converted to ISO-8859-1: " + iso88591);
        gbk = test.changeCharset(iso88591, ChangeCharset.ISO_8859_1, ChangeCharset.GBK);
        System.out.println("ISO-8859-1 string converted back to GBK: " + gbk);
        System.out.println();
        String utf8 = test.toUTF_8(str);
        System.out.println("converted to UTF-8: " + utf8);
        gbk = test.changeCharset(utf8, ChangeCharset.UTF_8, ChangeCharset.GBK);
        System.out.println("UTF-8 string converted back to GBK: " + gbk);
        System.out.println();
        String utf16be = test.toUTF_16BE(str);
        System.out.println("converted to UTF-16BE: " + utf16be);
        gbk = test.changeCharset(utf16be, ChangeCharset.UTF_16BE, ChangeCharset.GBK);
        System.out.println("UTF-16BE string converted back to GBK: " + gbk);
    }
}

Avoid the following approach when reading strings from a database.

The code is as follows:

ResultSet rs; // obtained from an executed query
String str = rs.getString(1); // column index 1 is a placeholder
str = new String(str.getBytes("ISO8859-1"), "gb2312");

This conversion is inefficient. When ResultSet executes getString(), it decodes the data using the database's default encoding, which in this case is ISO8859-1, and converts it to Unicode. Calling str.getBytes("ISO8859-1") then restores the original bytes, and new String(bytes, "gb2312") decodes those bytes from GB2312 to Unicode. Every value therefore goes through several extra steps.
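
The steps in this paragraph can be traced without a database. The sketch below is a simulation under stated assumptions: the class ResultSetReencodeDemo and its reencode method are hypothetical, no JDBC driver is involved, and the "stored" bytes are produced in memory. It also assumes the JDK in use ships the GB2312 charset, which is not among the charsets every Java platform must support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation; no database or JDBC driver is involved.
final class ResultSetReencodeDemo {

    /** The re-encoding step from the snippet above: undo ISO-8859-1, then decode as realCharset. */
    static String reencode(String misdecoded, Charset realCharset) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1); // restore the stored bytes
        return new String(raw, realCharset);                           // decode them correctly
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312"); // assumes this charset is available
        String original = "\u4e2d\u6587";
        // Bytes as they would be stored in the database.
        byte[] stored = original.getBytes(gb2312);
        // What getString() would return if the data were decoded as ISO8859-1.
        String fromDriver = new String(stored, StandardCharsets.ISO_8859_1);
        System.out.println(fromDriver.equals(original));                   // false: the text is garbled
        System.out.println(reencode(fromDriver, gb2312).equals(original)); // true: the text is recovered
    }
}
```

The value is decoded once, encoded again, and decoded a second time, which is the extra work the paragraph above describes.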
