Java string encoding and conversion

Source: Internet
Author: User

Whether a program is being localized or internationalized, it has to deal with character encoding conversion. Web applications in particular often need to process Chinese characters, which means converting strings to and from encodings such as GBK or GB2312.

I. Key technical points:
1. Commonly used character encodings include US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16, GBK, and GB2312. Of these, GBK and GB2312 are designed specifically for Chinese.
2. The getBytes method of String returns the string's bytes in the specified encoding. Its parameter names the charset to encode with; if no charset is given, the platform default encoding is used.
3. The String(byte[] bytes, String charsetName) constructor decodes a byte array into a String object using the specified charset.
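
As a minimal sketch of points 2 and 3, the class below encodes a string with getBytes and decodes it again with the String constructor. The class name EncodeDecodeSketch and its two helpers are made up for this illustration, and the sketch assumes the JDK in use ships the GBK charset, as standard JDK builds usually do.

```java
import java.io.UnsupportedEncodingException;

class EncodeDecodeSketch {
    /** Hypothetical helper: encodes text with the named charset, then decodes the bytes with the same charset. */
    static String roundTrip(String text, String charsetName) {
        try {
            byte[] bytes = text.getBytes(charsetName); // characters -> bytes
            return new String(bytes, charsetName);     // bytes -> characters
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    /** Hypothetical helper: number of bytes the text occupies in the named charset. */
    static int encodedLength(String text, String charsetName) {
        try {
            return text.getBytes(charsetName).length;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as Unicode escapes
        System.out.println("GBK bytes:   " + encodedLength(text, "GBK"));   // 2 bytes per character
        System.out.println("UTF-8 bytes: " + encodedLength(text, "UTF-8")); // 3 bytes per character
        System.out.println("Round trip intact: " + roundTrip(text, "GBK").equals(text));
    }
}
```

The same characters occupy a different number of bytes in each encoding, which is why the charset used to decode must match the one used to encode.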

II. Example:

 

package book.string;

import java.io.UnsupportedEncodingException;

/**
 * Converts strings between character encodings.
 * @author Joe
 */
public class ChangeCharset {
    /** 7-bit ASCII, also known as ISO646-US and the Basic Latin block of the Unicode character set */
    public static final String US_ASCII = "US-ASCII";
    /** ISO Latin Alphabet No. 1, also known as ISO-LATIN-1 */
    public static final String ISO_8859_1 = "ISO-8859-1";
    /** 8-bit UCS Transformation Format */
    public static final String UTF_8 = "UTF-8";
    /** 16-bit UCS Transformation Format, big-endian byte order (the lowest address holds the high-order byte) */
    public static final String UTF_16BE = "UTF-16BE";
    /** 16-bit UCS Transformation Format, little-endian byte order (the lowest address holds the low-order byte) */
    public static final String UTF_16LE = "UTF-16LE";
    /** 16-bit UCS Transformation Format, byte order identified by an optional byte-order mark */
    public static final String UTF_16 = "UTF-16";
    /** Extended Chinese character set */
    public static final String GBK = "GBK";
    /** Simplified Chinese character set */
    public static final String GB2312 = "GB2312";

    /** Converts the character encoding to US-ASCII. */
    public String toASCII(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, US_ASCII);
    }

    /** Converts the character encoding to ISO-8859-1. */
    public String toISO_8859_1(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, ISO_8859_1);
    }

    /** Converts the character encoding to UTF-8. */
    public String toUTF_8(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_8);
    }

    /** Converts the character encoding to UTF-16BE. */
    public String toUTF_16BE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16BE);
    }

    /** Converts the character encoding to UTF-16LE. */
    public String toUTF_16LE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16LE);
    }

    /** Converts the character encoding to UTF-16. */
    public String toUTF_16(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16);
    }

    /** Converts the character encoding to GBK. */
    public String toGBK(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GBK);
    }

    /** Converts the character encoding to GB2312. */
    public String toGB2312(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GB2312);
    }

    /**
     * Implementation of string encoding conversion.
     * @param str the string to be converted
     * @param newCharset the target encoding
     */
    public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string with the platform default charset. This is system-dependent;
            // on Chinese-language Windows the default has typically been GBK (a superset of GB2312).
            byte[] bs = str.getBytes();
            return new String(bs, newCharset); // Decode the bytes with the new charset to build the string.
        }
        return null;
    }

    /**
     * Implementation of string encoding conversion.
     * @param str the string to be converted
     * @param oldCharset the source charset
     * @param newCharset the target charset
     */
    public String changeCharset(String str, String oldCharset, String newCharset) throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string into bytes using the source charset.
            byte[] bs = str.getBytes(oldCharset);
            return new String(bs, newCharset);
        }
        return null;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        ChangeCharset test = new ChangeCharset();
        // Sample text. The "?" characters in the output come from Chinese characters in this string.
        String str = "This is a Chinese string!";
        System.out.println("str: " + str);

        String gbk = test.toGBK(str);
        System.out.println("Converted to GBK: " + gbk);
        System.out.println();

        String ascii = test.toASCII(str);
        System.out.println("Converted to US-ASCII: " + ascii);
        System.out.println();

        String iso88591 = test.toISO_8859_1(str);
        System.out.println("Converted to ISO-8859-1: " + iso88591);
        System.out.println();

        gbk = test.changeCharset(iso88591, ISO_8859_1, GBK);
        System.out.println("ISO-8859-1 string converted back to GBK: " + gbk);
        System.out.println();

        String utf8 = test.toUTF_8(str);
        System.out.println();
        System.out.println("Converted to UTF-8: " + utf8);
        String utf16be = test.toUTF_16BE(str);
        System.out.println("Converted to UTF-16BE: " + utf16be);
        gbk = test.changeCharset(utf16be, UTF_16BE, GBK);
        System.out.println("UTF-16BE string converted back to GBK: " + gbk);
        System.out.println();

        String utf16le = test.toUTF_16LE(str);
        System.out.println("Converted to UTF-16LE: " + utf16le);
        gbk = test.changeCharset(utf16le, UTF_16LE, GBK);
        System.out.println("UTF-16LE string converted back to GBK: " + gbk);
        System.out.println();

        String utf16 = test.toUTF_16(str);
        System.out.println("Converted to UTF-16: " + utf16);
        String gb2312 = test.changeCharset(utf16, UTF_16, GB2312);
        System.out.println("UTF-16 string converted to GB2312: " + gb2312);
    }

}
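
The charset-name constants in the class above can also be expressed as Charset objects. Since Java 7, java.nio.charset.StandardCharsets provides constants for the six charsets every JVM must support, and the getBytes(Charset) and String(byte[], Charset) overloads do not throw UnsupportedEncodingException. The following is only a sketch and not part of the original program: CharsetConstantsSketch and reinterpret are illustrative names, and GBK is looked up by name because it is not among the guaranteed charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetConstantsSketch {
    /** Illustrative helper: encode with one charset, then decode the resulting bytes with another. */
    static String reinterpret(String text, Charset from, Charset to) {
        if (text == null) {
            return null;
        }
        return new String(text.getBytes(from), to);
    }

    public static void main(String[] args) {
        // Throws UnsupportedCharsetException (unchecked) if this JVM lacks GBK.
        Charset gbk = Charset.forName("GBK");
        String text = "\u4e2d\u6587"; // two Chinese characters, written as Unicode escapes

        // Same charset on both sides: the text comes back unchanged.
        System.out.println(reinterpret(text, gbk, gbk).equals(text));

        // Different charsets: the four GBK bytes read as ISO-8859-1 yield one character per byte.
        System.out.println(reinterpret(text, gbk, StandardCharsets.ISO_8859_1).length());
    }
}
```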
 

Output result:

 

str: This is a Chinese string!
Converted to GBK: This is a Chinese string!

Converted to US-ASCII: This is ?????? String!

Converted to ISO-8859-1: This is ?????? String!

ISO-8859-1 string converted back to GBK: This is a Chinese string!

Converted to UTF-8: This is ????? String!
Converted to UTF-16BE: (unreadable characters)
UTF-16BE string converted back to GBK: This is a Chinese string!

Converted to UTF-16LE: (unreadable characters)
UTF-16LE string converted back to GBK: This is a Chinese string!

Converted to UTF-16: (unreadable characters)
UTF-16 string converted to GB2312: ?This is a Chinese string!
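
The stray "?" at the start of the last output line is consistent with how Java's UTF-16 charset encodes: getBytes with UTF-16 writes a two-byte big-endian byte-order mark (0xFE 0xFF) before the text, while UTF-16BE and UTF-16LE write none. Those two extra bytes do not form a valid GB2312 character, so they cannot decode cleanly. The sketch below makes the extra bytes visible; the class name Utf16BomSketch and the helper hex are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class Utf16BomSketch {
    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41 (byte-order mark first)
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00
    }
}
```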
 

III. Source code analysis:
Changing the encoding of a string takes two steps:
1. Call the getBytes method of String to encode the string into a byte array. The byte array itself carries no information about which encoding produced it; that knowledge has to be supplied when the bytes are turned back into characters.
2. Construct a new String object from the byte array and the new character encoding.
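
One consequence of these two steps is that a string decoded with the wrong charset is recoverable only when that charset maps every byte to a distinct character. ISO-8859-1 does, so GBK bytes decoded as ISO-8859-1 can be encoded back to the same bytes. US-ASCII substitutes a replacement character for every byte above 0x7F, so the original bytes are lost. The following sketch contrasts the two, assuming the GBK charset is available; LossyDecodeSketch and survives are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyDecodeSketch {
    /**
     * Decodes the GBK bytes of the text with an intermediate charset,
     * then reverses both steps and reports whether the text survived.
     */
    static boolean survives(String text, Charset intermediate) {
        Charset gbk = Charset.forName("GBK");
        byte[] original = text.getBytes(gbk);                 // step 1: characters -> bytes
        String misread = new String(original, intermediate);  // step 2 with a different charset
        byte[] restored = misread.getBytes(intermediate);     // try to get the original bytes back
        return new String(restored, gbk).equals(text);
    }

    public static void main(String[] args) {
        String text = "This is \u4e2d\u6587 text"; // contains two Chinese characters
        System.out.println(survives(text, StandardCharsets.ISO_8859_1)); // true
        System.out.println(survives(text, StandardCharsets.US_ASCII));   // false
    }
}
```

This matches the pattern in the output above, where the ISO-8859-1 string converts back to GBK intact.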

 
