String encoding conversion for Java string applications

Source: Internet
Author: User

Both localization and internationalization involve character encoding conversion. Web applications in particular often have to process Chinese characters, which means encoding and decoding strings as GBK or GB2312.

I. Key technical points:

1. Common character encodings include US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16, GBK, and GB2312. GBK and GB2312 are designed specifically for Chinese text.

2. String's getBytes method returns the bytes of the string in a given encoding. The parameter names the encoding to use. If no encoding is given, the platform's default encoding is used.

3. String's String(byte[] bytes, String charsetName) constructor decodes a byte array into a String object, interpreting the bytes in the specified encoding.
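Points 2 and 3 can be tried out directly. The following is a minimal sketch, not part of the original listing: the class name EncodingBasics and its helper methods are made up for illustration, and the sample text is written with Unicode escapes so that the encoding of the source file itself does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBasics {
    /** Two Chinese characters, written as Unicode escapes (illustrative sample). */
    static final String SAMPLE = "\u4e2d\u6587";

    /** Number of bytes the text occupies in the given charset (key point 2). */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Encodes the text, then decodes the bytes with the same charset (key points 2 and 3). */
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(SAMPLE, StandardCharsets.UTF_8));    // 6
        System.out.println(encodedLength(SAMPLE, StandardCharsets.UTF_16BE)); // 4
        System.out.println(roundTrip(SAMPLE, StandardCharsets.UTF_8).equals(SAMPLE)); // true
        // GBK ships with standard JDK builds but is not guaranteed on every runtime,
        // so this sketch checks for it first.
        if (Charset.isSupported("GBK")) {
            System.out.println(encodedLength(SAMPLE, Charset.forName("GBK"))); // 4
        }
    }
}
```

The same two characters take a different number of bytes in each encoding, which is why the encoding has to be named on both the encoding and the decoding side.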

II. Example:

package book.string;

import java.io.UnsupportedEncodingException;

/**
 * Converts strings between character encodings.
 *
 * @author Joe
 */
public class ChangeCharset {
    /** 7-bit ASCII, also known as ISO646-US and as the Basic Latin block of the Unicode character set */
    public static final String US_ASCII = "US-ASCII";
    /** ISO Latin Alphabet No. 1, also called ISO-LATIN-1 */
    public static final String ISO_8859_1 = "ISO-8859-1";
    /** 8-bit UCS Transformation Format */
    public static final String UTF_8 = "UTF-8";
    /** 16-bit UCS Transformation Format, big-endian byte order (the high-order byte is stored at the lowest address) */
    public static final String UTF_16BE = "UTF-16BE";
    /** 16-bit UCS Transformation Format, little-endian byte order (the low-order byte is stored at the lowest address) */
    public static final String UTF_16LE = "UTF-16LE";
    /** 16-bit UCS Transformation Format, byte order identified by an optional byte-order mark */
    public static final String UTF_16 = "UTF-16";
    /** Chinese character sets */
    public static final String GBK = "GBK";
    public static final String GB2312 = "GB2312";

    /** Converts the character encoding to US-ASCII */
    public String toASCII(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, US_ASCII);
    }

    /** Converts the character encoding to ISO-8859-1 */
    public String toISO_8859_1(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, ISO_8859_1);
    }

    /** Converts the character encoding to UTF-8 */
    public String toUTF_8(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_8);
    }

    /** Converts the character encoding to UTF-16BE */
    public String toUTF_16BE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16BE);
    }

    /** Converts the character encoding to UTF-16LE */
    public String toUTF_16LE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16LE);
    }

    /** Converts the character encoding to UTF-16 */
    public String toUTF_16(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16);
    }

    /** Converts the character encoding to GBK */
    public String toGBK(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GBK);
    }

    /** Converts the character encoding to GB2312 */
    public String toGB2312(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GB2312);
    }

    // The listing breaks off here in the source: the changeCharset(String, String)
    // helper that the methods above call is not shown.
}

------------------------------------------------------------------
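The listing calls changeCharset(String, String) but breaks off before defining it. As a minimal sketch of what such a helper might look like, assuming it simply chains key points 2 and 3 (encode the string, then decode the bytes with the same charset), consider the following. The class name CharsetRoundTrip and the method body are assumptions for illustration, not the author's code.

```java
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    /**
     * Hypothetical stand-in for the missing helper: encodes str in the named
     * charset and decodes the resulting bytes with that same charset.
     */
    public static String changeCharset(String str, String charsetName) {
        if (str == null) {
            return null;
        }
        // Charset.forName throws an unchecked exception for unknown names.
        Charset charset = Charset.forName(charsetName);
        // getBytes(Charset) replaces characters the charset cannot represent
        // with the charset's replacement bytes ('?' for US-ASCII).
        byte[] bytes = str.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "hello" with an e-acute in second position
        System.out.println(changeCharset(text, "US-ASCII"));                // h?llo
        System.out.println(changeCharset(text, "ISO-8859-1").equals(text)); // true
        System.out.println(changeCharset(text, "UTF-8").equals(text));      // true
    }
}
```

Under this reading, the helper does not change how the String is stored. It only shows which characters survive a trip through the target encoding.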

Java's String class stores text as Unicode. When you construct a string with String(byte[] bytes, String encoding), the encoding argument names the encoding of the data in bytes, not the encoding of the resulting string. In other words, it tells the system to convert the bytes from that encoding to Unicode. If the argument is omitted, the JDK falls back to the platform default encoding, which traditionally depends on the operating system.
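A short sketch makes this concrete: the charset passed to the constructor has to match the bytes, and converting between encodings therefore means decoding to a String and encoding again. The class name TranscodeSketch and the transcode helper are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeSketch {
    /** Re-encodes bytes from one charset to another by going through a String. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // bytes -> Unicode
        return text.getBytes(to);              // Unicode -> bytes
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Naming the charset the bytes are really in recovers the text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(original)); // true
        // Naming a different charset yields a different, garbled string.
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).equals(original)); // false

        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(utf16, original.getBytes(StandardCharsets.UTF_16BE))); // true
    }
}
```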

When reading data from a file, it is best to read the raw bytes through an InputStream and then use String(byte[] bytes, String encoding) to state the file's encoding explicitly. Avoid a Reader that is created without naming an encoding (FileReader, for example), because it automatically converts the file content to Unicode using the JDK's default encoding.
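As a rough sketch of this advice, the helper below reads a file as bytes and decodes them with a charset supplied by the caller. The class name ReadFileWithCharset, the readText helper, and the temporary file are assumptions made for the example. InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileWithCharset {
    /** Reads the whole file as raw bytes, then decodes them with the given charset. */
    static String readText(Path file, Charset charset) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] bytes = in.readAllBytes(); // Java 9+
            return new String(bytes, charset);
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "\u4e2d\u6587"; // two Chinese characters
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            // Write the sample in UTF-16BE so that the file is not in a typical default encoding.
            Files.write(tmp, original.getBytes(StandardCharsets.UTF_16BE));
            System.out.println(readText(tmp, StandardCharsets.UTF_16BE).equals(original)); // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

An InputStreamReader constructed with an explicit charset decodes the same way. The risk lies only in readers that fall back to the default encoding.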

When reading text data from a database, call ResultSet.getBytes() to obtain the byte array, then build the string with the constructor that takes an encoding:

ResultSet rs;                      // obtained from an executed query
byte[] bytes = rs.getBytes(1);     // column index 1 is an illustrative choice
String str = new String(bytes, "gb2312");

Avoid the following approach:

ResultSet rs;                      // obtained from an executed query
String str = rs.getString(1);      // column index 1 is an illustrative choice
str = new String(str.getBytes("iso8859-1"), "gb2312");

This conversion is inefficient. It is needed only because, in this scenario, getString() treats the data in the database as iso8859-1 by default and converts it to Unicode on that basis. str.getBytes("iso8859-1") then restores the original bytes, and new String(bytes, "gb2312") finally converts them from gb2312 to Unicode, so the data passes through several unnecessary intermediate steps.
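The detour can be reproduced without a database. In the sketch below, a byte array plays the role of the stored data, and UTF-8 stands in for gb2312 so that the code runs on any Java runtime. The class name Latin1RoundTrip and the repair helper are hypothetical. The trick works because ISO-8859-1 maps every byte value to exactly one character, so the wrong decode loses nothing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    /** Undoes a wrong ISO-8859-1 decode: recovers the raw bytes, then decodes them properly. */
    static String repair(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] stored = original.getBytes(StandardCharsets.UTF_8); // stands in for the database bytes

        // What a getString() that assumes ISO-8859-1 would hand back: one char per byte.
        String garbled = new String(stored, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 6

        // The two-step repair recovers the text ...
        System.out.println(repair(garbled, StandardCharsets.UTF_8).equals(original)); // true
        // ... but decoding the bytes directly gets there in one step.
        System.out.println(new String(stored, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```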

When reading parameters from an HTTP request, call request.setCharacterEncoding() to set the encoding before reading any parameter. The content is then read correctly.

 

 
