Java string encoding and conversion

Source: Internet
Author: User


import java.io.UnsupportedEncodingException;

/**
 * Converts strings between character encodings.
 */
public class ChangeCharset {
    /** 7-bit ASCII, also known as ISO646-US, the Basic Latin block of the Unicode character set */
    public static final String US_ASCII = "US-ASCII";
    /** ISO Latin Alphabet No. 1, also known as ISO-LATIN-1 */
    public static final String ISO_8859_1 = "ISO-8859-1";
    /** 8-bit UCS Transformation Format */
    public static final String UTF_8 = "UTF-8";
    /** 16-bit UCS Transformation Format, big-endian byte order (high byte at the lowest address) */
    public static final String UTF_16BE = "UTF-16BE";
    /** 16-bit UCS Transformation Format, little-endian byte order (low byte at the lowest address) */
    public static final String UTF_16LE = "UTF-16LE";
    /** 16-bit UCS Transformation Format, byte order identified by an optional byte-order mark */
    public static final String UTF_16 = "UTF-16";
    /** Chinese character set */
    public static final String GBK = "GBK";

    /** Converts the character encoding to US-ASCII. */
    public String toASCII(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, US_ASCII);
    }

    /** Converts the character encoding to ISO-8859-1. */
    public String toISO_8859_1(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, ISO_8859_1);
    }

    /** Converts the character encoding to UTF-8. */
    public String toUTF_8(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_8);
    }

    /** Converts the character encoding to UTF-16BE. */
    public String toUTF_16BE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16BE);
    }

    /** Converts the character encoding to UTF-16LE. */
    public String toUTF_16LE(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16LE);
    }

    /** Converts the character encoding to UTF-16. */
    public String toUTF_16(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, UTF_16);
    }

    /** Converts the character encoding to GBK. */
    public String toGBK(String str) throws UnsupportedEncodingException {
        return this.changeCharset(str, GBK);
    }

    /**
     * Re-encodes a string.
     *
     * @param str        the string to convert
     * @param newCharset the destination encoding
     * @return the bytes of str (in the default encoding) decoded as newCharset, or null if str is null
     * @throws UnsupportedEncodingException if newCharset is not supported
     */
    public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string to bytes using the platform default character encoding.
            byte[] bs = str.getBytes();
            // Decode those bytes using the new character encoding.
            return new String(bs, newCharset);
        }
        return null;
    }

    /**
     * Re-encodes a string.
     *
     * @param str        the string to convert
     * @param oldCharset the original encoding
     * @param newCharset the destination encoding
     * @return the bytes of str (in oldCharset) decoded as newCharset, or null if str is null
     * @throws UnsupportedEncodingException if either encoding is not supported
     */
    public String changeCharset(String str, String oldCharset, String newCharset)
            throws UnsupportedEncodingException {
        if (str != null) {
            // Encode the string to bytes using the old character encoding.
            // Characters that the old encoding cannot represent are replaced here.
            byte[] bs = str.getBytes(oldCharset);
            // Decode those bytes using the new character encoding.
            return new String(bs, newCharset);
        }
        return null;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        ChangeCharset test = new ChangeCharset();
        String str = "This is a Chinese string!";
        System.out.println("str: " + str);

        String gbk = test.toGBK(str);
        System.out.println("converted to GBK: " + gbk);
        System.out.println();

        String ascii = test.toASCII(str);
        System.out.println("converted to US-ASCII: " + ascii);
        gbk = test.changeCharset(ascii, ChangeCharset.US_ASCII, ChangeCharset.GBK);
        System.out.println("US-ASCII string converted back to GBK: " + gbk);
        System.out.println();

        String iso88591 = test.toISO_8859_1(str);
        System.out.println("converted to ISO-8859-1: " + iso88591);
        gbk = test.changeCharset(iso88591, ChangeCharset.ISO_8859_1, ChangeCharset.GBK);
        System.out.println("ISO-8859-1 string converted back to GBK: " + gbk);
        System.out.println();

        String utf8 = test.toUTF_8(str);
        System.out.println("converted to UTF-8: " + utf8);
        gbk = test.changeCharset(utf8, ChangeCharset.UTF_8, ChangeCharset.GBK);
        System.out.println("UTF-8 string converted back to GBK: " + gbk);
        System.out.println();

        String utf16be = test.toUTF_16BE(str);
        System.out.println("converted to UTF-16BE: " + utf16be);
        gbk = test.changeCharset(utf16be, ChangeCharset.UTF_16BE, ChangeCharset.GBK);
        System.out.println("UTF-16BE string converted back to GBK: " + gbk);
        System.out.println();

        String utf16le = test.toUTF_16LE(str);
        System.out.println("converted to UTF-16LE: " + utf16le);
        gbk = test.changeCharset(utf16le, ChangeCharset.UTF_16LE, ChangeCharset.GBK);
        System.out.println("UTF-16LE string converted back to GBK: " + gbk);
        System.out.println();

        String utf16 = test.toUTF_16(str);
        System.out.println("converted to UTF-16: " + utf16);
        gbk = test.changeCharset(utf16, ChangeCharset.UTF_16, ChangeCharset.GBK);
        System.out.println("UTF-16 string converted back to GBK: " + gbk);

        String s = new String("Chinese".getBytes("UTF-8"), "UTF-8");
        System.out.println(s);
    }
}
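A String carries no encoding of its own, so converting between encodings is usually a conversion between byte arrays. As a minimal sketch (the class TranscodeSketch and its transcode method are illustrative names, not part of the class above), the bytes are decoded with the charset they were really written in and the resulting text is encoded again with the target charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of ChangeCharset.
final class TranscodeSketch {

    /** Decodes input using the source charset, then encodes the text using the target charset. */
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source); // bytes -> Unicode text
        return text.getBytes(target);            // Unicode text -> bytes
    }

    public static void main(String[] args) {
        // Two CJK characters, written as escapes so the source file encoding does not matter.
        String original = "\u4e2d\u6587";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf16be = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length + " UTF-8 bytes -> " + utf16be.length + " UTF-16BE bytes");
    }
}
```

Using the constants in StandardCharsets instead of charset names avoids the checked UnsupportedEncodingException; for a charset such as GBK, which has no constant there, Charset.forName("GBK") can be passed instead where the runtime supports it.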




The String class in Java stores text internally as Unicode (UTF-16). When String(byte[] bytes, String encoding) is used to construct a String, the encoding argument names the encoding of the data in bytes, not the encoding of the resulting String. In other words, it tells the JDK to decode bytes from that encoding into Unicode. If the argument is omitted, the JDK uses the platform default charset, which on JDK versions before 18 depends on the operating system and locale (since JDK 18 the default is UTF-8).
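To make the role of the encoding argument concrete, here is a small sketch (DecodeSketch is an illustrative name) that decodes the same six bytes twice. The bytes are the UTF-8 encoding of two Chinese characters; only the decoder that matches the data yields the intended text:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
final class DecodeSketch {

    /** UTF-8 encoding of U+4E2D U+6587, written out byte by byte. */
    static final byte[] UTF8_BYTES = {
        (byte) 0xE4, (byte) 0xB8, (byte) 0xAD, (byte) 0xE6, (byte) 0x96, (byte) 0x87
    };

    /** Decodes with the charset the bytes were written in. */
    static String decodeAsUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    /** Decodes with a mismatched charset: every byte becomes one unrelated character. */
    static String decodeAsLatin1() {
        return new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 decode length:      " + decodeAsUtf8().length());
        System.out.println("ISO-8859-1 decode length: " + decodeAsLatin1().length());
    }
}
```

In both cases the result is an ordinary Unicode String; the argument only changes how the input bytes are interpreted.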

When reading text from a file, it is best to read the raw bytes through an InputStream and then use String(byte[] bytes, String encoding) to specify the file's encoding explicitly. Avoid a Reader that is created without an explicit charset (for example, FileReader constructed from only a file name), because it silently decodes the file content into Unicode using the JDK's default encoding.
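The file-reading advice can be sketched as follows. The class and method names are illustrative, and InputStream.readAllBytes() assumes Java 9 or later. The second method shows that a Reader is also safe once its charset is passed explicitly, as with InputStreamReader:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration.
final class ExplicitCharsetRead {

    /** Reads the raw bytes, then decodes them with the given charset. */
    static String readViaBytes(Path file, Charset charset) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return new String(in.readAllBytes(), charset); // readAllBytes: Java 9+
        }
    }

    /** Reads through a Reader whose charset is stated explicitly. */
    static String readViaReader(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file to a temporary location, then read it back both ways.
        Path file = Files.createTempFile("charset-sketch", ".txt");
        Files.write(file, "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaBytes(file, StandardCharsets.UTF_8)
                .equals(readViaReader(file, StandardCharsets.UTF_8)));
        Files.delete(file);
    }
}
```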

When reading text data from a database, use the ResultSet.getBytes() method to obtain the raw byte array, then construct the String with an explicit encoding in the same way.

ResultSet rs; // result of an executed query
byte[] bytes = rs.getBytes(1); // 1 is a placeholder column index
String str = new String(bytes, "gb2312");

Avoid the following approach:

ResultSet rs; // result of an executed query
String str = rs.getString(1); // 1 is a placeholder column index
str = new String(str.getBytes("ISO-8859-1"), "gb2312");

This conversion is inefficient. It becomes necessary only because getString() on the ResultSet assumes by default that the data in the database is encoded as ISO-8859-1 and decodes it to Unicode accordingly. str.getBytes("ISO-8859-1") then recovers the original bytes, and new String(bytes, "gb2312") decodes them from gb2312 to Unicode, so the data passes through several unnecessary intermediate steps.
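The round trip described above works because ISO-8859-1 maps every byte value to exactly one character, so encoding the mis-decoded string back to ISO-8859-1 restores the original bytes. A minimal sketch of the mechanism follows; the class name is illustrative, no database is involved, and UTF-8 stands in for gb2312 so that only StandardCharsets is needed (Charset.forName("GB2312") could be passed instead where the runtime supports it):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; it only simulates the wrong decoding step.
final class Latin1RoundTripSketch {

    /** Simulates a layer that decodes raw bytes as ISO-8859-1 regardless of their real encoding. */
    static String misdecode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the raw bytes from the mis-decoded string and decodes them with the real charset. */
    static String repair(String misdecoded, Charset realCharset) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, realCharset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        byte[] stored = original.getBytes(StandardCharsets.UTF_8); // stand-in for gb2312 data
        String garbled = misdecode(stored);
        String repaired = repair(garbled, StandardCharsets.UTF_8);
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

Decoding the raw bytes once with the correct charset gives the same result without the extra encode and decode pass.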

When reading parameters from an HttpServletRequest, call request.setCharacterEncoding() before reading any parameter to set the encoding; the parameters are then decoded correctly.
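The effect of the request encoding can be illustrated without a servlet container. The sketch below is not the servlet API: it uses java.net.URLDecoder from the standard library (the overload taking a Charset assumes Java 10 or later) to decode the same percent-encoded parameter value with two different charsets, which is roughly the choice that the request encoding makes for form parameters. The class and method names are illustrative.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; it does not use the servlet API.
final class ParamDecodeSketch {

    /** Decodes a percent-encoded parameter value using the given charset. */
    static String decodeParam(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset); // Charset overload: Java 10+
    }

    public static void main(String[] args) {
        // Percent-encoded UTF-8 bytes of U+4E2D U+6587, as a browser would send them.
        String raw = "%E4%B8%AD%E6%96%87";
        String right = decodeParam(raw, StandardCharsets.UTF_8);
        String wrong = decodeParam(raw, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 decode length:      " + right.length());
        System.out.println("ISO-8859-1 decode length: " + wrong.length());
    }
}
```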

 
