Introduction
In Java, a character encoding is represented by a Charset object.
This class defines methods for creating decoders and encoders and for retrieving the various names associated with a charset. Instances of this class are immutable. This class also defines static methods for testing whether a particular charset is supported, for locating charset instances by name, and for constructing a map that contains every charset for which support is available in the current Java virtual machine.
Common static methods of Charset
public static Charset forName(String charsetName) // get the Charset object for the named encoding
public static SortedMap<String, Charset> availableCharsets() // get all encodings supported by the system
public static Charset defaultCharset() // get the virtual machine's default encoding
public static boolean isSupported(String charsetName) // determine whether the named encoding is supported
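A minimal sketch of how these static methods might be used together. The class name CharsetStaticDemo and the helper canonicalNameOrNull are made up for illustration; only the Charset calls come from the JDK.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

public class CharsetStaticDemo {
    // Hypothetical helper: returns the canonical name of a charset,
    // or null when the JVM does not support it.
    // Note: isSupported itself throws IllegalCharsetNameException
    // if the name contains illegal characters.
    public static String canonicalNameOrNull(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) {
        // Lookup is case-insensitive; the canonical name is returned.
        System.out.println(canonicalNameOrNull("utf-8"));
        // A legal but unknown name is reported as unsupported.
        System.out.println(canonicalNameOrNull("no-such-charset"));
        // The default depends on the platform and JVM settings.
        System.out.println(Charset.defaultCharset());
        // Map from canonical name to Charset for every available charset.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " charsets available");
    }
}
```

Checking isSupported first avoids the UnsupportedCharsetException that forName throws for an unknown name.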
Common instance methods of Charset
public final String name() // get the canonical name of the Charset object (a string)
public abstract CharsetEncoder newEncoder() // get an encoder object
public abstract CharsetDecoder newDecoder() // get a decoder object
... there are many other methods
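The instance methods above could be exercised roughly as follows. This is only a sketch: CharsetInstanceDemo and its canEncode helper are illustrative names, and "\u4e2d" is the Unicode escape for the character 中.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetInstanceDemo {
    // Hypothetical helper: asks a fresh encoder whether the charset
    // can represent every character of the text.
    public static boolean canEncode(Charset charset, String text) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        Charset ascii = StandardCharsets.US_ASCII;
        // Canonical name and the alternative names of the charset.
        System.out.println(ascii.name());
        System.out.println(ascii.aliases());
        // A decoder remembers the charset that created it.
        CharsetDecoder decoder = ascii.newDecoder();
        System.out.println(decoder.charset().name());
        // ASCII can represent "abc" but not a Chinese character.
        System.out.println(canEncode(ascii, "abc"));
        System.out.println(canEncode(ascii, "\u4e2d"));
        System.out.println(canEncode(StandardCharsets.UTF_8, "\u4e2d"));
    }
}
```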
Charset usage examples
Get all encodings supported on the local machine
// requires: java.nio.charset.Charset, java.util.Map, java.util.Map.Entry
public void testGetAvailableCharsets() {
    // get all encodings supported locally
    Map<String, Charset> charsets = Charset.availableCharsets();
    // iterate over the encodings and print "key:canonical name"
    for (Entry<String, Charset> entry : charsets.entrySet()) {
        System.out.println(entry.getKey() + ":" + entry.getValue().name());
    }
}
Get the JVM's default encoding
// get the JVM's default encoding
Charset charset = Charset.defaultCharset();
Character encoding and decoding using encoders and decoders
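// requires: java.nio.ByteBuffer, java.nio.CharBuffer, java.nio.charset.*
public void testEncoderAndDecoder() throws Exception {
    // use a Charset to create the encoder and the decoder
    CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
    CharsetDecoder decoder = Charset.forName("GBK").newDecoder();
    // encode characters to bytes, then decode the bytes back to characters
    ByteBuffer byteBuffer = encoder.encode(CharBuffer.wrap("China Code".toCharArray()));
    CharBuffer charBuffer = decoder.decode(byteBuffer);
    String string = charBuffer.toString();
    System.out.println(string);
}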
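Note: when writing an encoding name, it is best to use all uppercase letters, for example UTF-8 or GBK. In most cases both upper and lower case are recognized.
Note: character encoding problems in Java are usually handled with the String constructors or with URLEncoder/URLDecoder, or else with the encoder and decoder of a Charset.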
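The alternatives mentioned in the notes might look like the sketch below. The class EncodingAlternativesDemo and its method names are hypothetical; the example uses UTF-8 because every Java platform supports it, and "\u4e2d" is the escape for 中.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingAlternativesDemo {
    // Charset names are matched case-insensitively.
    public static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    // Encode with String.getBytes and decode with the String constructor.
    public static String viaString(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Percent-encode text for use in a URL query string.
    public static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverse of urlEncode.
    public static String urlDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameCharset("utf-8", "UTF-8")); // true
        System.out.println(viaString("\u4e2d").equals("\u4e2d")); // true
        System.out.println(urlEncode("\u4e2d a")); // %E4%B8%AD+a
        System.out.println(urlDecode("%E4%B8%AD+a").equals("\u4e2d a")); // true
    }
}
```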
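Summary
Encoding and decoding are problems every programmer faces, especially programmers in non-English-speaking countries. Only with a clear understanding of how character encoding works can you stay calm when you run into encoding problems in later development.
Note: a computer can only recognize binary numbers, so for it to recognize natural-language text there has to be a table that maps binary numbers to natural-language characters. Every encoding and decoding operation looks up this table.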
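To make the idea of a mapping table concrete, the sketch below prints the bytes that different charsets assign to the same character. CodeTableDemo and toHex are illustrative names, "\u4e2d" is the escape for 中, and the GBK line only runs if the JVM happens to provide that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodeTableDemo {
    // Hypothetical helper: renders the encoded bytes of text as
    // space-separated uppercase hex pairs.
    public static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d";
        // The same character maps to different bytes in each table.
        System.out.println(toHex(zhong, StandardCharsets.UTF_8));    // E4 B8 AD
        System.out.println(toHex(zhong, StandardCharsets.UTF_16BE)); // 4E 2D
        if (Charset.isSupported("GBK")) {
            System.out.println(toHex(zhong, Charset.forName("GBK")));
        }
        // ASCII characters keep the same single byte in UTF-8.
        System.out.println(toHex("A", StandardCharsets.US_ASCII)); // 41
        System.out.println(toHex("A", StandardCharsets.UTF_8));    // 41
    }
}
```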
Example:
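Suppose, for illustration, that encoding the character "中" with the GBK code table yields 183 (in decimal). If you then decode with the UTF-8 code table, the entry found for 183 might be "国" instead. The result is wrong, so encoding and decoding must use the same code table for decoding to be correct (unless one table contains the other: UTF-8 already includes the ASCII table, so ASCII-encoded text decodes correctly as UTF-8).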
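The mismatch described above can be reproduced with a short sketch. To stay runnable on any JVM it pairs UTF-8 with ISO-8859-1, which is an assumption swapped in for the GBK/UTF-8 pair in the text; the GBK case is tried only when that charset is available. MismatchDemo and transcode are illustrative names, and "\u4e2d\u56fd" is the escaped form of 中国.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Hypothetical helper: encodes text with one charset and decodes
    // the resulting bytes with another.
    public static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd";
        Charset utf8 = StandardCharsets.UTF_8;

        // Same table on both sides: the text survives.
        System.out.println(transcode(text, utf8, utf8).equals(text));
        // Different tables: the bytes are misread as other characters.
        System.out.println(transcode(text, utf8, StandardCharsets.ISO_8859_1).equals(text));
        // ASCII is contained in UTF-8, so this pairing is safe.
        System.out.println(transcode("abc", StandardCharsets.US_ASCII, utf8).equals("abc"));

        // The pairing from the text, if the JVM ships GBK.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println(transcode(text, gbk, utf8).equals(text));
        }
    }
}
```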