Java Coded Character Set


@author Ixenos

1. Character Set

a) A character set maps between sequences of two-byte Unicode code units (Java char values) and sequences of bytes in a local character encoding.
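The standard java.nio.charset.Charset API lets you inspect the "local character encoding" a JVM uses by default and every charset it can map to and from. Below is a minimal sketch. The class and method names are illustrative and do not come from the original.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

class CharsetLookup {
    // All charsets this JVM supports, keyed by canonical name (sorted alphabetically).
    static SortedMap<String, Charset> available() {
        return Charset.availableCharsets();
    }

    public static void main(String[] args) {
        // The platform's default ("local") encoding.
        System.out.println("default: " + Charset.defaultCharset());
        System.out.println(available().size() + " charsets available");
        System.out.println("UTF-8 supported: " + Charset.isSupported("UTF-8"));
    }
}
```

Since JDK 18 (JEP 400) the default charset is UTF-8 unless overridden. On older JDKs it follows the operating system's locale.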

b) For compatibility with other naming conventions, each character set can have several aliases. The aliases method of a Charset object returns them as a Set:

I. Set<String> aliases = charset.aliases();

II. for (String alias : aliases) { ... }

III. Charset.forName accepts either the canonical name or any alias: Charset charset = Charset.forName("UTF-8");
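Steps I to III can be combined into one small runnable sketch. It lists the aliases of UTF-8 and then looks the charset up again through one of them. The class and helper names are hypothetical. The sketch assumes "UTF8" is one of the registered aliases, which holds on OpenJDK-based runtimes. Alias lists are implementation-specific.

```java
import java.nio.charset.Charset;
import java.util.Set;

class AliasDemo {
    // Aliases registered for the charset found under the given name (canonical name or alias).
    static Set<String> aliasesOf(String name) {
        return Charset.forName(name).aliases();
    }

    // Resolve any accepted name to the charset's canonical name.
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        Charset charset = Charset.forName("UTF-8");
        Set<String> aliases = charset.aliases();
        for (String alias : aliases) {
            System.out.println(alias);
        }
        // Looking up by an alias yields the same charset, reported under its canonical name.
        System.out.println(canonicalName("UTF8"));
        // Charset.equals compares canonical names.
        System.out.println(Charset.forName("UTF8").equals(charset));
    }
}
```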

c) Character set vs. character encoding, quoted from a Zhihu answer:

Author: Jim Liu
Link: https://www.zhihu.com/question/50356029/answer/120608944
Source: Zhihu
Copyright belongs to the author; please contact the author for permission to reprint.

TL;DR
Character set: a collection of characters, each of which is assigned an index (a number).
Character encoding: a scheme that converts those indices into a binary form a computer can handle, usually a sequence of bytes, according to specific technical rules and formats.
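Here is a minimal Java sketch of the distinction (an illustration added here, not part of the quoted answer). The character set assigns 中 the index U+4E2D, and each encoding turns that same index into different bytes. The sketch uses only charsets that every Java platform must support. The class and helper names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePointVsBytes {
    // Format bytes as space-separated uppercase hex, e.g. "E4 B8 AD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hex form of s after encoding it with the given charset.
    static String encodedHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String s = "\u4E2D"; // the character 中
        // The character set gives it an index (its code point) ...
        System.out.printf("code point: U+%04X%n", s.codePointAt(0));     // U+4E2D
        // ... and each encoding represents that index with different bytes.
        System.out.println("UTF-8:    " + encodedHex(s, StandardCharsets.UTF_8));    // E4 B8 AD
        System.out.println("UTF-16BE: " + encodedHex(s, StandardCharsets.UTF_16BE)); // 4E 2D
    }
}
```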
----------
GB2312 is both a character set and a character encoding; it contains several thousand characters.
GB18030 is likewise both a character set and an encoding. It is a huge extension of GB2312 and contains 70,000+ characters.
Both of the above are GB (Chinese national) standards.
GBK is Microsoft's extension of GB2312 and is compatible with GB2312 (while GB18030 is not perfectly compatible with GBK). GBK is not a GB standard.
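The following Java sketch (an illustration added here, not from the quoted answer) shows the compatibility relationships above. It encodes one GB2312 character and one character outside both GB2312 and GBK with each charset. It assumes the JVM ships the extended charsets GB2312, GBK and GB18030, as full JDK builds do, and it skips any that are missing. encodeOrNull is a hypothetical helper.

```java
import java.nio.charset.Charset;

class GbFamilyDemo {
    // Format bytes as space-separated uppercase hex, e.g. "D6 D0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hex bytes of s in the named charset, or null if that charset
    // cannot represent every character of s.
    static String encodeOrNull(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        if (!cs.newEncoder().canEncode(s)) {
            return null;
        }
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D";        // the character 中, present in GB2312
        String emoji = "\uD83D\uDE00";  // U+1F600, outside GB2312 and GBK
        for (String name : new String[] {"GB2312", "GBK", "GB18030"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            System.out.println(name + ": " + encodeOrNull(zhong, name)
                    + " | " + encodeOrNull(emoji, name));
        }
    }
}
```

On a JDK that provides all three charsets, 中 comes out as D6 D0 in each of them. The emoji is representable only in GB18030, where it takes four bytes.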
----------
Unicode is the "universal code" character set. It can also serve directly as a very simple encoding, but few programs use it that way.
Most programs instead use a Unicode Transformation Format (UTF) encoding; the most common are UTF-8 and UTF-16.
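A short sketch (illustrative only, not from the quoted answer) shows how UTF-8 and UTF-16 differ in size:

- ASCII takes less space in UTF-8.
- Common Chinese characters take less space in UTF-16.
- Characters outside the Basic Multilingual Plane take four bytes in both. In UTF-16 they are stored as a surrogate pair, i.e. two Java chars.

```java
import java.nio.charset.StandardCharsets;

class UtfLengths {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE is used because plain "UTF-16" prepends a 2-byte byte-order mark.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u4E2D", "\uD83D\uDE00"}; // ASCII letter, 中, U+1F600 emoji
        for (String s : samples) {
            System.out.printf("U+%04X: %d Java char(s), UTF-8 %d bytes, UTF-16 %d bytes%n",
                    s.codePointAt(0), s.length(), utf8Length(s), utf16Length(s));
        }
    }
}
```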

----------
Why build GB18030 instead of simply supporting Unicode?
1. Unicode serves the whole world and is constrained by ISO, so it makes many compromises, particularly in favor of English. It is not necessarily the best encoding scheme for Chinese characters. For example, in UTF-8 most Chinese characters take 3 bytes, whereas in GB18030 most take only 2.
2. GB18030 is fully compatible with GB2312, which is very valuable for legacy systems in Chinese-language environments. According to Wikipedia, Unicode first included Chinese characters in 1992, and Unicode 1.1 (1993) corresponded to the Chinese standard GB 13000, whereas GB2312 had already been spreading since the 1980s. So it is not that China refused to be compatible with Unicode; rather, Unicode was not compatible with what China already used. GB18030 took on the role of extending GB2312, which explains its existence better than reinventing the wheel just to create a rival standard. Windows introduced GBK in 1995, and it too is fully compatible with GB2312. At a time when the Internet was far less developed than today, choosing a cost-effective encoding scheme for each language environment is understandable.
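A small sketch (illustrative, not from the quoted answer) can check the byte counts from point 1. It assumes the JVM provides GB18030, which full JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ChineseTextSize {
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // "中文编码": four common Chinese characters, all of them in GB2312
        String text = "\u4E2D\u6587\u7F16\u7801";
        System.out.println("UTF-8:   " + byteCount(text, StandardCharsets.UTF_8) + " bytes"); // 12 bytes
        if (Charset.isSupported("GB18030")) {
            System.out.println("GB18030: " + byteCount(text, Charset.forName("GB18030")) + " bytes"); // 8 bytes
        }
    }
}
```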

d) A local encoding cannot represent every Unicode character. A character that cannot be represented is replaced with "?" when the string is encoded.
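In Java, whether you get the "?" or an error depends on the API:

- Charset.encode and String.getBytes always substitute the charset's replacement bytes.
- A CharsetEncoder set to CodingErrorAction.REPORT throws an exception instead.

Below is a minimal sketch with hypothetical helper names. It uses ISO-8859-1 as the "local" encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class UnmappableDemo {
    // Lenient: String.getBytes substitutes the charset's replacement bytes ("?").
    static byte[] lenient(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Strict: an encoder configured with REPORT throws instead of substituting.
    static byte[] strict(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String s = "A\u4E2D"; // "A中"; ISO-8859-1 has no mapping for U+4E2D
        System.out.println(Arrays.toString(lenient(s, StandardCharsets.ISO_8859_1))); // [65, 63]
        try {
            strict(s, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoding failed: " + e);
        }
    }
}
```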

e) Once you have a Charset, you can convert between a Java string (a sequence of Unicode code units) and a sequence of bytes (the encoded form).

I. Encoding (the verb) a Java string turns it into a byte array in which each character is represented by one, two, three, or four bytes; that byte representation is the character's encoding (the noun).

II. String str = "...";

III. ByteBuffer buffer = charset.encode(str); // encode the string with this charset; returns a ByteBuffer

IV. byte[] bytes = new byte[buffer.remaining()]; buffer.get(bytes); // copy out exactly the encoded bytes (buffer.array() may include unused trailing capacity)

V. To decode a byte sequence, you need a ByteBuffer. The static method ByteBuffer.wrap turns a byte array (or a slice of one) into a ByteBuffer:

VI. byte[] bytes = ...;

VII. ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);

VIII. CharBuffer cbuf = charset.decode(bbuf); // returns a CharBuffer

IX. String str = cbuf.toString();

Further reading:

Is GB18030 fundamentally related to Unicode?

An Unofficial Story of Encoding: Basics

An Unofficial Story of Encoding: Web

An Unofficial Story of Encoding: Extras
