Representing text with numbers
It should come as no surprise that computers only understand numbers. What may be less obvious is the consequence: because computers only understand numbers, they need some way to map numeric values to characters so that text can be stored and displayed. That mapping is called a character set. Early desktop computers, for example, used ASCII. When a computer that uses ASCII stores the numbers 72, 101, 108, and 112, it knows to display the word "Help", because in ASCII 72 is the value of H, 101 is the value of e, 108 is the value of l, and 112 is the value of p. On an early IBM mainframe, which uses EBCDIC instead of ASCII, the word "Help" is represented by the numbers 200, 133, 147, and 151.
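The mapping described above can be checked from Java. The following is a minimal sketch, not part of the original article: the class name HelpInNumbers and the helper codes() are made up for illustration, and IBM037 is assumed here as one common EBCDIC variant, which a given runtime may or may not provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HelpInNumbers {
    // Hypothetical helper: returns the unsigned byte values that the
    // given charset uses to represent the text.
    static int[] codes(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        int[] values = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            values[i] = bytes[i] & 0xFF; // Java bytes are signed; mask to 0..255
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [72, 101, 108, 112]
        System.out.println(Arrays.toString(codes("Help", StandardCharsets.US_ASCII)));

        // EBCDIC charsets are optional, so check before asking for one.
        // IBM037 is an assumption; other EBCDIC code pages exist.
        if (Charset.isSupported("IBM037")) {
            System.out.println(Arrays.toString(codes("Help", Charset.forName("IBM037"))));
        }
    }
}
```

The same four characters produce different numbers under each mapping, which is why the bytes alone are not enough to recover the text.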
Character set basics
Turning to the Java language, three classes in the java.nio.charset package help with this mapping: Charset, CharsetEncoder, and CharsetDecoder. These classes work together so that you can take text in one mapping and convert it to another. To convert from another mapping to Java's internal mapping (Unicode), you use a decoder. To convert from Java's mapping (Unicode) to another mapping, or back to the original one, you use an encoder. The java.nio.charset package cannot convert directly between two non-Unicode formats, but you can convert between them by going through Unicode as an intermediate format.
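The two-step conversion through Unicode might look like the sketch below. The class Transcoder and its transcode() method are hypothetical names introduced for illustration; the charsets used (ISO-8859-1 and US-ASCII) are chosen only because every Java runtime is required to support them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {
    // Hypothetical helper: converts bytes from one charset to another
    // by decoding to Unicode characters and then encoding again.
    static byte[] transcode(byte[] input, Charset from, Charset to)
            throws CharacterCodingException {
        CharBuffer chars = from.newDecoder().decode(ByteBuffer.wrap(input)); // source -> Unicode
        ByteBuffer encoded = to.newEncoder().encode(chars);                  // Unicode -> target
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] latin1 = "Help".getBytes(StandardCharsets.ISO_8859_1);
        byte[] ascii = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII);
        System.out.println(ascii.length); // 4
    }
}
```

A freshly created encoder reports characters that the target charset cannot represent, so transcoding the ISO-8859-1 byte 0xE9 (the letter é) to US-ASCII with this helper throws a CharacterCodingException instead of silently losing data.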
Before you can get a decoder or an encoder, you need the Charset for a particular mapping. For example, US-ASCII is the name of the mapping for the 7-bit ASCII character set. You simply pass that name to the Charset.forName() method, as follows:
Charset charset =
Charset.forName("US-ASCII");
Once you have the Charset, simply ask it for a CharsetDecoder and a CharsetEncoder, as follows:
CharsetDecoder decoder =
charset.newDecoder();
CharsetEncoder encoder =
charset.newEncoder();
With a decoder and an encoder, you can convert bytes to characters and characters back to bytes, as follows:
ByteBuffer bytes = ...;
CharBuffer chars = decoder.decode(bytes);
bytes = encoder.encode(chars);
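Put together, the steps above can be run end to end. This is a sketch under assumptions: the class name RoundTrip, the method decodeThenEncode(), and the sample input bytes are invented for illustration and stand in for whatever data a real program would read.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

class RoundTrip {
    // Decodes US-ASCII bytes to characters, then encodes them back to bytes.
    // Returns the decoded text plus the size of the re-encoded buffer.
    static String decodeThenEncode(byte[] input) throws CharacterCodingException {
        Charset charset = Charset.forName("US-ASCII");
        CharsetDecoder decoder = charset.newDecoder();
        CharsetEncoder encoder = charset.newEncoder();

        ByteBuffer bytes = ByteBuffer.wrap(input);  // sample data, not from the article
        CharBuffer chars = decoder.decode(bytes);   // external bytes -> internal chars
        String text = chars.toString();             // read the text before encode() consumes it
        bytes = encoder.encode(chars);              // internal chars -> external bytes
        return text + " (" + bytes.remaining() + " bytes)";
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Prints: Help (4 bytes)
        System.out.println(decodeThenEncode(new byte[] {72, 101, 108, 112}));
    }
}
```

Both decode() and encode() declare the checked CharacterCodingException, so callers have to handle or propagate it.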
Of course, if you are not sure which character sets are available, you can ask with the following statement:
SortedMap<String, Charset> map =
Charset.availableCharsets();
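As a rough illustration, the returned map can be walked to see what a given runtime offers. The class name ListCharsets and the method available() are hypothetical; the output depends entirely on the JVM and platform, so none is shown.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.SortedMap;

class ListCharsets {
    // Thin wrapper so the lookup can be reused; keys are canonical charset names.
    static SortedMap<String, Charset> available() {
        return Charset.availableCharsets();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Charset> entry : available().entrySet()) {
            Charset cs = entry.getValue();
            // Some charsets can only decode, so canEncode() is worth checking.
            System.out.println(entry.getKey()
                    + " aliases=" + cs.aliases()
                    + " canEncode=" + cs.canEncode());
        }
    }
}
```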
You then use a specific decoder to convert external bytes to internal characters. When you need to send data out of your Java code, you use an encoder to convert the internal characters back to external bytes. Exactly which character sets are available depends on your runtime. However, every implementation of the Java platform must support the following encodings:
US-ASCII: 7-bit ASCII
ISO-8859-1: ISO Latin Alphabet No. 1
UTF-8: 8-bit UCS Transformation Format
UTF-16BE: 16-bit UCS Transformation Format, big-endian byte order
UTF-16LE: 16-bit UCS Transformation Format, little-endian byte order
UTF-16: 16-bit UCS Transformation Format, byte order identified by an optional byte-order mark
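To make the byte-order differences in this list concrete, the sketch below encodes the single letter A with each UTF-16 variant and also checks that all six required names are supported. The class name RequiredCharsets and its helpers are hypothetical; StandardCharsets is assumed to be available (Java 7 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RequiredCharsets {
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // True if the running JVM reports every required charset as supported.
    static boolean allSupported() {
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    // Formats bytes as space-separated uppercase hex, for example "00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(allSupported());                               // true
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    }
}
```

When encoding, Java's UTF-16 charset writes a big-endian byte-order mark (FE FF) first, which is why the last line is two bytes longer than the others.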
Beyond these, a platform may support additional character sets of its own (on a Windows platform, for example, you will find support for the windows-1252 character set). If you need a character set that is not provided, you can create your own. See the CharsetProvider API in the java.nio.charset.spi package.
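Because platform-specific character sets are optional, code that wants one usually has to check for it first. One possible approach is sketched below; the class PlatformCharset and the helper orFallback() are hypothetical, and falling back to ISO-8859-1 is only an example policy, not a recommendation from the original article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PlatformCharset {
    // Hypothetical helper: returns the named charset if this runtime
    // supports it, otherwise the supplied fallback.
    static Charset orFallback(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    public static void main(String[] args) {
        Charset cs = orFallback("windows-1252", StandardCharsets.ISO_8859_1);
        System.out.println("Using " + cs.name());
    }
}
```

Checking with Charset.isSupported() avoids the UnsupportedCharsetException that Charset.forName() throws for a legal but unavailable name.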