The following program creates a string from a sequence of bytes, iterates over the characters in the string, and prints each one as a number. What sequence of numbers does the program print?
public class StringCheese {
    public static void main(String[] args) {
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++)
            bytes[i] = (byte) i;
        String str = new String(bytes);
        for (int i = 0, n = str.length(); i < n; i++)
            System.out.print((int) str.charAt(i) + " ");
    }
}
First, the byte array is initialized with every possible byte value (the ints 0 through 255, each cast to byte). Then the String constructor converts these bytes to char values. Finally, each char value is cast to int and printed. Because char is an unsigned type, every printed value is guaranteed to be non-negative, so you might expect the program to print the integers from 0 to 255 in order.
If you ran the program, maybe you saw this sequence. Then again, maybe you didn't. We ran it on four machines and saw four different sequences, including the one described above. The program isn't even guaranteed to terminate normally, much less to print any particular sequence. Its behavior is completely unspecified.
The culprit here is the String(byte[]) constructor. Its specification says: "Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified" [Java-API].
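To see that the decoded length really is a function of the charset, the sketch below decodes the same three bytes with two explicitly named charsets. It assumes Java 7 or later for StandardCharsets; the class and method names are illustrative only. Note that the String(byte[], Charset) overload used here, unlike the default-charset constructor, is specified to replace malformed input with the charset's replacement string (U+FFFD for UTF-8), so its output is predictable even for invalid bytes.

```java
import java.nio.charset.StandardCharsets;

// Decodes the same bytes with two explicit charsets to show that the
// length of the resulting String depends on the charset, not on the byte count.
class DecodedLength {
    // 0xE2 0x82 0xAC is the UTF-8 encoding of U+20AC (the euro sign).
    static final byte[] EURO_UTF8 = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };

    static int lengthAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    static int lengthAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1).length();
    }

    // A lone 0xFF byte is never valid UTF-8; the Charset overload replaces it with U+FFFD.
    static String decodeInvalidUtf8() {
        return new String(new byte[] { (byte) 0xFF }, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("bytes:             " + EURO_UTF8.length);         // 3
        System.out.println("UTF-8 length:      " + lengthAsUtf8(EURO_UTF8));   // 1
        System.out.println("ISO-8859-1 length: " + lengthAsLatin1(EURO_UTF8)); // 3
        System.out.println("0xFF as UTF-8 is U+FFFD: " + decodeInvalidUtf8().equals("\uFFFD"));
    }
}
```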
What is a charset? Technically, it is "the combination of a coded character set and a character-encoding scheme" [Java-API]. In other words, a charset is a bundle of characters, the numbers that represent those characters, and a mapping between sequences of character encodings and sequences of bytes. Charsets differ greatly in this mapping: some map characters to bytes one-to-one, but most do not. ISO-8859-1 is the only default charset that will make the program print the integers from 0 to 255 in order; it is better known as Latin-1 [ISO-8859-1].
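The one-to-one property of Latin-1 can be checked directly. This is a minimal sketch, assuming Java 7 or later; the helper names are hypothetical. It decodes all 256 byte values as ISO-8859-1, confirms that each byte b becomes the char with code b & 0xFF, and encodes the result back to the original bytes. Decoding the same bytes as UTF-8, a charset that is not one-to-one, breaks the identity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Every possible byte value, in the same order as the puzzle program.
    static byte[] allBytes() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++)
            bytes[i] = (byte) i;
        return bytes;
    }

    // True if the string has 256 chars and char i has code i for every i.
    static boolean isIdentityMapping(String s) {
        if (s.length() != 256)
            return false;
        for (int i = 0; i < 256; i++)
            if (s.charAt(i) != i)
                return false;
        return true;
    }

    public static void main(String[] args) {
        byte[] bytes = allBytes();
        String latin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        byte[] back = latin1.getBytes(StandardCharsets.ISO_8859_1);
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("Latin-1 identity:   " + isIdentityMapping(latin1));
        System.out.println("Latin-1 round trip: " + Arrays.equals(bytes, back));
        System.out.println("UTF-8 identity:     " + isIdentityMapping(utf8));
    }
}
```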
The default charset of the J2SE Runtime Environment (JRE) depends on the underlying operating system and locale. If you want to know your JRE's default charset and you are using release 5.0 or later, call java.nio.charset.Charset.defaultCharset(). If you are using an earlier release, read the system property "file.encoding".
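A quick way to check both values on a given machine is sketched below; the class name is illustrative. The output varies by platform and JDK version, which is exactly the point: on JDK 18 and later, for example, the default charset is UTF-8 unless overridden (JEP 400), so the puzzle program's output there will differ from what an older JDK on the same machine might print.

```java
import java.nio.charset.Charset;

class ShowDefaultCharset {
    // Canonical name of the JRE's default charset (available since release 5.0).
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset(): " + defaultCharsetName());
        // The system property consulted on releases earlier than 5.0.
        System.out.println("file.encoding property:   " + System.getProperty("file.encoding"));
    }
}
```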
Luckily, you aren't at the mercy of whatever default charset you happen to get. When translating between char sequences and byte sequences, you can, and usually should, specify a charset explicitly. A String constructor that takes a charset name in addition to a byte array is provided for just this purpose. If you replace the String constructor in the original program with the following one, the program is guaranteed to print the integers from 0 to 255 in order, regardless of the default charset:
String str = new String(bytes, "ISO-8859-1");
This constructor is declared to throw UnsupportedEncodingException, so you must either catch it or, preferably, declare the main method to throw it; otherwise the program won't compile. The program will never actually throw the exception, however. The Charset specification requires every implementation of the Java platform to support certain charsets, and ISO-8859-1 is among them.
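Putting the pieces together, the corrected program might look like the sketch below, with main declared to throw the checked exception. The decoding step is pulled into a helper (a name chosen here for illustration) so the result can be inspected separately from the printing loop.

```java
import java.io.UnsupportedEncodingException;

class StringCheeseFixed {
    // Builds the string as the puzzle does, but with an explicit charset name.
    static String decodeAllBytes() throws UnsupportedEncodingException {
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++)
            bytes[i] = (byte) i;
        // Declared to throw UnsupportedEncodingException, although every
        // Java platform implementation must support ISO-8859-1.
        return new String(bytes, "ISO-8859-1");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String str = decodeAllBytes();
        // Prints 0 1 2 ... 255 regardless of the default charset.
        for (int i = 0, n = str.length(); i < n; i++)
            System.out.print((int) str.charAt(i) + " ");
        System.out.println();
    }
}
```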
The lesson of this puzzle is that every time you translate a byte sequence to a String, you are using a charset, whether or not you specify it explicitly. If you want your program to behave predictably, specify a charset each time you use one. For API designers, perhaps it was not such a good idea to provide a String(byte[]) constructor that depends on the default charset.
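Specifying a charset every time is easier with the Charset-object overloads, String(byte[], Charset) and String.getBytes(Charset), which do not declare UnsupportedEncodingException. The sketch below assumes Java 7 or later for StandardCharsets; decode and encode are hypothetical helper names that only wrap those standard calls, and the charset appears explicitly in both directions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsets {
    // Hypothetical helpers: each one forces the caller to name a charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";                  // "café"
        byte[] bytes = encode(cafe, StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count:        " + bytes.length);                    // 5
        System.out.println("UTF-8 round trip equal:  " + decode(bytes, StandardCharsets.UTF_8).equals(cafe));
        // Decoding with the wrong charset does not fail; it silently yields a different string.
        System.out.println("decoded as Latin-1, len: " + decode(bytes, StandardCharsets.ISO_8859_1).length()); // 5
    }
}
```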
Puzzle 18: String Cheese